Case Study: Content Analytics Firm Requires Speed of Hot Cloud Storage
Kaleidoscope is a startup that deals with massive quantities of data. Its business is to continually search unstructured data from a variety of sources across the Internet, process billions of words each second, and aggregate, visually index, and store the data as “knowledge” that its clients can access immediately. Unlike traditional search engines, Kaleidoscope organizes and displays its results in aggregated, visual form. The company’s self-developed search engine scans Securities and Exchange Commission (SEC) filings, news, social media sites, blogs, press releases, and other online content sources from around the world for information on company performance, stressors, mergers, buyouts, and takeovers. The information is then categorized and stored so that Kaleidoscope clients in financial, legal, and other industries who need up-to-the-second information about changes in the business climate have what they need to make informed investment decisions.
Kaleidoscope needed to store about 10 terabytes of indexed, visual data each month that its clients could retrieve immediately. Data is continually compiled by Kaleidoscope’s powerful, self-developed worldwide search engine. As each document comes in, it is processed at the Nimbix supercomputer center in Dallas, Texas, for categorization and indexing. Once indexed, the data can grow by a factor of 20 or more, says Kee Kimbrell, Kaleidoscope co-founder. “We needed to be able to stream all that processed data to a quick-retrieval storage system” with the performance to keep up with the transfers, he explains.
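To put the storage requirement in perspective, a back-of-the-envelope calculation converts 10 TB per month into an average sustained write rate. It assumes decimal terabytes and a 30-day month, neither of which the case study specifies. Because indexing can expand data by a factor of 20 or more, the same figure also bounds the raw input at roughly 0.5 TB per month. Real transfers arrive in bursts, so the peak rate a storage system must absorb will be higher than this average.

```python
# Back-of-the-envelope sizing; the 30-day month and decimal units are assumptions.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 s in a 30-day month
INDEXED_BYTES_PER_MONTH = 10 * 10**12   # about 10 TB of indexed, visual data
EXPANSION_FACTOR = 20                   # indexing grows data by 20x "or more"


def average_rate(bytes_per_month: float) -> float:
    """Return the average sustained transfer rate in bytes per second."""
    return bytes_per_month / SECONDS_PER_MONTH


rate = average_rate(INDEXED_BYTES_PER_MONTH)
# A 20x (or larger) expansion means the raw input is at most this many bytes.
raw_upper_bound = INDEXED_BYTES_PER_MONTH / EXPANSION_FACTOR

print(f"average write rate: {rate / 1e6:.2f} MB/s ({rate * 8 / 1e6:.1f} Mbit/s)")
print(f"raw input: at most {raw_upper_bound / 1e12:.1f} TB per month")
```

Under these assumptions the average works out to about 3.86 MB/s (30.9 Mbit/s), with raw input of at most 0.5 TB per month.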
The company initially looked into Amazon Elastic File System (Amazon EFS), which provides elastic file storage for use with AWS Cloud services and on-premises resources. “EFS works great as a shared file system but is expensive,” adds co-founder Raul Peralta. “Plus, we needed an object store system.”
Kaleidoscope settled on a dual-storage system: one store houses the raw data awaiting indexing, and another holds the processed visual data that clients access. Raw data is stored in Amazon S3 and processed at the Nimbix supercomputer center, where it is run through a pattern-analysis tool containing several thousand patterns, categorized, and converted to visual forms such as charts and graphs for clients. Kaleidoscope stores the processed results in Wasabi hot cloud storage as objects. Because Wasabi hot cloud storage is 100 percent bit-compatible with Amazon S3, this process is completely seamless.
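The practical consequence of S3 compatibility is that the same client code can talk to both tiers, with only the endpoint changing. The sketch below models that idea with a hypothetical in-memory `ObjectStore` class and a hypothetical `index_documents` step. Nothing is sent over the network, the endpoint strings serve only as labels, and the uppercase transform merely stands in for Kaleidoscope’s pattern analysis, which the case study does not describe in detail.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ObjectStore:
    """Hypothetical in-memory stand-in for an S3-compatible bucket."""

    endpoint: str  # illustrative label only; no network calls are made
    objects: Dict[str, bytes] = field(default_factory=dict)

    def put_object(self, key: str, body: bytes) -> None:
        self.objects[key] = body

    def get_object(self, key: str) -> bytes:
        return self.objects[key]


def index_documents(
    raw: ObjectStore,
    processed: ObjectStore,
    transform: Callable[[bytes], bytes],
) -> int:
    """Read every raw object, transform it, and write the result to the
    processed tier. The calls are identical for both stores; only the
    endpoint each store was created with differs."""
    count = 0
    for key in sorted(raw.objects):
        processed.put_object(f"{key}.idx", transform(raw.get_object(key)))
        count += 1
    return count


raw_tier = ObjectStore(endpoint="https://s3.amazonaws.com")
hot_tier = ObjectStore(endpoint="https://s3.wasabisys.com")
raw_tier.put_object("sec/filing-0001.txt", b"quarterly report")
raw_tier.put_object("news/article-0002.txt", b"merger announced")

# The uppercase transform is a placeholder for the real indexing step.
written = index_documents(raw_tier, hot_tier, lambda body: body.upper())
print(written, sorted(hot_tier.objects))
```

With a real SDK the equivalent switch is typically a single setting; in boto3, for example, `boto3.client("s3", endpoint_url=...)` accepts the endpoint of any S3-compatible service.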
“We’re continually uploading huge volumes of processed data from the Nimbix supercomputer to Wasabi storage, and Wasabi doesn’t have any problem keeping up,” says Kimbrell. “Performance is what made the sale for us. We like Wasabi’s price, but performance was the deciding factor.”
Kaleidoscope Web servers running on AWS pull the categorized data from Wasabi through a file caching system for very fast customer access.
Wasabi currently stores many terabytes of indexed visual data for Kaleidoscope. About 80,000 monthly SEC filings account for roughly 15 percent of the data stored in Wasabi, Kimbrell estimates. Clients access the data through Kaleidoscope servers via a caching system that Kaleidoscope built specifically for its needs. “We wrote the file system on top of Wasabi and it wasn’t very complicated thanks to their non-proprietary approach to object storage and compatibility with Amazon S3,” he says.
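Kaleidoscope does not describe how its caching layer works. As a hedged illustration of the general idea, the sketch below puts a least-recently-used (LRU) read-through cache in front of any fetch function, such as one that issues a GET against an S3-compatible bucket. The class name, the capacity, and the dictionary standing in for the object store are all hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Dict, List


class ReadThroughCache:
    """Hypothetical LRU read-through cache in front of an object store.

    On a miss it calls `fetch` (for example, a GET against an S3-compatible
    bucket), keeps the result, and evicts the least recently used entry once
    more than `capacity` objects are held.
    """

    def __init__(self, fetch: Callable[[str], bytes], capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._fetch = fetch
        self._capacity = capacity
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        body = self._fetch(key)
        self._entries[key] = body
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # drop least recently used
        return body


# A dict stands in for the remote bucket; `calls` records each remote fetch.
bucket: Dict[str, bytes] = {"a.png": b"chart-a", "b.png": b"chart-b", "c.png": b"chart-c"}
calls: List[str] = []


def fetch_from_bucket(key: str) -> bytes:
    calls.append(key)
    return bucket[key]


cache = ReadThroughCache(fetch_from_bucket, capacity=2)
for key in ["a.png", "b.png", "a.png", "c.png", "b.png"]:
    cache.get(key)
print(calls, cache.hits, cache.misses)
```

Here the second request for `a.png` is served from memory, while `b.png` is fetched twice because it was evicted when `c.png` arrived.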
What’s Next for Kaleidoscope?
The company is currently focused on gathering and indexing SEC information relevant to investment and trading decisions. It is also looking ahead to indexing images, such as Instagram photos, that its clients would be able to find with a keyword search. The indexed data is also being used for sentiment analysis, which can gauge the political climate or track how a given company’s brand or campaign is registering with the public in specific categories. These efforts would expand Kaleidoscope’s search indexing to retail, advertising agency, and other clients, driving demand for even greater volumes of hot cloud storage that is instantly accessible and affordable.