Kaleidoscope continually processes and stores massive quantities of unstructured data. The company gathers unstructured data from sources across the Internet, processes billions of words each second, then aggregates, visually indexes, stores, and displays the result as actionable “knowledge” that its clients can use immediately. Unlike traditional search engines, Kaleidoscope organizes and displays its results in a variety of intuitive interactive charts in addition to full-text searchable documents. The company’s proprietary next-generation search engine scans Securities and Exchange Commission (SEC) filings, news, social media sites, blogs, press releases, and other online content sources worldwide for information on company financial performance, stressors, mergers, buyouts, and takeovers. The information is then categorized and stored so that Kaleidoscope clients in financial, legal, and other industries get the real-time, 360-degree view of corporate financial data, public opinion, and industry news they need to make truly informed business decisions.
Each month, Kaleidoscope aggregates, indexes, and stores many terabytes of text and visual data that must be immediately accessible to the company’s clients. Data is continually compiled by Kaleidoscope’s next-generation search engine. As each document is captured, it is processed at the Nimbix supercomputer center in Dallas, Texas, for categorization and indexing. “Once indexed, the size of the data can grow by a factor of 20 or more,” says Kee Kimbrell, Kaleidoscope co-founder. “We must be able to stream all that processed data to a quick-retrieval storage system with high-speed performance to support high-volume data transfers,” he explains.
The company initially looked into Amazon Elastic File System (Amazon EFS), which provides elastic file storage for use with AWS Cloud services and on-premises resources. “EFS works great as a shared file system but for our high-demand object store system requirements, EFS is cost prohibitive,” adds co-founder Raul Peralta.
Kaleidoscope combines the best of both worlds in a dual-storage system: Nimbix indexes the raw data, and Wasabi stores the processed HD visual data that clients can access in real time. Raw data is stored in Amazon S3 and processed at the Nimbix supercomputer center, where it is fed through a proprietary pattern analysis tool consisting of an ever-growing collection of several thousand patterns, then categorized and converted into intuitive visual forms such as charts and graphs. The visualizations give clients immediate clues and filters, which streamlines research across millions of documents. Kaleidoscope stores the processed results in Wasabi’s “hot cloud” storage as object files. Because Wasabi’s “hot cloud” storage is 100 percent bit-compatible with AWS S3, the same tools that read the raw data from S3 can write the results to Wasabi, which makes this hand-off seamless.
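To illustrate what S3 compatibility means in practice, here is a minimal Python sketch, not Kaleidoscope’s actual code. It assumes the boto3 library would be used as the S3 client and that https://s3.wasabisys.com is the Wasabi service endpoint in use (region-specific endpoints also exist). The function names, bucket, and key are hypothetical.

```python
from typing import Any, Dict

# Assumption: Wasabi's general S3-compatible endpoint; a deployment may use
# a region-specific endpoint instead.
WASABI_ENDPOINT = "https://s3.wasabisys.com"


def s3_client_kwargs(target: str, access_key: str, secret_key: str) -> Dict[str, Any]:
    """Build keyword arguments for boto3.client("s3", ...).

    The only difference between the two targets is the endpoint URL,
    which is the practical meaning of "S3-compatible".
    """
    if target not in ("aws", "wasabi"):
        raise ValueError(f"unknown storage target: {target!r}")
    kwargs: Dict[str, Any] = {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }
    if target == "wasabi":
        kwargs["endpoint_url"] = WASABI_ENDPOINT
    return kwargs


def upload_processed_object(client: Any, bucket: str, key: str, body: bytes) -> None:
    """Store one processed result as an object.

    `client` is any S3-style client exposing put_object, for example one
    created with boto3.client("s3", **s3_client_kwargs(...)).
    """
    client.put_object(Bucket=bucket, Key=key, Body=body)
```

With boto3 installed, a client built from `s3_client_kwargs("wasabi", ...)` and one built from `s3_client_kwargs("aws", ...)` can be passed to the same `upload_processed_object` function unchanged.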
“We constantly upload huge volumes of processed data from the Nimbix supercomputer to Wasabi storage, and Wasabi’s performance continues to deliver above our expectations,” says Kimbrell. “Although we like Wasabi’s affordability, performance is critical to success. Wasabi’s high-speed performance was the deciding factor.”
Wasabi currently stores the indexed text and visual data for Kaleidoscope. “About 80,000 SEC filings per month constitute about 15 percent of the data stored by Wasabi,” estimates Kimbrell. “Our clients access these different types of data through Kaleidoscope servers via a proprietary caching system Kaleidoscope created specific to our clients’ real-time expectations. We created our file system to integrate with Wasabi, and it wasn’t very complicated thanks to their non-proprietary approach to object storage and compatibility with Amazon S3,” he says.
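Kaleidoscope’s caching system is proprietary and its design is not described here. As a generic illustration of how a cache can sit between client requests and an object store, the following Python sketch shows a read-through cache with least-recently-used eviction. The class name, the `fetch` callable, and the capacity are all hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Serve objects from memory, fetching from the backing store on a miss."""

    def __init__(self, fetch: Callable[[str], bytes], capacity: int = 1024) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._fetch = fetch          # hypothetical: reads one object from storage
        self._capacity = capacity    # maximum number of cached objects
        self._items: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._items:
            # Mark the entry as most recently used and serve it from memory.
            self._items.move_to_end(key)
            self.hits += 1
            return self._items[key]
        # Miss: read from the backing store, then remember the result.
        self.misses += 1
        value = self._fetch(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            # Evict the least recently used entry.
            self._items.popitem(last=False)
        return value
```

In this sketch, `fetch` would wrap a call to the object store, so repeated requests for popular documents are answered from memory and only misses reach storage.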
What’s Next for Kaleidoscope?
Currently, Kaleidoscope focuses on gathering and indexing SEC information, which pertains to legal, financial, and investment/trading decision making. Future plans, however, include indexing images, such as Instagram photos, to enrich the research results a client gets from a simple keyword search. Additionally, indexed data is critical to generating sentiment analysis, which gauges the positive or negative slant in a variety of meaningful documents, such as transcripts of public company conference calls versus analysts’ views, news coverage, or blogs. Sentiment analysis delivers critical insights into grassroots reactions to corporate messaging, current affairs, the effects of federal policy on companies, and company brands, and into whether an ad campaign is registering with the public. These efforts would expand Kaleidoscope’s appeal to retail companies, ad agencies, and other clients, while driving even greater demand for Wasabi’s “hot cloud” storage that is instantly accessible for clients and affordable for Kaleidoscope.