TECHNOLOGY

Researchers Discover that Wasabi Eliminates Storage Barriers

January 15, 2019 | By David Friend

Whether exploring the origin of the universe, curing diseases or investigating climate change, today’s researchers are awash in data. Consider the following mind-boggling stats:

The Large Hadron Collider at CERN in Geneva generates one petabyte of raw collision data per second. You read that correctly – per second. That’s about 12.8 million DVDs’ worth of data every minute.
The amount of genomics data produced worldwide doubles every seven months. Within the next decade annual genomics data generation will exceed 2 exabytes—enough to fill up roughly 8 million iPhones.
The Large Synoptic Survey Telescope in Chile generates over 15 terabytes of imaging and scientific data every night. The 10-year survey will ultimately produce a 500-petabyte data library. That’s the equivalent of a 6,650-year continuous HD video recording.
NOAA’s National Centers for Environmental Information maintains a treasure trove of oceanic, atmospheric and geophysical data. Last year alone, users accessed more than 9 petabytes of environmental data from NCEI. That’s like 180 million four-drawer filing cabinets filled with text.
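The comparisons above can be sanity-checked with a little arithmetic. The sketch below assumes decimal (SI) units and a 4.7 GB single-layer DVD; neither assumption is stated in the article, and the function names are illustrative only.

```python
# Back-of-the-envelope check of the comparisons in the list above.
# Assumption: decimal (SI) units; the article does not say which it uses.
GB = 10**9
PB = 10**15
EB = 10**18

DVD_BYTES = 4.7 * GB  # assumption: single-layer DVD capacity
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def dvds_per_minute(bytes_per_second):
    """Number of DVDs needed to hold one minute of data at the given rate."""
    return bytes_per_second * 60 / DVD_BYTES


def implied_unit_size(total_bytes, unit_count):
    """Bytes each unit must hold for unit_count units to store total_bytes."""
    return total_bytes / unit_count


# CERN: one petabyte per second, expressed as DVDs per minute.
cern_dvds = dvds_per_minute(1 * PB)
# Genomics: 2 exabytes spread over 8 million iPhones.
iphone_bytes = implied_unit_size(2 * EB, 8_000_000)
# NCEI: 9 petabytes spread over 180 million filing cabinets.
cabinet_bytes = implied_unit_size(9 * PB, 180_000_000)
# Telescope survey: 500 petabytes recorded continuously over 6,650 years.
video_mbit_per_s = 500 * PB * 8 / (6650 * SECONDS_PER_YEAR) / 1e6

print(f"DVDs per minute: {cern_dvds / 1e6:.1f} million")
print(f"Implied iPhone capacity: {iphone_bytes / GB:.0f} GB")
print(f"Implied filing cabinet capacity: {cabinet_bytes / 1e6:.0f} MB")
print(f"Implied HD video bitrate: {video_mbit_per_s:.1f} Mbit/s")
```

Under these assumptions the figures hang together: roughly 12.8 million DVDs per minute, 250 GB per iPhone, 50 MB of text per filing cabinet, and an HD video stream of about 19 Mbit/s.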

Big data = big storage challenges

For all the news we’ve heard over the last decade about Big Data and the mammoth datasets involved, our collective ability to create data unfortunately far exceeds the budget to keep it all. Many researchers can’t afford to keep anywhere close to 100% of their datasets on traditional on-premises storage platforms or first-generation cloud storage services like Amazon S3, Azure Blob Storage or Google Cloud Storage.

Take CERN for example. They use a combination of on-premises disk and tape-based storage and archiving systems to retain particle collider data. Their data center storage capacity is “only” 135 PB. As a result, CERN ends up discarding the vast majority of its data, ultimately storing only about one petabyte of data per day (about one second’s worth of collision data) for analysis.
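To put those numbers in proportion, here is a rough calculation using only the figures quoted above. It assumes decimal petabytes and a collider producing data around the clock, which is a simplification, and it ignores replication, deletion and the tape archive.

```python
# Rough proportions from the CERN figures quoted above.
# Assumptions: decimal petabytes, continuous 24-hour operation.
PB = 10**15
SECONDS_PER_DAY = 24 * 60 * 60

raw_bytes_per_day = 1 * PB * SECONDS_PER_DAY   # one petabyte per second
stored_bytes_per_day = 1 * PB                  # about one petabyte kept per day
capacity_bytes = 135 * PB                      # quoted data center capacity

# Share of the raw collision data that is actually retained.
retained_fraction = stored_bytes_per_day / raw_bytes_per_day

# How long the quoted capacity would last if nothing were ever deleted.
days_to_fill_at_stored_rate = capacity_bytes / stored_bytes_per_day
seconds_to_fill_at_raw_rate = capacity_bytes / (1 * PB)

print(f"Retained: {retained_fraction:.4%} of raw data")
print(f"Capacity lasts {days_to_fill_at_stored_rate:.0f} days at the stored rate")
print(f"Capacity lasts {seconds_to_fill_at_raw_rate:.0f} seconds at the raw rate")
```

In other words, only about one part in 86,400 of the raw data is kept, and the full 135 PB would hold just over two minutes of unfiltered collision data.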

Relying on tape for both long-term archiving and more active data use can easily lead to large, complicated projects that divert personnel, time and budget from the primary research itself.

In the case of CERN, the migration of its permanent data archive to newer tape ran from 2014 through 2015. The project achieved its goal of more tape storage capacity within the same physical footprint (ultimately freeing up 30,000 tape-cartridge slots by migrating to higher-density, newer-generation tapes), but it’s only a matter of time before the labor-intensive and time-consuming process of migrating to the next generation of tape happens all over again.

How does cloud storage compare?

With first-generation cloud providers (AWS, Azure, Google), many researchers have found that the economic benefits of moving from on-premises storage to the cloud were not what they expected, largely because of hidden data egress and API call fees on top of the “pure” cost of the storage itself.

These fees account for a much larger share of the total cost of ownership of cloud storage when the datasets involved are as large as those of scientific research projects often are.

In an effort to reduce egress fees, researchers may rely on a single cloud provider for all of their needs. That effectively locks them in to one provider, at a higher overall cost and with fewer capabilities than they would get by choosing best-of-breed solutions from multiple cloud providers.
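How egress and API fees affect the total bill can be sketched with a simple cost model. Every rate below is a hypothetical placeholder chosen for easy arithmetic, not the actual pricing of any provider, and the names `Rates` and `monthly_cost` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Monthly price list. All values used below are hypothetical placeholders."""
    storage_per_tb: float        # dollars per TB stored per month
    egress_per_tb: float         # dollars per TB downloaded
    per_million_requests: float  # dollars per million API calls


def monthly_cost(rates, stored_tb, egress_tb, requests_millions):
    """Break one month's bill into storage, egress and API components."""
    storage = rates.storage_per_tb * stored_tb
    egress = rates.egress_per_tb * egress_tb
    api = rates.per_million_requests * requests_millions
    total = storage + egress + api
    return {
        "storage": storage,
        "egress": egress,
        "api": api,
        "total": total,
        # Share of the bill that is not the "pure" storage cost.
        "hidden_share": (egress + api) / total if total else 0.0,
    }


# Hypothetical price lists: same storage rate, with and without usage fees.
WITH_FEES = Rates(storage_per_tb=20.0, egress_per_tb=50.0, per_million_requests=5.0)
NO_FEES = Rates(storage_per_tb=20.0, egress_per_tb=0.0, per_million_requests=0.0)

# Example workload: 1,000 TB stored, 200 TB downloaded, 100 million requests.
with_fees = monthly_cost(WITH_FEES, stored_tb=1000, egress_tb=200, requests_millions=100)
no_fees = monthly_cost(NO_FEES, stored_tb=1000, egress_tb=200, requests_millions=100)

print(f"With usage fees: ${with_fees['total']:,.0f} "
      f"({with_fees['hidden_share']:.0%} from egress and API calls)")
print(f"Without usage fees: ${no_fees['total']:,.0f}")
```

The storage rate is held equal in both price lists so that the difference comes only from usage fees. With these placeholder numbers, the fees add about a third to the bill, and that share grows as more of the dataset is read each month.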

From an operational perspective, with cloud storage there is no need for your organization to migrate to another generation of hardware as old hardware ages. That job is taken care of by the cloud storage provider, freeing up time, money and resources to focus on research rather than infrastructure.

Wasabi to the rescue – no more trade-offs

At Wasabi, we believe researchers shouldn’t be held back by expensive and complicated storage solutions. The ability to store data of any size should be the least of your concerns.

Your energy, budget, personnel and time should be focused on delivering insights from data, not on building, owning and operating data infrastructure.

Focus on what data analytics to use, whether Artificial Intelligence/Machine Learning techniques or more traditional approaches. The cutting edge of research is happening where insights are produced, not in the bowels of the infrastructure you use to store the data.

Whether you are sequencing DNA, modeling weather systems or mapping the cosmos, Wasabi can help you slash storage costs, gather more data and accelerate the pace of discovery. Wasabi is great for researchers with cash-strapped grants, for government agencies with tight budgets or for any research organization looking to do more with less. We can help you save money, expand datasets and improve results.

Our mission is to make cloud storage a simple and affordable utility like electricity.

Wasabi Hot Cloud Storage is ultra-cheap, incredibly fast and extremely reliable storage for any purpose. This combination makes Wasabi especially well suited for scientific research, where cloud-based object storage scales at a price and performance that is hard to match, particularly compared to first-generation cloud storage or on-premises “enterprise-grade” storage.

The trade-offs that you’ve had to make with on-prem or first-generation cloud storage offerings no longer apply.

With a single tier, Wasabi is built to handle the majority of data storage scenarios for scientific research:

Active Data – Live data that is readily accessible by the operating system, an application or users. This is the area traditionally served by on-prem disks, and it is where data lives while you are actively analyzing it.
Inactive Archive – Infrequently accessed data, such as data kept for permanent retention. Historically, inactive data has been archived to tape and stored offsite.

Faster with Wasabi on the Internet2 Cloud Exchange

Internet2 members can connect directly to Wasabi via the Internet2 Cloud Exchange. Our peering arrangement makes it easy for Internet2 community members to share data and collaborate. Distributed teams can use Wasabi as a common storage repository for joint research programs or other collaborative efforts. Remote team members can exchange large datasets—quickly and securely—directly over the Internet2 network, avoiding public internet latency and performance bottlenecks.

And with no egress charges for using the data stored on Wasabi and no charges for API calls (GET, PUT, DELETE, etc.), your costs are both more predictable and far lower than with first-generation cloud storage providers.

Next steps

If you have not yet moved your data storage to the cloud, the most logical first step is often to bridge from on-premises tape to cloud storage. Your options for doing so are extensive, as this is a well-worn path. Moving tape-based archives to the cloud is among the most straightforward data storage projects you can pursue.

Using Wasabi initially as a “second copy” (a backup of your archive) serves as a proving ground for a cloud storage strategy. It gives you both the safety of a low-priced backup and familiarity with high-performance cloud storage, without financial penalties for using your data should you need to retrieve it from your cloud archive.

If you are anticipating a hardware refresh cycle (replacing tape libraries, for example), considering expanding your existing data center, or considering building a new one, now is the time to consider moving your data to Wasabi. Our next-generation approach makes the price radically more affordable and allows you to sidestep the traditional decisions about which tiers of storage are needed to support your use cases.

To learn more about how Wasabi can help you eliminate Big Data storage barriers, visit our Scientific Research page or contact us today.