the bucket

Which Cloud Storage is Best for Your Data Lake?

By David Friend
President, CEO & Co-founder

March 28, 2018

Is the cloud the de facto data lake? That's what Tony Baer, lead big data analyst at Ovum, thinks. In fact, that's what many in the industry believe. After all, a data lake is, by definition, a massively scalable, easily accessible, centralized repository of mostly unstructured data. Sounds like cloud object storage to me. However, not all cloud storage is created equal. So the question I would ask is "Which cloud storage provider is best for my big data lake?"

Before we dive into that question, let's briefly discuss what data lakes are and why we need them.

It's a data-first world

We live in a world where data is no longer just stored and forgotten; it's about extracting value from data. We need the ability to analyze data and visualize the results, and by "we" I mean all of us, not just business analysts and data scientists. Everyone from marketers optimizing their ad spending to doctors using predictive analytics to schedule potentially life-saving screenings for at-risk patients needs access to raw data, and lots of it.

The data lake was designed to solve this problem. Unlike a data warehouse, which is a highly structured subset of data deemed worthy of analysis, the data lake takes a store-everything approach to big data. Since we can't really know in advance what data will ultimately be valuable to a particular group, all raw data is stored without regard to structure. Data is classified, organized, or analyzed only when accessed. For this reason, data lakes must rely on very inexpensive classes of storage. And because prices for cloud storage have come down in recent years, Tony predicts that cloud storage will become the de facto data lake.

So, which cloud storage is the best choice for your data lake?

According to Amazon's website, a data lake should provide inexpensive, durable, secure, and scalable storage. Here's how Wasabi stacks up to meet those requirements:

1. A data lake must be inexpensive

Big data insights require a lot of data, so costs can add up fast. As I've written before, storing a massive amount of data in first-generation cloud storage costs roughly the same as on-premises storage. Cloud storage costs are also notoriously unpredictable and difficult to calculate due to all the extra fees for egress and API calls.

Wasabi costs 80% less than Amazon S3, has unlimited free egress and doesn't charge for API calls. It is by far the lowest cost cloud object storage in the industry.

2. A data lake must be durable

Wasabi checks the box here, too. We have the same 11 nines of durability as the top-tier cloud providers, plus we actively check file integrity every 90 days. So the likelihood of losing data due to hardware malfunction or corrupted files is remote. However, the possibility of losing data to human error, buggy applications, hackers and malware is very real. That's why Wasabi offers the cloud storage industry's only complete data immutability feature.

3. A data lake must be secure

Cloud storage providers are better equipped to deal with today's constantly evolving threat landscape than even the most sophisticated enterprise IT departments. Without strong security expertise and best practices, we'd have no business. Wasabi's certified, redundant data centers follow all industry best practices for physical and data security. Data is encrypted at rest, and our unique data immutability feature prevents accidental or malicious deletion or alteration of your data.

4. A data lake must be scalable

Cloud storage is inherently scalable. Wasabi was designed with an exabyte-scale architecture.

I would add a few more requirements to Amazon's list:

5. Your data lake should be fast

Wasabi's performance is 6x faster than AWS S3 on average. This speed difference also translates to a massive improvement in time to first byte (TTFB). You can request a copy of our Performance Benchmark Report, which includes instructions and test scripts to perform your own Wasabi vs. S3 performance test.
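If you want a feel for what a TTFB measurement involves before requesting the full report, here is a minimal sketch. This is a hypothetical illustration, not our published benchmark script: the helper simply times how long a request takes to deliver its first byte, and works with any file-like response body (for S3-style storage, the Body of a GetObject response).

```python
import io
import time

def time_to_first_byte(get_stream):
    """Seconds from issuing a request until the first byte arrives.

    `get_stream` is a zero-argument callable that issues the request and
    returns a file-like response body. Hypothetical helper for
    illustration only.
    """
    start = time.perf_counter()
    body = get_stream()
    body.read(1)  # blocks until the first byte is available
    return time.perf_counter() - start


# Demonstrated here with an in-memory stand-in; against real object
# storage you might pass something like (hypothetical bucket/key):
#   lambda: s3.get_object(Bucket="my-lake", Key="raw/events.json")["Body"]
elapsed = time_to_first_byte(lambda: io.BytesIO(b"hello"))
```

Running the same measurement against Wasabi and S3 with identical objects gives a like-for-like TTFB comparison.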

6. Your data lake should be simple

Thanks to the complexity of first-generation cloud storage, data lake platform providers have been forced to build all sorts of additional functionality to help users move data from expensive hot storage to cheaper cold storage tiers. Wasabi's insanely fast performance and (frankly) ridiculously low prices enable us to do away with all those complicated storage tiers and pricing models.

Wasabi has only one tier: blazing fast hot storage at cold storage prices. There are no additional fees for egress or API calls, so customers aren't penalized for accessing their own data. And their cloud storage bill is 100% predictable.

7. Your data lake should free you from vendor lock-in

We designed Wasabi to be 100% compatible with AWS S3, and we have a growing list of more than 100 fully tested partner applications. If your data analytics tools use S3, or you run your analytics from AWS EC2, you can store your data lake in Wasabi with no changes to your applications. We also support AWS Direct Connect and access via the public Internet (with no egress fees!), as well as our own Wasabi Direct Connect solution.
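To make that compatibility concrete, here is a minimal sketch assuming boto3 (the standard AWS SDK for Python) and Wasabi's s3.wasabisys.com service endpoint: the only change from a stock AWS S3 client is the endpoint URL, so existing S3 code runs unchanged.

```python
def wasabi_client_config(endpoint_url="https://s3.wasabisys.com"):
    """Keyword arguments for an S3 client pointed at Wasabi.

    Apart from endpoint_url, this is identical to a stock AWS S3 client;
    credentials come from the usual AWS credential chain.
    """
    return {"service_name": "s3", "endpoint_url": endpoint_url}


# With boto3 installed, the client is built the same way as for AWS --
# only the endpoint differs (bucket and key names below are hypothetical):
#
#   import boto3
#   s3 = boto3.client(**wasabi_client_config())
#   s3.upload_file("events.json", "my-data-lake", "raw/events.json")
```

Because the S3 API surface is the same, tools that accept a custom endpoint URL need only that one configuration change to use Wasabi.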

8. Your data lake must enable analytics

The entire reason for building a data lake is to store ALL of your data so it is there when you are ready to combine, manipulate, and analyze it. If the high price of storage forces you to decide in advance what data may be valuable to you or someone else in the future, and you are penalized every time you access your data, that's probably not the right cloud storage for your data lake.

We built Wasabi to solve the world's data storage problem. We’re ready to help solve yours.
