What I Learned at Internet2’s 2019 Global Summit
Earlier this week, I spent some time at Internet2’s 2019 Global Summit in Washington, DC. For those of you who are not familiar with Internet2, it is essentially an alternate Internet that was founded by a consortium of higher education institutions in 1996. Today it connects 317 US universities, 60 government agencies, and more than 100,000 institutions in 100 countries. The idea is to have an alternative to the public Internet where there’s very high bandwidth and connected services that are relevant to the member organizations.
We were honored to join the Internet2 Cloud Exchange in 2018 as the only pure-play cloud storage vendor. Since then, we have been ramping up Internet2 members to take advantage of a radically reduced price point (80% less expensive than AWS S3) combined with high-performance connectivity and storage.
This year’s conference was mostly about security, identity, and networking issues. At its core, Internet2 is a network, so it’s not surprising that the folks at this conference are focused on the connectivity backbone, particularly how to enhance it and how to protect it.
To me, though, it felt like the elephant in the room was actually storage, rather than connectivity and security. Everybody was talking about the volume of data their researchers were generating. This was especially true of the medical research community where imaging and genomics are just going berserk.
Let’s look at genetic data, for example. One speaker said that it won’t be long before every baby has their genome sequenced before they leave the hospital. Why? Your genes can guide doctors on what kind of medications (and dosage) you’ll use throughout your life in a more predictive rather than reactive way. This makes it possible to drive down future costs and improve wellness, both of which are major initiatives of healthcare organizations and government agencies alike.
The machines to sequence a human genome are coming down in price to the point where this is looking entirely practical. The problem is the data, and although we’re used to seeing massive amounts of data at Wasabi, frankly I was surprised at the magnitude of the challenge.
With genomic sequencing, you have to do multiple scans to get high quality, reliable data. The “1000 Genomes Project” (begun in 2012) consists of more than 200 TB for 1,700 participants, or roughly 118 GB per individual.
A sequencing machine, such as those made by Illumina, will do 30 scans, producing 90 billion base pairs and roughly 200 GB of data. Next-gen scanners do 100 scans per genome. As the resolution and completeness of these scans continue to grow, these file sizes will increase correspondingly.
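The per-genome figures above are easy to sanity-check. Here is a minimal Python sketch. It assumes decimal units (1 TB = 1,000 GB), treats each "scan" as one full pass over the genome, and uses the commonly cited figure of about 3 billion base pairs for a human genome, which is not stated in this post. The function names are just illustrative.

```python
# Rough check of the per-genome figures quoted above.
# Assumptions (not from the post): decimal units, one "scan" = one full
# pass over the genome, and ~3 billion base pairs per human genome.
GB_PER_TB = 1_000
ASSUMED_GENOME_BASE_PAIRS = 3_000_000_000


def gb_per_participant(total_tb: float, participants: int) -> float:
    """Average data volume per participant, in GB."""
    return total_tb * GB_PER_TB / participants


def base_pairs_read(scans: int, genome_base_pairs: int = ASSUMED_GENOME_BASE_PAIRS) -> int:
    """Total base pairs read when the whole genome is scanned `scans` times."""
    return scans * genome_base_pairs


if __name__ == "__main__":
    # 1000 Genomes Project: 200 TB across 1,700 participants
    print(round(gb_per_participant(200, 1700)))  # 118
    # 30 scans of one genome
    print(base_pairs_read(30))  # 90000000000
```

Both results line up with the numbers quoted above: roughly 118 GB per participant and 90 billion base pairs for 30 scans.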
What’s the scale of storing genomic data at a country-wide level?
Let’s use the more recent data size of 200 GB per person for our calculations – it’s a reasonable average between the data provided from genome scanners circa 2012, and what’s possible with next-generation genome scanners becoming available now.
There are 4 million babies born in the United States every year. If every baby received a genomic scan before leaving the hospital, that would result in 800 Petabytes (PB) of data (200 GB x 4,000,000 babies).
800 PB stored with S3 would cost over $200 million per year of new scans.
Over a five-year period, that would grow to over $1 billion per year in ongoing storage costs as the data accumulates.
Clearly Wasabi could drive those costs down tremendously at 80% less than S3 – a $750 million or more PER YEAR savings by year 5! (See our pricing calculator to compare Wasabi vs. other cloud storage providers)
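For anyone who wants to play with these numbers, here is a back-of-the-envelope sketch in Python. The birth count, the 200 GB per genome, and the 80% discount come from this post. The S3 price of $0.021 per GB per month is my own assumption, picked to be in the neighborhood of published S3 standard pricing, so treat the dollar outputs as rough estimates. The model also simplifies by charging each year's scans for a full twelve months.

```python
# Back-of-the-envelope storage cost model for newborn genome scans.
# From the post: 4 million births/year, 200 GB per genome, 80% discount.
# Assumption (not from the post): S3 at roughly $0.021 per GB per month.
GB_PER_PB = 1_000_000  # decimal units

BABIES_PER_YEAR = 4_000_000
GB_PER_GENOME = 200
ASSUMED_S3_USD_PER_GB_MONTH = 0.021
DISCOUNT_VS_S3 = 0.80


def new_data_pb_per_year() -> float:
    """New genomic data generated each year, in PB."""
    return BABIES_PER_YEAR * GB_PER_GENOME / GB_PER_PB


def annual_cost_usd(stored_pb: float, usd_per_gb_month: float) -> float:
    """Cost of keeping `stored_pb` petabytes stored for twelve months."""
    return stored_pb * GB_PER_PB * usd_per_gb_month * 12


if __name__ == "__main__":
    for year in range(1, 6):
        stored_pb = new_data_pb_per_year() * year  # data accumulates yearly
        s3_cost = annual_cost_usd(stored_pb, ASSUMED_S3_USD_PER_GB_MONTH)
        savings = s3_cost * DISCOUNT_VS_S3
        print(f"Year {year}: {stored_pb:,.0f} PB stored, "
              f"S3 ~${s3_cost / 1e6:,.0f}M/yr, savings ~${savings / 1e6:,.0f}M/yr")
```

Under these assumptions, year 1 comes to 800 PB and about $202 million on S3, and year 5 comes to 4,000 PB and about $1 billion per year, with savings of roughly $800 million per year at an 80% discount. Change the assumed price per GB and the totals scale linearly.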
I’m used to thinking big, but that is a daunting number.
All of this potential data, in just a single portion of one slice of one industry…
And yet while almost everybody at the Internet2 Global Summit was talking about moving data, almost nobody was talking about storing it. But as you can see, storage is the big concern of the researchers themselves who somehow have to find a way to keep the data collected for their experiments.
In my opinion, Internet2 members are going to have to address the issue of data storage at some point, and soon. The revolution in the ability to create data needs to be met with a revolution in storing and using it to advance scientific research.
The movement of data across their network is a fast, secure commodity that any member can use. But if there’s no place to store that data at the other end, then what good is it?