Impede the motion of data—and you impede innovation
Why that is, and what to do about it
A stubborn misconception casts data as mostly static. Picture streams of data arriving at data lakes only to drift lethargically to rest at the bottom, inactive and motionless. Consider the very phrase data lake: it implies a kind of placidity. Indeed, when organizations treat data lakes as data dump sites, those lakes become what’s been dubbed data swamps.
Of course, placid data lakes are a reality in some instances, when data does need to be merely stored and not much else. Archival and backup data belongs in this category: for example, data backed up for business continuity reasons, of which organizations need multiple copies.
At the same time, we live in a world where enterprises increasingly want their data awake and in motion. In The Book of Why: The New Science of Cause and Effect, the Turing Award-winning computer scientist and philosopher Judea Pearl offered reassurance: “You are smarter than your data. Data do not understand causes and effects; humans do.” It’s up to us humans, and the processes we develop, to make sense of data. It’s up to us to put data to use.
Every business is a data business. But enterprise data is of little value if it is not used. To make sense of data efficiently and intelligently, we need to see data lakes as reservoirs where many vibrant rivers meet; the task is to commingle various data currents. Lakes also need to share data with one another, so that disparate streams of data can be cross-referenced and analyzed together.
Take autonomous cars. To begin with, there’s value in analyzing data from one vehicle, within one company. Cross-analyzing that vehicle’s data with data from vehicles across all autonomous car companies adds another layer of insight. For a richer picture, zoom out further: integrate the knowledge derived from that one vehicle with data from the billions of sensors that make up a smart city. The fuller picture may be useful to the regional governments and city planners who implement better public safety standards and traffic flows.
The more pieces you put together, the bigger a puzzle you can solve. You can tackle a much higher-order problem if you share data, cross-referencing various streams of information for analysis.
That's why enabling the movement of data matters. Data needs to move in order to allow for interconnectedness of data—and the insights that result.
The Data Dams
But, as many businesses are finding out, putting large volumes of data into motion can be tricky.
First, egress charges stand in the way. It’s not easy to move data out of a public cloud for analysis because of the fees that cloud service providers charge their customers. What would it cost to take a petabyte out of the cloud? Egress charges run between 5 and 20 cents per GB every time customers move their data from the cloud to an on-premises location. At 1,000,000 GB per petabyte, that means moving a single petabyte out of the cloud costs between $50,000 and $200,000.
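To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch (the function name and the 1 PB = 1,000,000 GB convention are illustrative assumptions, not a published pricing formula):

```python
def egress_cost_usd(petabytes: float, rate_per_gb: float) -> float:
    """Estimate cloud egress fees, assuming 1 PB = 1,000,000 GB."""
    gigabytes = petabytes * 1_000_000
    return gigabytes * rate_per_gb

# The article's range: 5 to 20 cents per GB for one petabyte.
low = egress_cost_usd(1, 0.05)
high = egress_cost_usd(1, 0.20)
print(f"Moving 1 PB out of the cloud: ${low:,.0f} to ${high:,.0f}")
```

Running this reproduces the $50,000-to-$200,000 range quoted above; real cloud bills also depend on tiered pricing and the destination of the transfer.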
Second, the solutions that do address the data transport problem, such as fiber-optic cable and existing data transport devices, are limited: they aren’t universally available, they may lack the capacity or flexibility needed, or they face ingest problems. “Applying simple math should make it immediately clear there is simply not enough fiber in the ground to handle the growth of the wireline plus wireless internet,” said Cole Crawford, the CEO and founder of Vapor IO. “We're soon going to run out of the type of connectivity we need.” Shuttle devices can in many cases move large volumes of data fast, but today’s shuttle boxes come with restrictions on logical interfaces, and some lack the ruggedness needed for transport. Because many shuttle systems are proprietary, their use cases can be limited.
These issues are all solvable, and businesses that want their data in motion focus on overcoming these barriers. That focus is all the more important in our multicloud world: if data is not moving, whether from edge to cloud, from public cloud to on-premises data centers, or from cloud to cloud, it’s not enabling competitive business value.
Innovation, which is often enabled by specialized AI clouds, needs unobstructed flows of data. Winning enterprises know that when they free up the movement of data, they speed up innovation.
Israel is a hub of datasphere innovation, with a rising number of startups operating in the local industry. That is also why a global giant like Seagate launched Lyve™ Labs Israel as a collaborative platform through which Seagate partners with Israeli innovators, startups, and enterprises to help them and learn from them. One of the ways the innovation center is helping the ecosystem is through the 'Lyve Innovator of the Year' award, a newly established annual program that rewards innovative startups. Companies creating solutions that harness the flow of data have signed up for the program. The final event, where the winner will be announced, is on September 16th.
Ravi Naik is Seagate Technology’s Senior Vice President and CIO