
On Big Data, Patent Law, and Global Warming

Researcher Dov Greenbaum examines the legal protections provided for big data and whether our insatiable internet appetites should be curbed to save the planet

Dov Greenbaum 11:14, 13.12.19

Big data has long been a trending buzzword in the tech space. It is typically defined as information characterized by the so-called 4 Vs: velocity, for the speed at which data streams out into the ether; variety, for the sheer diversity of data types, from text to video to images and beyond; volume, for the deluge of data created every second, both online and offline; and veracity, for the difficulty, particularly in a world of fake news and fake reviews, of trusting much of that data.

Big data has been widely referred to as the new oil or the new gold, both for the insatiable desire to collect more of it and for the value it holds. Just as oil was an essential fuel of earlier industrial revolutions, so too in this fourth industrial revolution data is what makes the world go round, or at least what pays for our free searches, apps, and email services.

Data center (illustration). Photo: MedOne

Surprisingly, however, for all its value to modern society, we lack an intellectual property regime that properly protects this data in the way it protects other creative and original endeavors that are important components of our digital economy.

Consider patent law. Patents are intended to grant a limited monopoly to inventors of machines, processes, manufactures, or compositions of matter. In return for extensive control over the making and selling of their invention for a twenty-year period, inventors must disclose in their patents the know-how necessary to recreate the invention.

Notably, the scope of what is patentable has been steadily shrinking since patentable subject matter was famously described in the 1950s as including “anything under the sun that is made by man.” After years of the lower courts and the U.S. Supreme Court whittling away at that broad standard, patentable subject matter now excludes laws of nature, natural phenomena, and abstract ideas, categories that arguably sweep in many valuable algorithms and isolated strands of disease-linked DNA.

Pure factual data, collected from various sources, seemingly falls within these exclusions and, as such, is not patentable.

Copyright law is designed to give artists, authors, and the like long-term, albeit narrow, protection for their original works of authorship, provided that those works are fixed in a tangible medium such as a USB drive, a piece of paper, or magnetic tape. As with patents, there is a list of subject matter excluded from copyright, including facts.

This was not always the case. In the early 20th century, judicial dicta famously declared that “what is worth copying is prima facie worth protecting.” Similarly, conventional wisdom among U.S. courts once held that “sweat of the brow,” in other words, the effort invested, was sufficient to confer copyright on even simple alphabetized compilations of data.

However, in a foundational 1991 decision, the U.S. Supreme Court held that a phonebook publisher did not own the facts it had compiled into its directory, because those data points lacked the requisite originality. Effort alone, the Court made clear, was no longer enough.

A year later, U.S. courts found other ways to provide some limited protection to data, such as through contracts that users accepted when they purchased data compilations, but these contracts did not confer ownership. So how does one prove that simple factual data, readily available from other sources, was copied in breach of a contract? Via salting: the long-standing practice of inserting fake words, names, and places into dictionaries, phonebooks, and maps. If a fictitious entry shows up in someone else's product, it was almost certainly copied.

Contractual protections notwithstanding, copyright, like all areas of intellectual property, ultimately abhors the protection of facts and ideas, because protecting them would chill creativity. Consider the sheer number of formulaic buddy cop movies, including Die Hard, a staple of the upcoming holiday season. Although they differ in how they deploy various stereotypes and comedic situations, at their core all buddy cop movies share the same idea: two very different individuals are forced to cooperate to fight crime. Yet each film is its own copyrighted work. Why? Because only the expression of an idea is protected, not the underlying idea or facts.

The third intellectual property option is trade secret law. Trade secret protection is of limited value if you intend to share and trade data, and it lasts only until the secret is divulged, whether by accident or by malicious actors. For academics and health professionals, for whom sharing data is integral to research and patient care, trade secret law is not a viable way to promote investment in vast research and health databases.

One limited option does exist in the U.S. for protecting some datasets: the Digital Millennium Copyright Act (DMCA), signed into law by then-President Bill Clinton in 1998, includes a provision that makes it illegal to circumvent digital rights management tools to access copyrighted works. Thinly protected databases can shelter behind this provision, but it comes into play only if and when their protections are hacked.

While the U.S. has been limiting protections for databases, Europe sought to address the growing importance of databases in the information society and the shrinking protection available to them under traditional forms of intellectual property. The 1996 European Database Directive created a sui generis right to protect databases whose creation required substantial effort or investment.

However, as the courts have interpreted this sui generis right, today it is mostly limited to the database structure itself, and again, not to the underlying, thinly protected factual data. Academics have been particularly unhappy with this law, which they see as limiting the free transfer of data (free as in free speech, not as in free beer).

At its core, intellectual property is effectively a bundle of rights granted to creators. Perhaps the most comprehensive protection for data is provided by the European Union's General Data Protection Regulation (GDPR), which grants individuals certain rights over their data, for example, the right to be told if it has been stolen or the right to transfer it to another service provider. Arguably, this bundle of privacy rights is a form of intellectual property. For the data compilers, however, the GDPR creates only liabilities, not rights.

Although we keep banging our collective heads against the wall of intellectual property, perhaps the focus on intellectual property rights per se is misguided. Perhaps we should reframe the control granted over data not as property-like ownership but as a custodial role. All classical forms of intellectual property are a quid pro quo, a give and take: in return for providing a benefit to society, the intellectual property owner is granted some monopoly rights. Perhaps the same logic applies to data. As custodians of data, large data compilers are granted some control over information. They can, for example, sell or license it via contracts. In return, they must act as good stewards, protecting that data and not abusing it, as defined by various laws and regulations.