In Big Data: A Revolution That Will Transform How We Live, Work, and Think, Viktor Mayer-Schönberger and Kenneth Cukier consider the benefits and threats of a big data world. In the excerpt below, the authors outline the privacy risks of big data, revealing how the value of information no longer resides solely in its primary purpose, and analyzing the implications of a world in which more data is being collected and stored about each one of us than ever before.
For almost forty years, until the Berlin Wall came down in 1989, the East German state security agency known as the Stasi spied on millions of people. Employing around a hundred thousand full-time staff, the Stasi watched from cars and streets. It opened letters and peeked into bank accounts, bugged apartments and wiretapped phone lines. And it induced lovers and couples, parents and children, to spy on each other, betraying the most basic trust humans have in each other. The resulting files — including at least 39 million index cards and 70 miles of documents — recorded and detailed the most intimate aspects of the lives of ordinary people. East Germany was one of the most comprehensive surveillance states ever seen.
Twenty years after East Germany’s demise, more data is being collected and stored about each one of us than ever before. We’re under constant surveillance: when we use our credit cards to pay, our cellphones to communicate, or our Social Security numbers to identify ourselves. In 2007 the British media relished the irony that there were more than 30 surveillance cameras within 200 yards of the London apartment where George Orwell wrote 1984. Well before the advent of the Internet, specialized companies like Equifax, Experian, and Acxiom collected, tabulated, and provided access to personal information for hundreds of millions of people worldwide. The Internet has made tracking easier, cheaper, and more useful. And clandestine three-letter government agencies are not the only ones spying on us. Amazon monitors our shopping preferences and Google our browsing habits, while Twitter knows what’s on our minds. Facebook seems to catch all that information too, along with our social relationships. Mobile operators know not only whom we talk to, but also who is nearby.
With big data promising valuable insights to those who analyze it, all signs seem to point to a further surge in others’ gathering, storing, and reusing our personal data. The size and scale of data collections will increase by leaps and bounds as storage costs continue to plummet and analytic tools become ever more powerful.
The stakes are higher than is typically acknowledged. The dangers of failing to govern big data with respect to privacy go far beyond trifles like targeted online ads. The history of the twentieth century is blood-soaked with situations in which data abetted ugly ends. In 1943 the U.S. Census Bureau handed over block addresses (but not street names and numbers, to maintain the fiction of protecting privacy) of Japanese-Americans to facilitate their internment. The Netherlands’ famously comprehensive civil records were used by the invading Nazis to round up Jews. The five-digit numbers tattooed into the forearms of Nazi concentration-camp prisoners initially corresponded to IBM Hollerith punch-card numbers; data processing facilitated murder on an industrial scale.
The important question, however, is not whether big data increases the risk to privacy (it does), but whether it changes the character of the risk. If the threat is simply larger, then the laws and rules that protect privacy may still work in the big-data age; all we need to do is redouble our existing efforts. On the other hand, if the problem changes, we may need new solutions.
Unfortunately, the problem has been transformed. With big data, the value of information no longer resides solely in its primary purpose. It is now in secondary uses.
This change undermines the central role assigned to individuals in current privacy laws. Today they are told at the time of collection which information is being gathered and for what purpose; then they have an opportunity to agree, so that collection can commence. While this concept of “notice and consent” is not the only lawful way to gather and process personal data, according to Fred Cate, a privacy expert at Indiana University, it has been transmogrified into a cornerstone of privacy principles around the world. (In practice, it has led to super-sized privacy notices that are rarely read, let alone understood — but that is another story.)
Strikingly, in a big-data age, most innovative secondary uses haven’t been imagined when the data is first collected. How can companies provide notice for a purpose that has yet to exist? How can individuals give informed consent to an unknown? Yet in the absence of consent, any big-data analysis containing personal information might require going back to every person and asking permission for each reuse. No company could shoulder the cost, even if the task were technically feasible.
The alternative, asking users to agree to any possible future use of their data at the time of collection, isn’t helpful either. Such a wholesale permission emasculates the very notion of informed consent. In the context of big data, the tried and trusted concept of notice and consent is often either too restrictive to unearth data’s latent value or too empty to protect individuals’ privacy.
Other ways of protecting privacy fail as well. If everyone’s information is in a dataset, even choosing to “opt out” may leave a trace. Take Google’s Street View. Its cars collected images of roads and houses in many countries. In Germany, Google faced widespread public and media protests. People feared that pictures of their homes and gardens could aid gangs of burglars in selecting lucrative targets. Under regulatory pressure, Google agreed to let homeowners opt out by blurring their houses in the image. But the opt-out is visible on Street View — you notice the obfuscated houses — and burglars may interpret this as a signal that they are especially good targets.
A technical approach to protecting privacy — anonymization — also doesn’t work effectively in many cases. Anonymization refers to stripping out from datasets any personal identifiers, such as name, address, credit card number, date of birth, or Social Security number. The resulting data can then be analyzed and shared without compromising anyone’s privacy. That works in a world of small data. But big data, with its increase in the quantity and variety of information, facilitates re-identification: enough seemingly innocuous attributes that survive the scrubbing, such as a ZIP code, a birth date, and a sex, will often point to exactly one person.
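To make that concrete, here is a minimal illustrative sketch of a so-called linkage attack, in which an “anonymized” dataset is joined against a public one on the quasi-identifiers both share. It is not from the book, and every record, field name, and dataset in it is invented:

```python
# Hypothetical sketch: why stripping direct identifiers often fails.
# Quasi-identifiers (ZIP code, birth date, sex) that survive
# "anonymization" can be joined against an auxiliary public dataset
# to re-identify people. All records below are invented.

# An "anonymized" medical dataset: names and SSNs stripped,
# but quasi-identifiers retained for analytic value.
anonymized_records = [
    {"zip": "02138", "birth_date": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1972-03-14", "sex": "M", "diagnosis": "asthma"},
]

# An auxiliary public dataset, e.g. a voter roll, which lists
# names alongside the very same quasi-identifiers.
voter_roll = [
    {"name": "Jane Doe", "zip": "02138", "birth_date": "1945-07-31", "sex": "F"},
    {"name": "John Roe", "zip": "02139", "birth_date": "1972-03-14", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

def key(record):
    """Project a record onto its quasi-identifiers."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)

# Index the voter roll by quasi-identifiers, then link the two datasets.
index = {key(voter): voter["name"] for voter in voter_roll}
for record in anonymized_records:
    name = index.get(key(record))
    if name is not None:
        print(f"Re-identified {name}: {record['diagnosis']}")
```

Variants of this simple join lie behind well-documented re-identification episodes, which is the authors’ point: the more attributes a dataset carries, the harder anonymity is to preserve.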
So what is to be done? We envision a very different privacy framework for the big-data age, one focused less on individual consent at the time of collection and more on holding data users accountable for what they do. In such a world, firms will formally assess a particular reuse of data based on the impact it has on individuals whose personal information is being processed. This does not have to be onerously detailed in all cases, as future privacy laws will define broad categories of uses, including ones that are permissible without safeguards or with only limited, standardized ones. For riskier initiatives, regulators will establish ground rules for how data users should assess the dangers of a planned use and determine what best avoids or mitigates potential harm. This spurs creative reuses of the data, while at the same time it ensures that sufficient measures are taken to see that individuals are not hurt.
Running a formal big-data use assessment correctly and implementing its findings accurately offers tangible benefits to data users: they will be free to pursue secondary uses of personal data in many instances without having to go back to individuals to get their explicit consent. On the other hand, sloppy assessments or poor implementation of safeguards will expose data users to legal liability, and regulatory actions such as mandates, fines, and perhaps even criminal prosecution. Data-user accountability only works when it has teeth.
Shifting the burden of responsibility from the public to the users of data makes sense for a number of reasons. They know much more than anybody else, and certainly more than consumers or regulators, about how they intend to use the data. By conducting the assessment themselves (or hiring experts to do it) they will avoid the problem of revealing confidential business strategies to outsiders. Perhaps most important, the data users reap most of the benefits of secondary use, so it’s only fair to hold them accountable for their actions and place the burden for this review on them.
Despite its informational prowess, there was much that the Stasi could not do. It could not know where everyone moved at all times or whom they talked to without great effort. Today, though, much of this information is collected by mobile phone carriers. The East German state could not predict which people would become dissidents, nor can we — but police forces are starting to use algorithmic models to decide where and when to patrol, which gives a hint of things to come. This makes the risks inherent in big data as large as the datasets themselves.
EXCERPT FROM: Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schönberger and Kenneth Cukier (Houghton Mifflin Harcourt, 2013)
About the Authors
Viktor Mayer-Schönberger is a professor of Internet governance and regulation at the Oxford Internet Institute in the UK. Kenneth Cukier is the data editor of The Economist. They are the authors of “Big Data: A Revolution That Will Transform How We Live, Work, and Think” (Houghton Mifflin Harcourt, 2013).