I have taken quite a long break from blogging, but hope to begin again with renewed energy in the coming months. As the first foray back into writing, and the first blog post of 2011, I’m going to share the thoughts I sketched out in preparation for a panel at Hyperpublic at the Berkman Center last Friday.
Before I do, a quick comment. In these notes I cite a quote from Jonathan Franzen on the interplay between the public, the private, and shame. I found the quote quite insightful and thought-provoking, but unfortunately during the panel at Hyperpublic the audience seemed to think I had asserted that privacy is only necessary for things that provoke shame. I certainly did not intend to convey that message; clearly there are many reasons for an individual to seek privacy. I simply found it an interesting lens through which to examine the issue, particularly in the contrast I was drawing between “hyperpublic” and “data-driven.” I recommend that anyone interested in the topic of privacy and/or publicity read Franzen’s 1998 essay, “Imperial Bedroom”—it is quite thought-provoking, and quite a different approach from the one we so often see in today’s discourse.
With that, my thoughts for the panel, “The Risks and Beauty of a Hyperpublic Life.”
When I was first asked to join this panel, I read over the word “risk” in the title and saw only “beauty of the hyper-public life.” My immediate reaction was: I have nothing to say about that; I reject the notion. A hyper-public life to me implies a Paris Hilton-like existence, and while some of you may find that appealing, I personally don’t. Even most mega-celebrities seek privacy in their lives, and I don’t predict that will change.
But when I read the actual description of the panel, it struck me that the title doesn’t accurately represent what we’re theoretically talking about here. When we talk about unprecedented amounts of information being gathered, we’re talking about a data-driven life. That to me is a beautiful thing, albeit one that comes with risks. My claim is that the risk of a data-driven life is that it becomes a hyper-public one, but if managed correctly, I believe that risk can be mitigated so that only benefits accrue.
There is a tremendous amount of data being collected about people, their behavior, and the world around them. This data may represent clicks on webpages, GPS coordinates, purchases, or communications between people. And from all that data, we can learn a tremendous amount about the world around us—things that will help us make a better world for the next generation.
The risk that many of us concerned about privacy have a natural tendency to focus on is: what if all that data is tied to me and used to create a dossier, one that could be used to exercise power over me in unjust ways? Until recently, this risk was hard to conceptualize because it was quite theoretical. Sure, your credit card company knows something about your purchases; sure, a phone company or ISP knows about your communications; and yes, the websites you visit know what content you like to consume—but it was all tied to different types of identifiers, held by different companies, and it seemed impractical to aggregate it all together.
Meanwhile, analysis of that data provided unprecedented value, not all of it (much of it?) easily quantifiable. The improvement in something as straightforward as Search—that Google can get you to a better result today than a year ago, and probably faster too—is the result of data analysis. Advertisers have enjoyed more efficiency in dollars spent on marketing, and consumers have enjoyed access to more free content than ever before as a result—to the tune of $100 billion in surplus, according to McKinsey & Company and the IAB. But those are just the easy gains, the low-hanging fruit. What comes next?
Google has used search log data to advance the idea of “predicting the present”—learning something useful about macro-social behavior through analysis of aggregate query data. This idea was behind the now-familiar Flu Trends and the more recent launch of Dengue Trends, two tools that barely scratch the surface of the advances we might make in public health monitoring with the use of predictive analytics. Other sectors stand to be transformed by big data as well—energy, for example, where analysis of consumption patterns can help service operators manage their networks, and end consumers their own usage, more efficiently. One of the most exciting areas of this type of analysis is language, where automated real-time translation and transliteration are rapidly becoming commonplace.
We often don’t notice these improvements made possible by predictive analytics because it’s rarely clear how data analysis led to a given incremental or step-change improvement. This is and will continue to be a struggle for companies in the data sector—being transparent not just about data practices, but about the consequences of those practices for the user’s experience. One of the best examples of doing this well today comes from recommendation engines—sites that make a business out of predicting what content you’ll like best. These sites have perfected subtle design features that say, in effect: hey, we think you’ll like this movie because you rated this other one highly; if we got it wrong, help us correct it by telling us what you’d like better. Those subtle design features in no way convey the complexity of the predictive analysis happening behind the scenes, but they do convey the idea that data analysis of my past behavior is enabling this end-user benefit.
It’s notable that the recommendation engine space is one where the benefits accrue to the individual. I see better recommendations for me—yeah, they probably use my data to improve everyone’s recommendations, but viscerally what I’m aware of is the direct improvement of my personal experience based on my personal behavior. As an industry we struggle more with building these design cues where the improvement of my experience is derived from the aggregate analysis of other people’s behavior. This is nowhere more obvious than in Search, where the clicks and searches of many other users over time enable a search engine to point me to the best answer today.
In all of these examples we can start to see that there is a real beauty in the data-driven life. So what is the risk? I think the risk is that it becomes the hyper-public life. We fear the day when aggregating data across contexts becomes so easy as to collapse all contexts onto one plane of existence, one visible for all to see. In practical terms the concern is as simple as the risk of re-identifying an individual from a series of search queries, or of a data broker amassing data from multiple service providers and collapsing it all into a single profile, then overlaying it with whatever we may have published ourselves via social networks and the web, and making it all available for sale to anyone. I don’t think this is an impractical concern, but I also think it is nothing close to a full picture of the landscape we’re talking about.
I took a late flight out to Boston Wednesday afternoon, and was trying frantically in the airport to download to my iPad a book I’d purchased for the flight. SFO, your wifi, while free (thank you for that!), let me down. I couldn’t get a signal, and when I did, the data moved at an achingly slow pace. So I found myself on the plane with only my existing library, much of which I’ve already read. But I came across a book I’d purchased a while back, How to Be Alone, a collection of essays by Jonathan Franzen.
After reading a few of these, I stumbled on his essay, “Imperial Bedroom.” I was still a teenager when that essay was written and hadn’t yet overcome my general attitude toward technology (that it was a boys’ hobby, of course) so stumbling on this essay while en route to this workshop on publicity was a pleasant surprise, and an opportunity to glimpse how folks may have been thinking about these issues a decade ago. I imagine many in the room are quite familiar with it, but for those who aren’t let me offer an interpretation of his thesis: the problem is not a loss of privacy but an injection of too much private behavior into public spaces, where it erodes the quality of the public space.
Franzen says something interesting: “without shame there can be no distinction between public and private.”
(Note the definition of shame: “Emotional distress or humiliation caused by what may be perceived as wrong or foolish behavior.”)
That somehow makes sense to me. I haven’t given this much more thought than a few hours after work yesterday allowed, but I’d posit that shame is tied to identity in a critical way. The shame one feels for wrong or foolish behavior may exist even if that behavior is known to no one, but with the potential perceptions of an entire society weighing on one’s behavior, there are a multitude of things one might feel shame about.
Recall Dog Poop Girl. Had she been unidentifiable, unrecorded, she might still have felt shame—but in all likelihood nothing like what she is rumored to have felt after her identity and behavior were broadcast to a nation.
The data-driven life is indeed a beautiful one, full of potential. If we can capture opportunities to really demonstrate the public good that arises from multiple individual contributions, and design that transparency into services directly, untold advances will be made. But there is also a risk that the data collected about an individual’s behavior is tied to their identity and published in a way the individual didn’t understand, expect, or desire—a risk that the data-driven life unexpectedly turns into a hyper-public one.
A few closing thoughts. It strikes me, when I step back and look at these issues from this big picture perspective, that much of the solution that lies ahead of us rests on identity management. We need to enable users to be who they want to be, where they want to be—Alma Whitten put this quite well a few months ago.
It is my perception that many in the privacy community have not quite given up on identity as a solution, but have turned away from it for what seems a simple and obvious reason: theoretically, re-identification ought to become so trivial within a few years that the concept of having multiple identities online strikes some of us as an impossible future. On Friday at Hyperpublic, someone pointed to facial recognition as one example of the ways in which technology seems to be siphoning us into a single identity. I understand the theoretical direction folks are concerned about, and why they may be concerned about it—but I am not quite ready to give up on the idea that I can manage different facets of my identity across different contexts.
I mean this sincerely, and yet as I consider creating an OkCupid profile in the next few months, it has occurred to me that much of my public life can be aggregated so easily that, if I’m to post a picture on my profile, perhaps instead of creating a pseudonymous username and revealing my identity to potential dates at my own pace, I ought simply to use my real name and let the initial judgments form based on my public identity.
Certainly there is no easy answer in this space, but as usual only questions. That’s what makes it interesting, I suppose!