Yes, Your Search History Identifies You

The New York Times writes about Thelma Arnold, a 62-year-old Georgia widow who was identified, despite a random anonymizing number, within AOL’s accidentally released search history research records. AOL had been running a three-month study of random users’ search histories, without their consent or knowledge, and assigning each user a random identifier, but the queries of user No. 4417749 made it obvious that the search history was Ms. Arnold’s.

Ms. Arnold says she loves online research, but the disclosure of her searches has left her disillusioned. In response, she plans to drop her AOL subscription. “We all have a right to privacy,” she said. “Nobody should have found this all out.”

Several bloggers claimed yesterday to have identified other AOL users by examining data, while others hunted for particularly entertaining or shocking search histories. Some programmers made this easier by setting up Web sites that let people search the database of searches.

This raises so many questions about search history, which is a hot topic these days. You could easily identify most people through their search history. For example, the sheer number of times I run a vanity search or search on my own websites would identify me. From there, you could publicize the fact that I may have run a lot of “dirty” queries, including searches on unsecured MP3 servers, porn on Google Base, and security exploits, without realizing those were all searches conducted while writing for this blog.

And, hell, maybe I just wanted to see some naughty bits…

Anyway, the point is that grouping searches by user, even after dropping the user’s real name, all but guarantees the user can be identified from their search data, provided there is a large enough number of searches. You can safely state that it will never be possible to release search data, either to the public or the government, and claim that the data is private, anonymous, and not harmful.
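
To make the grouping problem concrete, here is a minimal sketch in Python. It assumes a log of (pseudonym, query) pairs; the pseudonyms, the queries, and the function name group_by_pseudonym are all made up for illustration and have nothing to do with AOL's actual data format. The point is only that a stable identifier stitches separate queries into one profile.

```python
from collections import defaultdict

def group_by_pseudonym(log):
    """Collect every query issued under the same pseudonym into one list."""
    profiles = defaultdict(list)
    for pseudonym, query in log:
        profiles[pseudonym].append(query)
    return dict(profiles)

# Hypothetical toy log: no real names appear as identifiers, yet the
# queries under "user-A" point toward one town, one person, one age group.
toy_log = [
    ("user-A", "plumbers in smallville"),
    ("user-A", "jane q. example"),
    ("user-A", "smallville high school reunion 1962"),
    ("user-B", "pizza near me"),
]

profiles = group_by_pseudonym(toy_log)
for pseudonym, queries in profiles.items():
    print(pseudonym, len(queries), queries)
```

Each query on its own says little. It is the list under a single identifier that narrows things down, and the list only gets more revealing as it grows.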

Companies like Google, Microsoft, Yahoo, AOL, Ask, and others need to realize that it will never be okay to release our data, even without our names. If this paragraph from the article is true:

AOL removed the search data from its site over the weekend and apologized for its release, saying it was an unauthorized move by a team that had hoped it would benefit academic researchers.

Then a good move might be to publicly fire the person who made that decision. User privacy needs to be respected more, and if someone needs to lose their job and be made an example, in order to send a message to workers throughout the industry, then do it. Beyond that, research and data collection methods need to change in light of this situation. If anyone is ever going to release search history data again, how’s this for a rule: No more than one query released per user. With one query per user, you’ll never know whose private life you’re reading.
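
For what it's worth, the one-query-per-user rule is simple to sketch. The Python below is only an illustration under assumptions: the (user_id, query) log format, the function name one_query_per_user, and the sample data are all hypothetical. It keeps one randomly chosen query per user, drops the identifiers, and shuffles the result so the output order doesn't hint at who came first in the log.

```python
import random
from collections import defaultdict

def one_query_per_user(log, seed=None):
    """Return one randomly chosen query per user, with user IDs dropped."""
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for user_id, query in log:
        by_user[user_id].append(query)
    # Pick exactly one query from each user's history.
    released = [rng.choice(queries) for queries in by_user.values()]
    # Shuffle so the release order does not mirror the log order.
    rng.shuffle(released)
    return released

# Hypothetical sample log of (user_id, query) pairs.
sample_log = [
    (101, "weather tomorrow"),
    (101, "cheap flights"),
    (102, "pasta recipe"),
    (103, "used cars"),
    (103, "car insurance quotes"),
    (103, "dmv hours"),
]

released = one_query_per_user(sample_log, seed=0)
print(released)
```

Note that a single query can still give someone away if it contains a name or an address, so this sketch only removes the linking between queries, not everything identifying inside them.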
(via Biz)

August 9th, 2006 Posted by Nathan Weinberg | History, Controversy, AOL, Search, General | 2 comments



  1. love your idea of just firing the son of a bitch at aol. now that’s how you show you’re serious about privacy. people need to get as serious about privacy as they’ve gotten about security. they’re separate issues but one and the same.

    Comment by aaron | August 9, 2006

  2. For most research purposes, modified or abridged search histories pose great problems. Imagine linguists who want to examine how the use of language in queries varies with time, for example, when users start to omit stop words like “the”, “and”, “in” etc. in queries. You can’t use queries given to you on a “release only those queries that were placed 42 milliseconds after the full second” basis to answer questions like this; you need complete search histories.

    Most analysis methods in the “non-hard” sciences like linguistics rely heavily on statistics, requiring large datasets. So, releasing single queries does not really help the scientific community.

    This is not to say that I think what AOL did was ok. I myself believe in the importance of complete online privacy, but your post seems to oversimplify the topic. This “accident” should be followed by an all-parties-included debate on how to deal with this problem in the future.

    Comment by Jürgen | August 10, 2006
