[We are delighted to run this piece by our friend and Berkman Center colleague Ryan Budish - eds.]
On Monday morning, those of us who work at Harvard found our phones buzzing with activity as we were warned to stay away from four evacuated university buildings because of a bomb threat. For those of us who were near the Boston Marathon this spring, these kinds of threats seem particularly scary. Thus, the fact that a Harvard sophomore appears to have created fear and distracted our first responders from real work, all to postpone a final exam, is particularly disgusting. So let me get this out there: I’m glad he was caught, confessed, and will be punished.
But what I want to talk about is the power of metadata. In defense of its bulk phone record collection program, the government has repeatedly insisted that the data it collects is “just metadata.” The implication is that metadata creates no invasion of privacy because it isn’t the content of the communication. In short, surveillance isn’t surveillance unless it captures the “what” of your communications; the “who”, “when”, and “where” don’t count.
This week a federal judge ruled that the NSA’s bulk data collection is likely unconstitutional. Judge Leon said that the government’s surveillance capacity is “almost-Orwellian” and that the bulk collection of metadata such as phone records is likely a privacy intrusion. But in the abstract it is sometimes hard to understand just how powerful metadata can be.
The arrest of Eldo Kim, the student who e-mailed the bomb threat, provides a good, if emotionally conflicting, example of the power of metadata. According to the FBI affidavit, Kim was using both Tor and Guerrilla Mail in order to send his threats anonymously.
Combined, these tools would allow someone to send an e-mail that would be very difficult to track. Tor is a service that hides your IP address. Normally the IP address that identifies your computer on the network is visible to any website you visit and, often, to any recipient of your e-mails. Tor hides this information by relaying your communications through a series of other computers (called “Tor nodes”) so that when your communications reach their final destination, it looks like they came from a different IP address. Guerrilla Mail is a free e-mail service that lets users create temporary and anonymous e-mail addresses.
If Kim used these anonymization tools, how did the FBI find him? It turns out the “who”, “when”, and “where” (i.e., the metadata) of any online transaction are very important. Because it seemed likely that the hoaxer was a Harvard student, Harvard was able to look through its records to see whether anyone had used its network to connect to Tor (the “where”). Tor works by anonymizing your connections to your destination, but you have to connect to Tor in the first place. One theory as to how Harvard identified Kim is that because most Tor entry nodes are public (if they were private you wouldn’t know how to connect to Tor), if you own a network (as Harvard does), you can see who on your network is connected to those nodes, even if you can’t see the content that they are sending. As it turned out, not many people were connected to Tor on Monday morning (the “when”). And because computers must be registered with a name and e-mail address in order to access the Harvard network, Harvard could identify the owner of any computer connected to Tor when the threat was sent (the “who”).
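To make the “where”, “when”, and “who” concrete, here is a minimal sketch of that theory in Python. It is not Harvard’s or the FBI’s actual method, which the affidavit does not spell out in this detail. The record layout (`Connection`), the function `find_candidates`, and every IP address, device name, e-mail address, and timestamp below are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Connection:
    """One hypothetical network log entry: metadata only, no content."""
    device_id: str        # identifier of a registered device (assumed field)
    dest_ip: str          # the "where": which address the device contacted
    timestamp: datetime   # the "when"


def find_candidates(connections, tor_entry_ips, registrations, start, end):
    """Return the registered owners of devices that contacted a known
    Tor entry node between start and end (inclusive)."""
    owners = set()
    for conn in connections:
        contacted_tor = conn.dest_ip in tor_entry_ips
        in_window = start <= conn.timestamp <= end
        if contacted_tor and in_window:
            owner = registrations.get(conn.device_id)  # the "who"
            if owner is not None:
                owners.add(owner)
    return sorted(owners)


# All values below are made up; the IPs come from documentation-only ranges.
TOR_ENTRY_IPS = {"192.0.2.10", "198.51.100.7"}
REGISTRATIONS = {
    "dev-a": "student_a@example.edu",
    "dev-b": "student_b@example.edu",
    "dev-c": "student_c@example.edu",
}
LOGS = [
    Connection("dev-a", "203.0.113.5", datetime(2013, 12, 16, 8, 25)),   # ordinary site
    Connection("dev-b", "192.0.2.10", datetime(2013, 12, 16, 8, 28)),    # Tor, in window
    Connection("dev-c", "198.51.100.7", datetime(2013, 12, 15, 22, 0)),  # Tor, night before
]

candidates = find_candidates(
    LOGS,
    TOR_ENTRY_IPS,
    REGISTRATIONS,
    start=datetime(2013, 12, 16, 8, 0),
    end=datetime(2013, 12, 16, 9, 0),
)
print(candidates)  # ['student_b@example.edu']
```

Nothing in this sketch inspects what was sent. The destination address, the time, and the registration record are enough to shrink the pool of suspects to a single name.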
Kim’s anonymization tools worked as advertised: they hid his IP and e-mail addresses. But metadata is so powerful and so pervasive that he still left a traceable trail. To be sure, Kim could have done some things differently that would have made it harder for the FBI to catch him. The fact that Kim used the most obvious network (the one that Harvard students use) made it easier to identify him; had he gone to Starbucks, he might have gotten away with it. It was also a mistake that he used a rather uncommon product. Had he used Gmail instead of software that few Harvard students use, the University would have seen thousands of connections to Google and wouldn’t have been able to quickly identify Kim (but a subpoena to Google would have eventually yielded his IP address). Conversely, if hundreds or thousands of people at Harvard used Tor, Harvard wouldn’t have known which connection was the offending one.
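The point about uncommon products can be put in numbers with a small, purely illustrative Python sketch. The helper `anonymity_sets` and the user counts are assumptions made up for the example, not figures from the case: the idea is simply that the fewer people who share a destination, the less cover each of them has.

```python
from collections import defaultdict


def anonymity_sets(observations):
    """Group users by the service they were seen connecting to.

    observations is an iterable of (user, service) pairs drawn from
    hypothetical network metadata.
    """
    sets = defaultdict(set)
    for user, service in observations:
        sets[service].add(user)
    return dict(sets)


# Invented numbers: a thousand users on popular webmail, one user on Tor.
observations = [(f"user{i}", "webmail") for i in range(1000)]
observations.append(("user42", "tor"))

sizes = {
    service: len(users)
    for service, users in anonymity_sets(observations).items()
}
print(sizes)  # {'webmail': 1000, 'tor': 1}
```

A crowd of one is no crowd at all. With a thousand people on the same service, the same metadata would point at a thousand candidates instead of one.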
Kim’s story shows us how powerful metadata can be. Because it is so powerful, it can be a valid and useful tool for law enforcement and intelligence agencies. But this story should also make us suspicious of anyone who justifies data collection on the basis that it is “just metadata.”
Ryan Budish is a fellow at the Berkman Center for Internet & Society and the Director of Herdict, a project for collecting crowd-sourced data on where websites are inaccessible.