January 24, 2006
One part of the media coverage of this Google v. DOJ story that's unsatisfying to anyone familiar with IP networks is that an IP address doesn't necessarily identify something uniquely the way most people think it does. Addresses can be dynamically assigned and therefore change regularly (though there is certainly no reason to think that ISPs aren't keeping track of IP assignment history). With the advent of NAT and private IP networks, an IP address is less likely than ever to identify even a single computer: there could be many Internet-connected devices behind a router with a single public IP address. That is the case in many home and small-business networks, where the scarcity of public IP addresses makes it infeasible, and an administrative burden, to assign one to each machine. Think of a coffee shop with a WiFi access point: each of those macchiato-sipping laptop users is known to the rest of the Internet by the same IP address. Web browser cookies are far more likely to yield interesting per-surfer metrics, since they are tracked across many sites with sharing agreements and are usually tied to a login session in which the user has provided information that could ultimately be traced back to them.
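To make the coffee-shop case concrete, here is a toy sketch of port-address translation, the common form of NAT found in home and small-office routers. It is a simplification under stated assumptions: the class `ToyNat`, the public address 203.0.113.7 (a documentation range), and the starting port 40000 are all hypothetical, and a real router also tracks the remote endpoint, the protocol, and timeouts. The point is only that connections from several private addresses all leave with the same public IP, distinguished by port numbers that a remote web server cannot map back to a particular laptop.

```python
import ipaddress
from itertools import count


class ToyNat:
    """Hypothetical, highly simplified port-address translation (PAT) table."""

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = str(ipaddress.ip_address(public_ip))
        self._next_port = count(first_port)
        # Maps (private IP, private port) -> public port on the shared address.
        self._table: dict[tuple[str, int], int] = {}

    def outbound(self, private_ip: str, private_port: int) -> tuple[str, int]:
        """Return the (IP, port) pair the rest of the Internet sees."""
        if not ipaddress.ip_address(private_ip).is_private:
            raise ValueError(f"{private_ip} is not on a private network")
        key = (private_ip, private_port)
        if key not in self._table:
            self._table[key] = next(self._next_port)
        return self.public_ip, self._table[key]


# Three laptops in the coffee shop, each opening a connection from port 51000.
nat = ToyNat("203.0.113.7")
for laptop in ("192.168.1.10", "192.168.1.11", "192.168.1.12"):
    print(laptop, "->", nat.outbound(laptop, 51000))
```

All three laptops appear to the outside world as 203.0.113.7; only the router's private table knows which port belongs to which machine.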
But there are plenty of homes out there with a single PC and a connection to the Internet, so why even bother storing IP addresses? Once a cursory examination of them is done (for instance, determining the country of origin, which widely available tools can do easily), run them through a one-way hash* and toss the originals. Then if you are ever forced to hand over the data, you can at least do so secure in the knowledge that you're not compromising your users' privacy. You still have problems, just one fewer on your conscience.
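Here is a minimal sketch of the hash-and-discard step, assuming Python's standard `hmac`, `hashlib`, and `ipaddress` modules. One caveat worth knowing: there are only about 4.3 billion (2 to the power of 32) IPv4 addresses, so a plain unkeyed hash such as MD5 of an IPv4 address can be reversed by simply hashing every possible address and comparing. The sketch therefore uses a keyed hash (HMAC-SHA256) with a secret key. The names `pseudonymize_ip` and `SECRET_KEY` are hypothetical, and where the key is kept (or whether it is thrown away) is a policy decision left open here.

```python
import hashlib
import hmac
import ipaddress
import secrets

# Hypothetical secret key, generated per deployment. Without it, whoever
# receives the data cannot feasibly recompute a hash from a guessed IP.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymize_ip(ip: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed one-way hash (HMAC-SHA256) of an IP address, as hex."""
    # Parse and pack the address so equivalent spellings hash identically,
    # e.g. "2001:db8::1" and "2001:0db8:0:0:0:0:0:1".
    packed = ipaddress.ip_address(ip).packed
    return hmac.new(key, packed, hashlib.sha256).hexdigest()


# Store the pseudonym, then throw the original address away.
log_entry = {"visitor": pseudonymize_ip("198.51.100.23")}
print(log_entry)
```

The same visitor still maps to the same pseudonym, so per-visitor counts keep working, but the stored value no longer reveals the address.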
* A hash such as MD5 produces 128 bits of output, so there are 2 to the power of 128, or about 3.4e38, possible outcomes: an extremely large number. Collisions (two different inputs that yield the same output hash) can happen, and have been found for the MD5 algorithm. But the odds of one occurring by accident are infinitesimally small, and in any case would not diminish the practical utility of such hashes in day-to-day use; that is, until quantum computers get their hands on them, but that is another, terrifying matter.
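The footnote's numbers can be checked with a few lines of arithmetic. This sketch uses the standard birthday-bound approximation, P ≈ 1 − exp(−n(n−1)/2N) for n inputs and N = 2^128 possible outputs, which assumes the hash behaves like a uniformly random function; it says nothing about deliberately engineered collisions like the published MD5 attacks. `collision_probability` is a hypothetical helper name.

```python
import math

MD5_BITS = 128  # MD5 output size, as in the footnote


def collision_probability(n: int, bits: int = MD5_BITS) -> float:
    """Birthday-bound estimate of P(at least one collision) among n random hashes."""
    space = 2 ** bits
    # 1 - exp(-n(n-1) / 2N); expm1 keeps precision when the result is tiny.
    return -math.expm1(-n * (n - 1) / (2 * space))


print(f"{2 ** MD5_BITS:.2e}")                   # 3.40e+38 possible outputs
print(f"{collision_probability(2 ** 32):.2e}")  # 2.71e-20 for every IPv4 address
```

Even hashing every one of the 2^32 possible IPv4 addresses leaves the chance of an accidental collision at roughly 2.7e-20.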