Wide Open Data: NYC taxi dump catches strip club Johns
Open Data zealots rarely give an individual’s privacy a thought – it’s just another obstacle to be driven over in their desire to provoke a data-powered revolution. But a gigantic dump of journeys made by licensed New York City taxis gives a vivid reminder of the dangers of careless data drops.
Earlier this year a Freedom of Information request yielded details of 173 million trips made by the yellow cabs, weighing in at over 20GB uncompressed. The dump contained an MD5 hash of the driver ID (or medallion), precise GPS co-ordinates of pick-up and drop-off locations, passenger count and trip times.
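Why an MD5 hash on its own is thin protection for an identifier can be shown with a minimal sketch. It assumes, purely for illustration, an ID format of digit, letter, digit, digit; that format and the helper names `candidate_ids` and `build_lookup` are hypothetical and not taken from the dataset. When the set of possible IDs is this small, anyone can hash every candidate in advance and simply look the published hash up.

```python
import hashlib
import itertools
import string


def md5_hex(text: str) -> str:
    """Return the hexadecimal MD5 digest of an ASCII string."""
    return hashlib.md5(text.encode("ascii")).hexdigest()


def candidate_ids():
    """Yield every ID in a HYPOTHETICAL digit-letter-digit-digit format, e.g. '7B42'.

    The format is an assumption made for this sketch only.
    """
    for d1, letter, d2, d3 in itertools.product(
        string.digits, string.ascii_uppercase, string.digits, string.digits
    ):
        yield f"{d1}{letter}{d2}{d3}"


def build_lookup() -> dict:
    """Map the MD5 digest of each candidate ID back to the ID itself."""
    return {md5_hex(candidate): candidate for candidate in candidate_ids()}


# Precompute the table once: 10 * 26 * 10 * 10 candidates in total.
lookup = build_lookup()

# A "published" hash, as it might appear in an anonymised record.
published_hash = md5_hex("7B42")

# Reversing the hash is now a dictionary lookup.
recovered = lookup[published_hash]
print(len(lookup), recovered)
```

The point of the sketch is the size of the search space rather than any weakness peculiar to MD5: a stronger hash function would fall to the same enumeration unless a secret key or salt were mixed in.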
If you’re in the UK, your personal health record continues to be harvested, regardless of the “pause” in the Care.data data-sharing exercise, and you can’t opt out of centralised GP data collection. (Your out-patient data escaped into the wild long ago.) That data, too, can be de-anonymised – it just takes a bit more work.
The problem with open data drops is twofold. First, the public is being scammed: the data can be enormously valuable, but the zealots insist it be given away for free, or for next to nothing. Second, there’s really no such thing as anonymity. Not that they care – this data Gravy Train has some momentum behind it. ®