Thursday, April 25, 2013

Ouch!

One of the issues we have to address as security folks is protecting a person's privacy. If you've ever dealt with Protected Health Information (PHI), you know that there are strict rules about which aspects of a person's identity must be protected when associated with medical data.

In what can only be described as an object lesson in how important this is, the folks at the Data Privacy Lab (at Harvard) conducted an interesting experiment: they looked into how many folks in the Personal Genome Project they could identify using just birthdate, sex and zip code.

Amazingly, they identified 200 participants with 84% to 90% accuracy. Let me repeat that for emphasis ... using just birthdate, zip and sex they were able to link 200 folks to their "anonymous" genome with good accuracy. They did it by matching data from the genome project against public voter registration data and other public data.
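To make the matching idea concrete, here's a minimal sketch in Python of that kind of linkage. It is not the Data Privacy Lab's actual method or data; the records, the field names and the `link` helper are all made up for illustration. It joins "anonymous" profiles to a public list on (birthdate, sex, zip) and only claims a match when exactly one public record fits.

```python
from collections import defaultdict

# Hypothetical, made-up records for illustration only.
profiles = [
    {"profile_id": "P1", "birthdate": "1970-01-15", "sex": "F", "zip": "80301"},
    {"profile_id": "P2", "birthdate": "1982-06-30", "sex": "M", "zip": "02138"},
]
voters = [
    {"name": "Alice Example", "birthdate": "1970-01-15", "sex": "F", "zip": "80301"},
    {"name": "Bob Example", "birthdate": "1982-06-30", "sex": "M", "zip": "02138"},
    {"name": "Carl Example", "birthdate": "1982-06-30", "sex": "M", "zip": "02138"},
]


def quasi_id(record):
    """Return the (birthdate, sex, zip) triple used as the join key."""
    return (record["birthdate"], record["sex"], record["zip"])


def link(profiles, voters):
    """Map profile_id -> name when exactly one public record shares the triple."""
    names_by_key = defaultdict(list)
    for voter in voters:
        names_by_key[quasi_id(voter)].append(voter["name"])

    linked = {}
    for profile in profiles:
        names = names_by_key.get(quasi_id(profile), [])
        if len(names) == 1:  # an ambiguous triple (2+ names) is not a match
            linked[profile["profile_id"]] = names[0]
    return linked


matches = link(profiles, voters)
print(matches)
```

In this toy data, P1 links to a single name, while P2 stays unlinked because two public records share its triple.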

Here's a web site where they report their findings: http://dataprivacylab.org/projects/pgp/
The full report is at: http://dataprivacylab.org/projects/pgp/1021-1.pdf

Best of all, they have a web site where you can put in your birthdate, sex and zip, and they'll tell you how many folks match in their public records. (http://aboutmyinfo.org/)

I tried it for my info, and only one record matches. I live in a relatively small town (Boulder, CO), but I was still shocked. It's a good thing I don't feel a need to hide my identity.
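I don't know how aboutmyinfo.org computes its answer, but the basic idea of "how many records match" can be sketched in a few lines of Python. Everything here is an assumption for illustration: the `records` list is invented, and `match_count` is a hypothetical helper, not anything from that site.

```python
from collections import Counter

# Hypothetical (birthdate, sex, zip) triples standing in for public records.
records = [
    ("1970-01-15", "F", "80301"),
    ("1982-06-30", "M", "02138"),
    ("1982-06-30", "M", "02138"),
    ("1990-11-02", "F", "80301"),
]


def match_count(records, birthdate, sex, zip_code):
    """Count how many records share the given (birthdate, sex, zip) triple."""
    counts = Counter(records)
    return counts[(birthdate, sex, zip_code)]


unique_hit = match_count(records, "1970-01-15", "F", "80301")
shared_hit = match_count(records, "1982-06-30", "M", "02138")
print(unique_hit, shared_hit)
```

A count of 1 is the worrying case: the triple points at exactly one person.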

For reference, here's what HIPAA says about data that needs to be protected (thanks Wikipedia, http://en.wikipedia.org/wiki/Protected_health_information):

Under the US Health Insurance Portability and Accountability Act (HIPAA), PHI that is linked based on the following list of identifiers must be treated with special care.
  1. Names
  2. All geographical identifiers smaller than a state, except for the initial three digits of a zip code if, according to the current publicly available data from the Bureau of the Census: the geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people; and [t]he initial three digits of a zip code for all such geographic units containing 20,000 or fewer people is changed to 000
  3. Dates (other than year) directly related to an individual
  4. Phone numbers
  5. Fax numbers
  6. Email addresses
  7. Social Security numbers
  8. Medical record numbers
  9. Health insurance beneficiary numbers
  10. Account numbers
  11. Certificate/license numbers
  12. Vehicle identifiers and serial numbers, including license plate numbers
  13. Device identifiers and serial numbers
  14. Web Uniform Resource Locators (URLs)
  15. Internet Protocol (IP) address numbers
  16. Biometric identifiers, including finger, retinal and voice prints
  17. Full face photographic images and any comparable images
  18. Any other unique identifying number, characteristic, or code except the unique code assigned by the investigator to code the data
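
Items 2 and 3 in the list above are the ones that bear directly on the birthdate and zip code problem. Here's a rough Python sketch of what those two rules ask for, as I read the text quoted above: keep only the first three zip digits when that three-digit area has more than 20,000 people (otherwise use 000), and keep only the year of a date. The function names and the population figures are made up; real numbers would have to come from Census Bureau data.

```python
def generalize_zip(zip_code, population_by_prefix):
    """Keep the 3-digit prefix only if its area has more than 20,000 people."""
    prefix = zip_code[:3]
    if population_by_prefix.get(prefix, 0) > 20000:
        return prefix
    return "000"


def generalize_date(iso_date):
    """Reduce a YYYY-MM-DD date to its year."""
    return iso_date[:4]


# Hypothetical populations per 3-digit zip prefix (not real Census figures).
populations = {"803": 250000, "036": 12000}

big_area = generalize_zip("80301", populations)
small_area = generalize_zip("03601", populations)
year_only = generalize_date("1970-01-15")
print(big_area, small_area, year_only)
```

This only covers those two items; the other identifiers in the list simply have to be removed.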
