
Thanks for the reply. I will concede or defer to you with regard to PHI and HIPAA... it seems the philosophy behind HIPAA/PHI is very different from that of PII or GDPR. HIPAA is prescriptive. PII/GDPR are principle-based. HIPAA, it seems, has some text saying that data is not PHI if someone with statistical expertise determines and documents that the risk of re-identification is "very small", OR if the person avoids an explicit list of 18 things that it cares about (see items (A) through (R) on page 96 of https://www.hhs.gov/sites/default/files/ocr/privacy/hipaa/ad... ).

One of the first 17 things might be a surrogate key ("account number") in one's system, but if you look through the others, the rest are things like name, SSN, biometrics, IP addresses, etc., which are definitely not surrogate keys.

The "OR" language makes the statistical expertise (and "principles" of privacy) irrelevant if you avoid the 18 things; that avoidance forms a "safe harbor" of sorts so you don't have to do any heavy thinking/lifting.

The 18th ("(R)") element of what is considered PHI does seem to refer to surrogate keys but in a manner which creates a clear carve-out/safe-harbor for them not being PHI. That 18th form of PHI is "Any other unique identifying number, characteristic, or code, except as permitted by paragraph (c) of this section;"

But paragraph (c) seems to indicate that identifiers such as surrogate integer/GUID keys kept within a system are not considered element-18-"(R)" PHI, as long as they are 1) not derived from an individual's information (i.e. like integer or UUID surrogate keys) and 2) maintained solely within the system:

"(c) Implementation specifications: Re-identification. A covered entity may assign a code or other means of record identification to allow information de-identified under this section to be re-identified by the covered entity, provided that: (1) Derivation. The code or other means of record identification is not derived from or related to information about the individual and is not otherwise capable of being translated so as to identify the individual; and (2) Security. The covered entity does not use or disclose the code or other means of record identification for any other purpose, and does not disclose the mechanism for re-identification."

By my reading, a surrogate key maintained within a system is thus clearly not PHI under HIPAA. I never looked at the details of HIPAA until today, since it hasn't applied much to my data and I have been focused more on PII/GDPR. I appreciate you describing the context of your remarks.
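
To make the "Derivation" and "Security" conditions in paragraph (c) a bit more concrete, here is a rough Python sketch of how I picture the difference. Everything in it is my own illustration, not anything from the HIPAA text: `derived_code`, `random_code`, and `ReidentificationTable` are hypothetical names, and the in-memory dict merely stands in for whatever internal store a covered entity would actually use.

```python
import hashlib
import uuid


def derived_code(ssn: str) -> str:
    """A code computed from the individual's own information (a hash of an SSN).

    Anyone who holds the SSN can recompute it, so it is "derived from"
    information about the individual.
    """
    return hashlib.sha256(ssn.encode("utf-8")).hexdigest()


def random_code() -> str:
    """A code with no relationship to the individual: a random UUID."""
    return str(uuid.uuid4())


class ReidentificationTable:
    """Hypothetical mapping from code to patient, kept only inside the system."""

    def __init__(self) -> None:
        self._code_to_patient = {}

    def assign(self, patient_id: str) -> str:
        # The code is generated independently of anything about the patient.
        code = random_code()
        self._code_to_patient[code] = patient_id
        return code

    def reidentify(self, code: str) -> str:
        # Only the holder of this table can map a code back to a patient.
        return self._code_to_patient[code]


table = ReidentificationTable()
code = table.assign("patient-001")
```

The hashed value is repeatable by anyone who knows the input, while the UUID can only be resolved through the table, which is the property the surrogate-key reading relies on.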



> SSN ... which are definitely not surrogate keys.

Surely SSN is a surrogate key? They are not naturally derived. The early ones were serial (i.e. an auto-incrementing field) and more recent ones are randomly generated (i.e. a UUID).


Conceptually, any information created or consumed outside your organization is not valid as part of a surrogate key, so SSNs are not a surrogate key. Furthermore, any time you reveal the primary key, that key can become the thing that people come to depend upon to find that database row, which leads to the possibility that someday, someone will have an important need for some primary keys to change, even if the primary key was supposed to be a surrogate key. The longer a database lives, the more likely that surrogate keys morph into natural keys. That's what happened to SSNs.


> Conceptually, any information created or consumed outside your organization is not valid as part of a surrogate key, so SSNs are not a surrogate key.

SSNs are created within the organization. Maybe not within your organization, but nobody is talking about you. They are a surrogate key.


SSN = social security number. What else does SSN stand for?


SSN is absolutely not a surrogate key. If you received a piece of information from an external source, it is data, not a surrogate key.

If you use data as a key, then that is a natural key; if you invent a value to use as an identifier, that is an artificial or surrogate key.

If an API provides you an ID for a record, that is data. If you use it as a key, that is also a natural key in your system.
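
A small sketch of that distinction, using Python's built-in `sqlite3`. The schema is made up for illustration (the `person` and `visit` tables and the `EXT-123` value are assumptions, not anyone's real system): the externally supplied ID is stored as ordinary data with a uniqueness constraint, and an invented integer serves as the surrogate key that other tables reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE person (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, invented here
        external_id TEXT NOT NULL UNIQUE                -- data received from outside
    );
    CREATE TABLE visit (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        person_id INTEGER NOT NULL REFERENCES person(id)
    );
    """
)

cur = conn.execute("INSERT INTO person (external_id) VALUES (?)", ("EXT-123",))
person_id = cur.lastrowid
conn.execute("INSERT INTO visit (person_id) VALUES (?)", (person_id,))

# The external source reissues its identifier; only one data column changes,
# and every row that references the surrogate key is untouched.
conn.execute(
    "UPDATE person SET external_id = ? WHERE id = ?", ("EXT-999", person_id)
)

visits = conn.execute(
    "SELECT COUNT(*) FROM visit WHERE person_id = ?", (person_id,)
).fetchone()[0]
current_external_id = conn.execute(
    "SELECT external_id FROM person WHERE id = ?", (person_id,)
).fetchone()[0]
```

Had `external_id` been the primary key, the same change would have had to cascade through every referencing table.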


> If you received a piece of information from an external source, it is data, not a surrogate key.

It may not be your surrogate key, but it is someone's!


Possibly; you don't actually know, since it is external data.


We do know because the database owner has openly talked about how the keys are derived. That still doesn't make it a good key for your database, but I can assure you that the world doesn't revolve around you. It is someone's surrogate key – therefore it is a surrogate key.


You seem to be insisting on a semantic interpretation that makes the term less useful. A third party's surrogate key should not be considered a surrogate key when in your system. It doesn't matter how that key was generated, only that the semantic meaning of the key is outside of your control.


The other difference between HIPAA and GDPR is compliance rules. Under GDPR it is not a problem if all your employees who have access to your database have access to PII (since, from the perspective of GDPR, the customer has given consent to share their information with your organization as a whole). Sharing with third parties outside your organization is where you get into trouble.

But under HIPAA, even your own employees need to have a specific, documented justification for accessing customer PHI. If they can do their job just as well by only accessing a non-PHI dataset, you're required by law to only allow them access to a scrubbed, non-PHI dataset.
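
As a rough illustration of that access pattern (not a compliance recipe), here is a Python sketch. The field list, the `scrub` and `view_for` helpers, and the justification string are all hypothetical, and dropping a few fields is only a stand-in for real de-identification, which covers far more than this.

```python
from typing import Optional

# Hypothetical subset of identifier fields; the real list is much longer.
PHI_FIELDS = {"name", "ssn", "ip_address"}


def scrub(record: dict) -> dict:
    """Return a copy of the record without the identifier fields."""
    return {k: v for k, v in record.items() if k not in PHI_FIELDS}


def view_for(record: dict, justification: Optional[str] = None) -> dict:
    """Default to the scrubbed view; return full data only with a documented reason."""
    if justification:
        return dict(record)
    return scrub(record)


record = {"name": "Jane Doe", "ssn": "000-00-0000", "visit_count": 3}
analyst_view = view_for(record)
clinician_view = view_for(record, justification="treating physician")
```

The point is only that the scrubbed view is the default and the full record is the exception that needs a recorded reason.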


I don't think GDPR is that different in that regard; you also have to minimize access to PII within the organization. See Article 25:

> The controller shall implement appropriate technical and organisational measures for ensuring that, by default, only personal data which are necessary for each specific purpose of the processing are processed. That obligation applies to the amount of personal data collected, the extent of their processing, the period of their storage and their accessibility. In particular, such measures shall ensure that by default personal data are not made accessible without the individual's intervention to an indefinite number of natural persons.


GDPR is indeed different in that regard. And I'm not sure how your quote says otherwise. "The employees of this company" are not an indefinite number of persons.



