views:

130

answers:

3

I am interested in hearing about enterprise solutions for SSN handling. (I looked pretty hard for any pre-existing post on SO, including reviewing the terriffic SO automated "Related Questions" list, and did not find anything, so hopefully this is not a repeat.)

First, I think it is important to enumerate the reasons systems/databases use SSNs: (note—these are reasons for de facto current state—I understand that many of them are not good reasons)

  1. Required for Interaction with External Entities. This is the most valid case—where external entities your system interfaces with require an SSN. This would typically be government, tax and financial.

  2. SSN is used to ensure system-wide uniqueness.

  3. SSN has become the default foreign key used internally within the enterprise, to perform cross-system joins.

  4. SSN is used for user authentication (e.g., log-on)

The enterprise solution that seems optimum to me is to create a single SSN repository that is accessed by all applications needing to look up SSN info. This repository substitutes a globally unique, random 9-digit number (ASN) for the true SSN. I see many benefits to this approach. First of all, it is obviously highly backwards-compatible—all your systems "just" have to go through a major, synchronized, one-time data-cleansing exercise, where they replace the real SSN with the alternate ASN. Also, it is centralized, so it minimizes the scope for inspection and compliance. (Obviously, as a negative, it also creates a single point of failure.)

This approach would solve issues 2 and 3, without ever requiring lookups to get the real SSN.

For issue #1, authorized systems could provide an ASN, and be returned the real SSN. This would of course be done over secure connections, and the requesting systems would never persist the full SSN. Also, if the requesting system only needs the last 4 digits of the SSN, then that is all that would ever be passed.

Issue #4 could be handled the same way as issue #1, though obviously the best thing would be to move away from having users supply an SSN for log-on.

There are a couple of papers on this:

A: 

It should be noted that SSNs are PII, but are not private. SSNs are public information that be easily acquired from numerous sources even online. That said if SSNs are the basis of your DB primary key you have a severe security problem in your logic. If this problem is evident at a large enterprise then I would stop what you are doing and recommend a massive data migration RIGHT NOW.

As far as protection goes SSNs are PII that is both unique and small in payload, so I would protect that form of data no differently than a password for one time authentication. The last four of a SSNs is frequently used for verification or non-unique identification as it is highly unique when coupled with another data attribute and is not PII on its own. That said the last four of a SSN can be replicated in your DB for open alternative use.

A: 

I have come across a company, Voltage, that supplies a product which performs "format preserving encryption" (FPE). This substitutes an arbitrary, reversibly encrypted 9-digit number for the real SSN (in the example of SSN). Just in the early stages of looking into their technical marketing collateral...

Erik Neu
A: 

I have found a trove of great information at the Securosis site/blog. In particular, this white paper does a great job of summarizing, comparing and contrasting database encryption and tokenization. It is more focused on the credit card (PCI) industry, but it is also helpful for my SSN purpose.

Erik Neu