I need to know the cost of succeeding with a Preimage attack ("In cryptography, a preimage attack on a cryptographic hash is an attempt to find a message that has a specific hash value.", Wikipedia).

The message I want to hash consists of six digits (the date of birth), then four random digits. This is a social security number.

Is it also possible to hash something using a specific password? This would add another layer of security, since one would have to know the password in order to produce the same hash value for a message.

I am thinking about using SHA-2.

+1  A: 

I'm unsure what exactly your question is, but to make your hashed value more secure, use salt values.

Edit: I think you are sort of describing salt values in your question.

Tom Gullen
"four random digits" looks like salt
Andrey
The message I want to hash is a social security number, as this is sensitive data, and should not be stored.
David
You should definitely hash it alongside a salt value then, as a value drawn from a set as restricted as social security numbers would be extremely vulnerable to a dictionary attack.
Tom Gullen
+3  A: 

If you want to know how expensive it is to find a preimage for the string you're describing, you need to figure out how many possible strings there are. Since the first 6 digits are a date of birth, their value is even more restricted than the naive assumption of 10^6 - we have an upper bound of 366*100 (every day of the year, plus the two digit year).

The remaining 4 'random' digits permit another 10^4 possibilities, giving a total of 366 * 100 * 10^4 = 366,000,000 distinct values to hash.

With so few possibilities, it ought to be possible to find a preimage in a fraction of a second on a modern computer - or, for that matter, to build a lookup table for every possible hash.
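To make the counting concrete, here is a minimal Python sketch of that exhaustive search. It assumes the number is laid out as DDMMYY followed by four digits and hashed unsalted with SHA-256 as an ASCII string; the question specifies neither, and the function names are made up for illustration.

```python
import hashlib

# Days in each month, counting 29 February, so the total is 366.
DAYS_PER_MONTH = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

# 366 days * 100 two-digit years * 10^4 suffixes
KEYSPACE = sum(DAYS_PER_MONTH) * 100 * 10**4


def candidate_numbers(years=range(100), suffixes=range(10_000)):
    """Yield every candidate as a 10-character string (assumed DDMMYY + NNNN)."""
    for month, days in enumerate(DAYS_PER_MONTH, start=1):
        for day in range(1, days + 1):
            for year in years:
                prefix = f"{day:02d}{month:02d}{year:02d}"
                for suffix in suffixes:
                    yield f"{prefix}{suffix:04d}"


def find_preimage(target_hex, **limits):
    """Return the candidate whose SHA-256 hex digest equals target_hex, or None."""
    for candidate in candidate_numbers(**limits):
        if hashlib.sha256(candidate.encode("ascii")).hexdigest() == target_hex:
            return candidate
    return None
```

Calling `find_preimage(target_hex)` with no limits walks all `KEYSPACE` candidates; passing something like `years=[85]` narrows the search if the birth year is already known.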

Using a salt, as Tom suggests, will make a lookup table impractical, but with such a restricted range of valid values, a brute force attack is still eminently practical, so it alone is not sufficient to make the attack impractical.
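A rough sketch of why that is, in Python: a random per-value salt means identical numbers no longer share a hash, so one precomputed table can't cover every record, but anyone holding the salt and the digest can still just try each candidate. The 16-byte salt length, the `salt || number` layout and the function names are assumptions for illustration, not something taken from the question.

```python
import hashlib
import os


def hash_with_salt(number, salt=None):
    """Return (salt, digest) where digest = SHA-256(salt || number)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each stored value
    digest = hashlib.sha256(salt + number.encode("ascii")).digest()
    return salt, digest


def brute_force_salted(salt, digest, candidates):
    """With the salt known, the attacker still only has to try each candidate."""
    for candidate in candidates:
        if hash_with_salt(candidate, salt)[1] == digest:
            return candidate
    return None
```

The salt has to be stored next to the digest so the legitimate system can verify a value, which is exactly why an attacker who steals the records gets it too.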

One way to make things more expensive is to use iterative hashing - that is, hash the hash, and hash that, repeatedly. You have to do a lot less hashing than your attacker does, so increases in cost affect them more than they do you. This is still likely to be only a stopgap given the small search space, however.
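As a sketch of the idea in Python: the first function is the literal "hash the hash" loop, and the second uses PBKDF2 from the standard library, which is a standardized construction built on the same principle (named here as a swapped-in technique, not something this answer prescribes). The iteration count of 200,000 is an arbitrary assumption; you would tune it to whatever delay your own system can tolerate.

```python
import hashlib


def iterate_sha256(data, rounds):
    """Apply SHA-256 to its own output `rounds` times (rounds >= 1)."""
    digest = data
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest


def stretched_hash(number, salt, iterations=200_000):
    """Salted, iterated hash via PBKDF2-HMAC-SHA256 (iteration count is an assumption)."""
    return hashlib.pbkdf2_hmac("sha256", number.encode("ascii"), salt, iterations)
```

You pay the iteration cost once per verification; the attacker pays it once per guess, for up to 366,000,000 guesses per record.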

As far as "using a password" goes, it sounds like you're looking for an HMAC - a construction that uses a hash, but can only be verified if you have the key. If you can keep the key secret - no easy task if you're assuming the hashes can only be obtained if your system is compromised in the first place - this is a practical system.
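For illustration, a minimal HMAC-SHA256 sketch using Python's standard `hmac` module. The function names are hypothetical, and how the key is generated and stored is left open, since that is the hard part mentioned above.

```python
import hashlib
import hmac


def keyed_digest(key, number):
    """Return the HMAC-SHA256 of the number under a secret key, as hex."""
    return hmac.new(key, number.encode("ascii"), hashlib.sha256).hexdigest()


def verify(key, number, expected_hex):
    """Recompute the HMAC and compare it in constant time."""
    return hmac.compare_digest(keyed_digest(key, number), expected_hex)
```

Without the key, an attacker cannot recompute the digests and so cannot test guesses offline; with the key, the search space is as small as it ever was.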

Edit: Okay, so 'fractions of a second' may have been a slight exaggeration, at least with my trivial Python test. It's still perfectly tractable to bruteforce on a single computer in a short timeframe, however.

Nick Johnson
I'm not familiar with social security numbers in the USA, but I imagine it is fairly easy to generate every possible one. That creates a particular problem here: given the set of all possible numbers (which I assume is easy to enumerate) and a database of hashed numbers and salts, it would be reasonably easy for a home computer to recover the social security number behind each record. Multiple hash layers would definitely help, but the nature of the data makes this harder, and I'm not sure what the best way to store it would be.
Tom Gullen
Also, multiple layers of hashing start to become security through obscurity. It's an interesting problem and I'm not sure how I would go about storing data such as this.
Tom Gullen
If you store a hashed password in the user records, you could use the raw password as the salt for the social security number. This would then mean the social security # could not be derived unless the raw password for the account was known. (However you will run into problems when users wish to change their password.)
Tom Gullen
@Tom It's not a unique problem: ATM PINs suffer the exact same issue, only with an even more restricted search space. Salts do make assembling a database impractical - just choose a salt that's long enough for the purpose. And iterated hashing isn't security through obscurity - no obscurity is involved.
Nick Johnson
Finally, why cook up exotic salting schemes? They're certain to be no more secure (and probably less so) than using a purely random per-value salt. As I've already pointed out, though, with a keyspace so small, salting alone will not suffice.
Nick Johnson
+2  A: 

SHA-2, salts, preimage attacks, brute forcing a restricted 10-digit number - man, it would be awesome if we had a dial we could turn that would let us adjust the security. Something like this:

Time to compute a hash of an input:
 SHA-2, salted                                Better security!
  |                                            |
 \|/                                          \|/
 |-----------------------------------------------------|
.01 seconds                                           3 seconds

If we could do this, your application, when verifying that the user entered data matches what you have hashed, would in fact be a few seconds slower.

But imagine being the attacker!

Awesome, he's hashing stuff using a salt, but there's only 366,000,000 possible hashes, I'm gonna blaze through this at 10,000 a second and finish in ~10 hours!

Wait, what's going on! I can only do 1 every 2.5 seconds?! This is going to take me 29 years!!

That would be awesome, wouldn't it?

Sure would.

I present unto you: scrypt and bcrypt. They give you that dial. Want to spend a whole minute hashing a password? They can do that. (Just make sure you remember the salt!)
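Here is a rough sketch of the dial using `hashlib.scrypt` from Python's standard library, along with the arithmetic behind the attacker's two estimates. The cost parameters (`n=2**14, r=8, p=1`) and the function names are assumptions for illustration, and `hashlib.scrypt` is only present when Python is built against a recent enough OpenSSL.

```python
import hashlib

KEYSPACE = 366 * 100 * 10**4          # 366,000,000 candidates
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def slow_hash(number, salt, n=2**14, r=8, p=1):
    """Salted scrypt hash; raising n (a power of two) turns the dial up."""
    return hashlib.scrypt(number.encode("ascii"), salt=salt, n=n, r=r, p=p, dklen=32)


def exhaustive_search_hours(guesses_per_second, keyspace=KEYSPACE):
    """Hours needed to try every candidate at a given guess rate."""
    return keyspace / guesses_per_second / 3600


def exhaustive_search_years(seconds_per_guess, keyspace=KEYSPACE):
    """Years needed to try every candidate at a given cost per guess."""
    return keyspace * seconds_per_guess / SECONDS_PER_YEAR
```

At 10,000 guesses a second, `exhaustive_search_hours(10_000)` comes out at a little over 10 hours; at 2.5 seconds per guess, `exhaustive_search_years(2.5)` comes out at roughly 29 years, matching the figures above.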

Tom Ritter
A minute today. 30 seconds in 18 months, and 15 seconds in 36. ;) Thanks for linking to the specific schemes, though.
Nick Johnson
Actually, password hashing schemes have been doing this for years (hashing 500 times so it costs more for the attacker). Unfortunately, things like rainbow tables (Hellman time-memory tradeoffs) negate a lot of this computation on the attacker's side, as they build lookup tables by running the hash on its own output n times.
e5