Does size matter when choosing the right algorithm to use for a session hash?

I recently read this article, and it suggested using Whirlpool to create a hash for the session id. Whirlpool generates a 128-character hash string; is this too large?

The plan is to store the session hash in a db. Is there much of a difference between using a 64-character field (SHA-256), a 96-character field (SHA-384), or a 128-character field (Whirlpool)? One of the initial arguments made for Whirlpool was its speed versus other algorithms, but looking at the speed results, SHA-384 doesn't fare too badly.

There is also the option to truncate the hash to make it smaller than 128 characters.

I did modify the original code snippet to allow changing the algorithm based on my needs.

Update: There was some discussion about the string being hashed, so I've included the code.


function generateUniqueId($maxLength = null, $algo = 'whirlpool') {
    $entropy = '';

    // try OpenSSL first
    if (function_exists('openssl_random_pseudo_bytes')) {
        $entropy = openssl_random_pseudo_bytes(64, $strong);
        // discard the bytes if OpenSSL could not use a cryptographically strong source
        if ($strong !== true) {
            $entropy = '';
        }
    }

    // add a basic mt_rand/uniqid combo
    $entropy .= uniqid(mt_rand(), true);

    // try to read from the Windows RNG
    if (class_exists('COM')) {
        try {
            $com = new COM('CAPICOM.Utilities.1');
            $entropy .= base64_decode($com->GetRandom(64, 0));
        } catch (Exception $ex) {
            // CAPICOM not available; fall back to the other entropy sources
        }
    }

    // try to read from the Unix RNG
    if (is_readable('/dev/urandom')) {
        $h = fopen('/dev/urandom', 'rb');
        $entropy .= fread($h, 64);
        fclose($h);
    }

    // create the hash with the chosen algorithm (whirlpool by default)
    $hash = hash($algo, $entropy);
    // truncate the hash if a max length is imposed
    if ($maxLength) {
        return substr($hash, 0, $maxLength);
    }
    return $hash;
}
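
For illustration, a few example calls to the function above (the second parameter selects the hash algorithm, defaulting to whirlpool):

$id128 = generateUniqueId();               // full Whirlpool hash, 128 hex characters
$id32  = generateUniqueId(32);             // Whirlpool truncated to 32 characters
$id64  = generateUniqueId(null, 'sha256'); // SHA-256, 64 hex characters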
A: 

SHA1 or MD5 is probably enough for your needs. In practice, the probability of a collision is so small that it will likely never happen.

Ultimately, though, it all depends upon your required level of security. Do also keep in mind that longer hashes are both more expensive to compute and require more storage space.

Ben Herila
+3  A: 

The time taken to create the hash is not important, and as long as your database is properly indexed, the storage method should not be a major factor either.

However, the hash has to be transmitted with the client's request every time, frequently as a cookie. Large cookies can add a small amount of additional time to each request. See Yahoo!'s page performance best practices for more information. Smaller cookies, and thus a smaller hash, have benefits.

Overall, large hash functions are probably not justified. For their limited scope, good old md5 and sha1 are probably just fine as the source behind a session token.

Charles
The hashes being stored can last anywhere from hours to weeks. For one-time-use tokens, a smaller hash will be used.
Andre
If your *session* hashes are sticking around for weeks, your GC is broken and/or you have insane users. Edge cases.
Charles
The hash may hang around for weeks because user preferences are stored in a db and linked to the hash stored in a cookie, which may have an expiration of as little as 24 hours -- usually, though, cookies are set to expire 21 days from the last visit. Within that period several thousand users might have visited the site. By the way, when I refer to a session I'm not referring to the session set by PHP, which generally expires when the browser is closed. I hope this clears up any mix-ups.
Andre
Why are you doing that instead of attaching preferences to a user login and prompting users to log in?
Charles
Because not all preferences require that a user log in or for that matter be registered.
Andre
A: 

The article times out when I try to read it, but I can't think of a good reason to use a hash as a session identifier. Session identifiers should be unpredictable; given the title of the article, it sounds like the authors acknowledge that principle. Then, why not use a cryptographic random number generator to produce session identifiers?

A hash takes input, and if that input is predictable, so is the hash, and that's bad.
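
As a rough sketch of that approach (illustrative only; it reuses openssl_random_pseudo_bytes, which the question's code already relies on, and the function name here is made up):

function generateRandomSessionId($bytes = 16) {
    // 16 random bytes -> 32 hex characters; no hashing step is needed,
    // because the bytes themselves are already unpredictable
    $raw = openssl_random_pseudo_bytes($bytes, $strong);
    if ($raw === false || $strong !== true) {
        throw new Exception('No cryptographically strong random source available');
    }
    return bin2hex($raw);
}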

erickson
I've updated my original post to include the code for how the hash input is generated.
Andre
A: 

Well, the answer is, it depends. It depends on how many you need to generate and on the maximum number of ids that could live simultaneously (I suppose session ids are thrown away after a time period).

For instance, if you used 32 bits, then generating 1 + 2^32 ids guarantees a collision. You need to take into consideration how many you will be generating! Just talking about the collision probability of a particular hashing function isn't sufficient unless you also consider the number of hashes you generate with it.

I suggest you read up on Birthday Paradox and also read Birthday Attack.

The birthday paradox basically says that if you uniformly select random numbers from 1 to n, doing this ~sqrt(n) times gives you at least a 50% chance of a collision. So basically, if you take any 23 random people, there is a good chance that two of them have the same birthday (one of 365 days)!

So if the session id you assigned was the birthday of the person, with 23 people you have at least a 50% chance of a collision!

For 128 bit (uniform) hashes, with 8.2 x 10^11 (I guess 820 billion) ids, the collision probability is around 10^-15. There is an informative table in the Birthday Paradox wiki page above with these values.
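
As a quick back-of-the-envelope check (an illustrative sketch, not from the original answer), the small-probability birthday bound p ≈ k^2 / (2n) reproduces that figure:

// k ids drawn uniformly from an n-sized space collide with probability
// roughly k^2 / (2n) when that value is small
$k = 8.2e11;          // number of ids generated
$n = pow(2, 128);     // size of the 128-bit id space
printf("%.1e\n", ($k * $k) / (2 * $n)); // prints about 9.9e-16, i.e. roughly 10^-15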

128 bits would very likely be good enough, but you could probably do it with fewer bits, depending on your needs.

btw, since you are storing it in a db, you could always query the db to check if a newly generated Id has been assigned and regenerate if already in use. Or is your goal to avoid the db query? Better yet, use sequence numbers provided by the db (if users are not malicious, that is :-)).
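
For example, a sketch of the insert-and-retry approach (illustrative only; it assumes a PDO connection $db in exception mode and a sessions table with a primary key on id):

do {
    $id = generateUniqueId(32); // 32-char id from the question's code
    try {
        $stmt = $db->prepare('INSERT INTO sessions (id) VALUES (?)');
        $stmt->execute(array($id));
        $inserted = true;       // the unique key accepted the new id
    } catch (PDOException $e) {
        $inserted = false;      // duplicate key: regenerate and retry
    }
} while (!$inserted);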

Moron
@Andre: Was this of any use to you? If not, I will just delete it.
Moron
Yes it was. I had previously read about the birthday paradox but wasn't very clear in my initial post. I think my question should have spoken less about the hash itself and more about the size of the field in my db with respect to indexing. So if you're able to offer any suggestions on the sensibility of a 64-char field, 96-char field or 128-char field, I would be really grateful.
Andre
@Andre: 128 bits = 16 bytes (32 hex characters), and that is good enough for more than a billion simultaneous IDs according to that wiki page.
Moron
+1  A: 

Yes, size matters.

If it's too short, you run the risk of collisions. You also make it practical for an attacker to find someone else's session by brute-force attack.

Being too long matters less, but every byte of the session ID has to be transferred from the browser to the server with every request, so if you're really optimising things, you may not want an ID that's too long.

You don't have to use all the bits of a hash algorithm, though - there's nothing stopping you from using something like Whirlpool, then only taking the first 128 bits (32 characters in hex). Practically speaking, 128 bits is a good lower bound on length, too.
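
For instance (a sketch reusing the hash() call from the question's code):

// keep only the first 128 bits (32 hex characters) of the Whirlpool output
$sessionId = substr(hash('whirlpool', $entropy), 0, 32);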

As erickson points out, though, using a hash is a bit odd. Unless you have at least as much entropy as input as the length of the ID you're using, you're vulnerable to attacks that guess the input to your hash.

Nick Johnson
If I'm only going to take the first 32 characters, wouldn't it then make sense to just go ahead and use md5? I say this because wouldn't truncating the Whirlpool hash increase the chance of collision, since the likelihood of the first 32 characters being the same across any number of hashes would be higher than with md5? This assumption is based on md5 natively generating a 32-character hex string, as opposed to the 128 characters from a Whirlpool hash. I could be totally off in this regard.
Andre
There are two distinct causes of 'weakness' in a hash. There's the construction of the hash itself, and there's the length of the hash. MD5 has been repeatedly broken on the first count, but 128 bits remains a sufficiently large search space to make finding a collision by brute-force impractical. For any secure hash, you can take any arbitrary subset of it, and it will be as good as any other subset of the same length. If a substring of a large hash were weaker than the complete output of a shorter hash, that would mean the larger hash was broken, because its output was predictable in part.
Nick Johnson
Thanks for clearing that up.
Andre