I asked a slightly different version of this question before, but did not get the answer I was looking for.

My question is: do I need to store md5($url) in a unique index in MySQL? I have seen this in some code, though I don't remember where. This is a large database with more than 5 million URLs, and rows are looked up by URL.

Any ideas?

A: 

Are you saying that the URL is called like this:

www.yourdomain.com?id=89ce9250e9f469c9d1816e1cc0fb47a1

and then the id (89ce9250e9f469c9d1816e1cc0fb47a1, which is an md5() of the real URL query string) is looked up in the database to resolve the actual URL, which could be:

www.yourdomain.com?user=23&location=5&eventtype=23&year=2010

Is this the kind of usage you're referring to?

jim

Well, MD5 is one-way, so that would not really work with MD5, but IMO the idea is the same as what mathew wants.
DrColossos
Dr - yes, I'm aware that MD5 is one-way. My thinking was that he'd have a unique column storing the MD5 of the URL, which is used to look up the actual value from a second column. Does that make sense? Not sure, of course, why he'd want to do this, but perhaps an update to the question will answer that :)
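
To make that concrete, here is a minimal sketch of the hash-column lookup. It assumes SQLite (via Python's `sqlite3`) as a stand-in for MySQL, and the table name `urls`, the columns `url_hash` and `url`, and the helper functions are all hypothetical:

```python
import hashlib
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
# The table and column names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE urls (url_hash TEXT NOT NULL UNIQUE, url TEXT NOT NULL)"
)

def url_key(url: str) -> str:
    """Return the 32-character hex MD5 digest of the URL."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()

def store(url: str) -> str:
    """Store the URL under its MD5 key and return the key."""
    key = url_key(url)
    # INSERT OR IGNORE keeps the existing row if the hash is already present.
    conn.execute(
        "INSERT OR IGNORE INTO urls (url_hash, url) VALUES (?, ?)", (key, url)
    )
    return key

def resolve(key: str):
    """Look up the original URL by its hash; return None if it is unknown."""
    row = conn.execute(
        "SELECT url FROM urls WHERE url_hash = ?", (key,)
    ).fetchone()
    return row[0] if row else None
```

The hash is never reversed here: `resolve` only works for URLs that were stored earlier, which is the point of keeping the real URL in the second column.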
jim
A: 

Like the others, I can't quite figure out what you're asking about MD5 and URLs so here's just my interpretation.

If you already have a unique key constraint on the column, you don't need to use an MD5 checksum.

In fact, you shouldn't use a hashing algorithm to check the uniqueness of anything, URLs included, because of collisions.

BoltClock
You can't exactly tar all hash algorithms with the same brush. There are plenty of other hashing algorithms where the chance of a collision is minuscule. With SHA-1, for example, a database with 10^18 entries has a chance of about 0.0000000000003 (roughly 3 in 10^13) of a clash. Is that really worth worrying about? See: http://stackoverflow.com/questions/297960/hash-collision-what-are-the-chances
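
A rough way to check figures like this is the birthday-bound approximation. The sketch below assumes hash outputs behave like uniformly random values (so it says nothing about deliberately crafted collisions), and the function name is made up for illustration:

```python
import math

def collision_probability(n_items: int, hash_bits: int) -> float:
    """Birthday-bound estimate: p ~ 1 - exp(-n(n-1) / (2 * 2**bits)).

    Assumes the hash behaves like a uniformly random function.
    """
    exponent = -n_items * (n_items - 1) / (2 * 2**hash_bits)
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(exponent)

sha1_estimate = collision_probability(10**18, 160)    # 160-bit SHA-1
md5_estimate = collision_probability(5_000_000, 128)  # 128-bit MD5
```

Under the same assumption, 5 million URLs hashed with 128-bit MD5 give an estimate on the order of 10^-26.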
Kieran Allen
@Kieran Sure, the chance is minuscule, but not hashing the URLs at all has **zero** chance of collisions.
deceze
+1  A: 

Some sites store URL hashes in the database because they use those hashes in their own URLs, for example to redirect users to an external URL. If that is not your case, I can't see any reason to do this.

ivan73
+1  A: 

I don't think you should hash your URLs. The only plausible reason would be to save space (if most of the URLs are longer than 32 characters) at the expense of an increased risk of collisions.

What you should do is normalize the URLs.
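
As a rough sketch of what normalizing could look like, here is a version using Python's standard `urllib.parse`. The rules chosen (lowercase scheme and host, drop default ports and fragments, sort query parameters, use "/" for an empty path) are assumptions, not a complete implementation of any standard, and the function name is hypothetical. It expects absolute URLs with a scheme, drops any user:password part, and does not handle IPv6 hosts:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Ports that are implied by the scheme and can be dropped.
DEFAULT_PORTS = {"http": 80, "https": 443}

def normalize_url(url: str) -> str:
    """Return a canonical form so equivalent URLs compare equal."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    netloc = host
    # Keep the port only when it differs from the scheme's default.
    if parts.port is not None and DEFAULT_PORTS.get(scheme) != parts.port:
        netloc = f"{host}:{parts.port}"
    path = parts.path or "/"
    # Sort query parameters so their order does not matter.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit((scheme, netloc, path, query, ""))
```

With the normalized string stored in the indexed column, variants such as `HTTP://Example.com:80?b=2&a=1#top` and `http://example.com/?a=1&b=2` map to the same row.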

Alix Axel