So, imagine a MySQL table with a few simple columns, an auto increment, and a hash (varchar, UNIQUE).

Is it possible to give MySQL a query that will add a row and generate a unique hash without multiple queries?

Currently, the only way I can think of to achieve this is with a while loop, which I worry would become more and more processor intensive the more entries were in the db.

Here's some pseudo-php, obviously untested, but gets the general idea across:

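// query() and generate_hash() are placeholder helpers, not built-in PHP functions.
while (!query("INSERT INTO `table` (hash) VALUES ('" . generate_hash() . "')")) {
    // Found a conflict (UNIQUE violation); try again with a fresh hash.
}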

In the above example, the hash column is UNIQUE, so the query fails whenever the generated hash already exists. The problem is, say there are 500,000 entries in the db and I'm working off of a base36 hash generator with 4 characters (1,679,616 possible values). The likelihood of a conflict on each insert would be almost 1 in 3, and I definitely can't be running 160,000 queries. In fact, any more than 5 I would consider unacceptable.
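
For reference, the arithmetic behind that estimate can be sketched in a few lines of Python. The function names are made up for illustration, and the sketch assumes the generator picks codes uniformly at random:

```python
def collision_probability(used: int, alphabet_size: int, length: int) -> float:
    """Chance that one uniformly random code hits an already-used value."""
    keyspace = alphabet_size ** length
    return used / keyspace


def expected_attempts(p: float) -> float:
    """Mean number of INSERT attempts until one succeeds (geometric distribution)."""
    return 1.0 / (1.0 - p)


p = collision_probability(used=500_000, alphabet_size=36, length=4)
print(f"keyspace: {36 ** 4}")                            # 1679616
print(f"collision chance per try: {p:.3f}")              # 0.298
print(f"expected attempts: {expected_attempts(p):.2f}")  # 1.42
```

Under those assumptions a single insert needs about 1.4 attempts on average, although nothing bounds the worst case.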

So, can I do this with pure SQL? I would need to generate a base62, 6 char string (like: "j8Du7X", chars a-z, A-Z, and 0-9), and either update the last_insert_id with it, or even better, generate it during the insert.

I can handle basic CRUD with MySQL, but even JOINs are a little outside of my MySQL comfort zone, so excuse my ignorance if this is cake.

Any ideas? I'd prefer to use either pure MySQL or PHP & MySQL, but hell, if another language can get this done cleanly, I'd build a script and AJAX it too.

Thanks!

+1  A: 

What is this hash a hash of? It seems like you just want a randomly generated unique VARCHAR column? What's wrong with the auto increment?

Anyway, you should just use a bigger hash: find an MD5 function (if you're actually hashing something) or a UUID generator that gives you more than 4 characters. And yes, you could use a while loop, but just generate a value big enough that conflicts are incredibly unlikely.

wsorenson
I guess I should specify: I definitely need this to be 6 chars, and assuming there won't be a conflict wouldn't work, as I need a guarantee that there won't be any. While I'd be comfortable using an MD5 without validation, even though 36 to the 6th is HUGE, it's not huge enough to blindly insert...
Jesse
+2  A: 

If your heart is set on using base-36 4 character hashes (the hashspace is only 1,679,616), you could pre-generate a table of hashes that aren't already in the other table. Then finding a unique hash would be as simple as moving it from the "unused table" to the "used table", which is O(1).

If your table is conceivably 1/3 full you might want to consider expanding your hashspace since it will probably fill up in your lifetime. Once the space is full you will no longer be able to find unique hashes no matter what algorithm you use.
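
A minimal sketch of that pool idea, written in Python with the standard sqlite3 module standing in for MySQL. The table names (unused_hashes, used_hashes) and function names are hypothetical, and the SQL would need adapting to MySQL:

```python
import secrets
import sqlite3
import string

ALPHABET = string.digits + string.ascii_lowercase  # base36, as in the answer


def build_pool(conn, count, length=4):
    """Add `count` fresh codes to unused_hashes, skipping any already known."""
    conn.execute("CREATE TABLE IF NOT EXISTS used_hashes (hash TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS unused_hashes "
        "(id INTEGER PRIMARY KEY, hash TEXT UNIQUE)"
    )
    added = 0
    while added < count:  # assumes count is well below the free keyspace
        code = "".join(secrets.choice(ALPHABET) for _ in range(length))
        known = conn.execute(
            "SELECT 1 FROM used_hashes WHERE hash = ? "
            "UNION ALL SELECT 1 FROM unused_hashes WHERE hash = ?",
            (code, code),
        ).fetchone()
        if known is None:
            conn.execute("INSERT INTO unused_hashes (hash) VALUES (?)", (code,))
            added += 1
    conn.commit()


def take_hash(conn):
    """Move the oldest pooled code to used_hashes and return it."""
    with conn:  # one transaction: commit on success, roll back on error
        row = conn.execute(
            "SELECT id, hash FROM unused_hashes ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            raise RuntimeError("pool is empty; generate more hashes")
        conn.execute("DELETE FROM unused_hashes WHERE id = ?", (row[0],))
        conn.execute("INSERT INTO used_hashes (hash) VALUES (?)", (row[1],))
    return row[1]


conn = sqlite3.connect(":memory:")
build_pool(conn, 10)
print(take_hash(conn))  # one pre-generated code, now marked as used
```

All the collision checking happens while the pool is built, so handing out a code never needs a retry. With several concurrent writers on MySQL the select-and-delete step would also need row locking (for example SELECT ... FOR UPDATE in an InnoDB transaction), which this sketch does not cover.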

Kendall Hopkins
Because of the nature of the app, I need it to be specifically base62, 6 chars. I figure I will pre-generate the table if need be, but even though MySQL is fast, running through 50 billion entries is still not the solution I was looking for :( I don't anticipate reaching that many entries, but I'd rather be safe, as a conflict would be catastrophic.
Jesse
You could pre-generate ~1 million hashes and pull from those as I described. Then, if those run low, you can generate more. Since the hashes are generated beforehand, you can ensure O(1). But to be honest, you'll probably never have a conflict. With 1 million hashes in use you still only have about a 0.0017% chance of a collision per insert, so just write some code to deal with the db error from a collision (if it ever happens).
Kendall Hopkins
A: 

Going with zneak's comment, why don't you use an autoincrement column? Save the hash in another (non-unique) field and concatenate the id to it (dynamically), so you give the user [hash][id]. You can parse it out in pure SQL using the substring functions.

Since you have to have the hash, the user can't look at other records by incrementing the id.
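
As a rough illustration of the [hash][id] scheme, here is a Python sketch. It assumes a fixed-width random prefix of 6 base62 characters followed by the decimal id; the function names and the width are hypothetical:

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # base62
PREFIX_LEN = 6  # assumed fixed width of the random part


def make_prefix():
    """Random, non-unique prefix stored next to the auto-increment id."""
    return "".join(secrets.choice(ALPHABET) for _ in range(PREFIX_LEN))


def build_token(prefix, row_id):
    """Concatenate the stored prefix and the row id into the public token."""
    return f"{prefix}{row_id}"


def parse_token(token):
    """Split a token back into (prefix, id); raise ValueError if malformed."""
    prefix, id_part = token[:PREFIX_LEN], token[PREFIX_LEN:]
    if len(prefix) != PREFIX_LEN or not (id_part.isascii() and id_part.isdigit()):
        raise ValueError("malformed token")
    return prefix, int(id_part)


token = build_token("j8Du7X", 42)
print(token)               # j8Du7X42
print(parse_token(token))  # ('j8Du7X', 42)
```

The id alone guarantees uniqueness. A lookup would fetch the row by id and then check that the stored prefix matches, so incrementing the id without knowing the prefix finds nothing.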

Byron Whitlock
As I responded to Toby, autoinc would definitely be the cleanest way of doing this. The hash isn't there for security as much as it is there to be referenced by. This will be in a URL string, and I would prefer for it to have an option to be human-readable. Unfortunately, locking the db into an auto inc would make that difficult, as I would have to key the hash and add items out of order. That's definitely the closest so far, and I'll do that if I can't figure out anything else.
Jesse
+1  A: 

As others have suggested, what's wrong with an autoinc field? If you want an alphanumeric value, you could simply convert the int to an alphanumeric string in base 36. This could be implemented in almost any language.
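
A sketch of such a conversion in Python. The alphabet is a parameter, so the same functions cover base 36 or the base 62 the question asks for; the names and the digit ordering (0-9, a-z, A-Z) are arbitrary choices, not a standard:

```python
import string

BASE36 = string.digits + string.ascii_lowercase
BASE62 = BASE36 + string.ascii_uppercase


def encode(n, alphabet=BASE62, width=0):
    """Convert a non-negative int (e.g. an auto-increment id) to a string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    base = len(alphabet)
    digits = []
    while True:
        n, remainder = divmod(n, base)
        digits.append(alphabet[remainder])
        if n == 0:
            break
    # Left-pad with the zero digit so every code has at least `width` chars.
    return "".join(reversed(digits)).rjust(width, alphabet[0])


def decode(text, alphabet=BASE62):
    """Inverse of encode(); raises ValueError on characters outside the alphabet."""
    base = len(alphabet)
    n = 0
    for ch in text:
        n = n * base + alphabet.index(ch)
    return n


print(encode(62))           # 10
print(encode(5, width=6))   # 000005
print(decode("10"))         # 62
```

Codes produced this way are unique because the ids are, but they are sequential rather than random.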

Toby Allen
I'd like to use an autoinc field, and this seems like the best option (I'd have to convert to base62), but as a preference, I'd like the values to be random. This also eliminates the ability to add any entries that aren't in order, which is something I anticipate wanting to implement. This is probably the most feasible option, but it still leaves something to be desired.
Jesse
A: 

So, just in case someone runs across a similar issue: I'm using a UNIQUE field, and I'll be using a PHP hash function to generate the hashes for insertion. If the insert comes back with an error, I'll try again. Hopefully, because of the low likelihood of conflict, it won't get slow.
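
A minimal sketch of that retry loop, written in Python rather than PHP and with the standard sqlite3 module standing in for MySQL. The table name (entries), the function names, and the cap of 5 attempts (taken from the limit mentioned in the question) are assumptions:

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_letters + string.digits  # base62: a-z, A-Z, 0-9
MAX_ATTEMPTS = 5  # the question treats more than 5 queries as unacceptable


def generate_hash(length=6):
    """Random 6-character base62 code such as 'j8Du7X'."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def insert_with_unique_hash(conn, generate=generate_hash):
    """Insert a row with a fresh code, retrying when the UNIQUE index rejects it."""
    for _ in range(MAX_ATTEMPTS):
        code = generate()
        try:
            with conn:  # commit on success, roll back on error
                conn.execute("INSERT INTO entries (hash) VALUES (?)", (code,))
            return code
        except sqlite3.IntegrityError:
            continue  # duplicate code: try again with a new one
    raise RuntimeError("no unique hash found; the keyspace may be filling up")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, hash TEXT NOT NULL UNIQUE)"
)
print(insert_with_unique_hash(conn))  # a random 6-character code
```

The UNIQUE index is what guarantees there is never a duplicate; the loop only decides how often to retry before giving up.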

Jesse