I've got a DB table where we store a lot of MD5 hashes (and yes, I know they aren't 100% unique...), and we run a lot of comparison queries against those strings. The table can get quite large, with over 5M rows.

My question is this: is it wise to keep the data as hexadecimal strings, or should I convert the hex to binary or decimal for better query performance?

+1  A: 

Binary is likely to be faster, since with text you're using 8 bits (a full character) to encode 4 bits of data. But I doubt you'll really notice much if any difference.
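To make the size difference concrete, here is a minimal Python sketch using the standard `hashlib` module. The helper names `md5_hex` and `md5_raw` are made up for illustration. It only shows that the hex form takes twice as many bytes as the raw digest and that the two convert losslessly; it says nothing about how any particular database will perform.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest as a 32-character hexadecimal string."""
    return hashlib.md5(data).hexdigest()

def md5_raw(data: bytes) -> bytes:
    """Return the MD5 digest as 16 raw bytes."""
    return hashlib.md5(data).digest()

hex_form = md5_hex(b"hello")
raw_form = md5_raw(b"hello")

# Each hex character carries 4 bits, so the text form is twice as long.
print(len(hex_form), len(raw_form))  # 32 16

# The two representations round-trip without loss.
assert bytes.fromhex(hex_form) == raw_form
assert raw_form.hex() == hex_form
```

So a binary(16) column would store half the bytes of a char(32) column per row; whether that matters at 5M rows is something to measure rather than assume.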

Where I'm at we have a very similar table. It holds dictation texts from doctors for billing purposes in a text column (still on SQL Server 2000). We're approaching four million records, and for validation and compliance purposes we need to be able to check for duplicates, where the doctor dictated the exact same thing twice. A dictation can run several pages, so we also have a hash column that's populated on insert via a trigger. The column is a char(32) type.
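A rough sketch of that kind of duplicate check, using Python's built-in `sqlite3` as a stand-in for the real database. The table name `dictations`, the column `body_md5`, and the helper `add_dictation` are all hypothetical, and the hash is computed in application code here rather than by a trigger. Because MD5 values can collide, the query compares the full text as well once the hashes match.

```python
import hashlib
import sqlite3

dict_db = sqlite3.connect(":memory:")
dict_db.execute(
    "CREATE TABLE dictations ("
    "id INTEGER PRIMARY KEY, body TEXT, body_md5 CHAR(32))"
)

def add_dictation(body: str) -> None:
    """Store a dictation together with the hex MD5 of its text."""
    digest = hashlib.md5(body.encode("utf-8")).hexdigest()
    dict_db.execute(
        "INSERT INTO dictations (body, body_md5) VALUES (?, ?)",
        (body, digest),
    )

for text in ("Patient is stable.", "Follow up in two weeks.", "Patient is stable."):
    add_dictation(text)

# Match on the short hash first, then confirm with the full text,
# since equal hashes do not strictly guarantee equal content.
duplicate_pairs = dict_db.execute(
    """
    SELECT a.id, b.id
    FROM dictations AS a
    JOIN dictations AS b
      ON a.body_md5 = b.body_md5
     AND a.id < b.id
     AND a.body = b.body
    ORDER BY a.id, b.id
    """
).fetchall()

print(duplicate_pairs)  # [(1, 3)]
```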

Joel Coehoorn
+1  A: 

Binary data is a bummer to work with manually or if you have to dump your data to a text file or whatnot.

Just put an index on the hash column and you should be fine.
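For illustration, a small sketch of what that looks like, again with Python's `sqlite3` standing in for whatever database is actually in use. The table `hashes`, the column `md5_hex`, and the index name `idx_hashes_md5` are invented for this example. The query plan output is printed rather than asserted in a comment because its exact wording differs between SQLite versions.

```python
import hashlib
import sqlite3

hash_db = sqlite3.connect(":memory:")
hash_db.execute("CREATE TABLE hashes (id INTEGER PRIMARY KEY, md5_hex CHAR(32))")
# The index is what lets equality lookups avoid scanning the whole table.
hash_db.execute("CREATE INDEX idx_hashes_md5 ON hashes (md5_hex)")

def hex_digest(value: str) -> str:
    """Hex MD5 of a string, encoded as UTF-8."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()

hash_db.executemany(
    "INSERT INTO hashes (md5_hex) VALUES (?)",
    [(hex_digest(f"row-{i}"),) for i in range(1000)],
)

# Look up one hash by equality.
target = hex_digest("row-42")
found = hash_db.execute(
    "SELECT id FROM hashes WHERE md5_hex = ?", (target,)
).fetchall()

# Ask SQLite how it intends to run the lookup.
plan = hash_db.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM hashes WHERE md5_hex = ?", (target,)
).fetchall()
for row in plan:
    print(row[-1])
```

If the plan mentions the index rather than a full table scan, the lookup is using it; most databases offer an equivalent way to check.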

kurosch