views:

267

answers:

6

I have 2 columns: "string" "int"
I set both as primary keys for unique combination.

Would searching based on the string column be poorer than normalizing further the string column and having 2 int columns instead?

+6  A: 

Do you expect to have lots of duplicate strings? If no, then normalizing will yield nothing. If yes, then indexing becomes less effective the more duplication you have. (because all items with the same index must be retrieved)

If your database engine supports it, you could simply set the string as a clustered index, in which case similar strings will be stored next to each other.

And if you already have an index on both columns, and are comparing both of them, you are receiving the full benefit of the index.

Joe Koberg
how exactly to set as clustered index?
zsharp
MSSQL: CREATE CLUSTERED INDEX foo_string_cidx ON foo_table(foo_string);
Joe Koberg
+1  A: 

This actually sounds quite easy to test.

You could set up the table both ways, generate 100,000 random rows, run some queries against both, and see which is quicker.

Blorgbeard
+2  A: 

If you actually need the string value, then the answer is no.

Comparing a string in the DB is less expensive than comparing an int, then looking up the string from the int.

As for duplication of strings, the more duplication you have, then the less storage you would need if they were denormalized. It would still be slower though.

If you're looking for performance over storage, then stick with the string,int PK.

Darren Clark
A: 

If you are using Oracle, you may wish to explore making this an index organised table with COMPRESS 1.

From the Oracle docs:

Specify COMPRESS to enable key compression, which eliminates repeated occurrence of primary key column values in index-organized tables. Use integer to specify the prefix length, which is the number of prefix columns to compress.

WW
A: 

Indexes make fast searching possible. But remember the database can only use the index if it knows all the most significant fields. For example, if you have an index on:

int, string

Then the database can use the index to search for an int, or for a combination of int and string. But it can't use it to search for a string.

If you create two separate indexes:

index 1: int
index 2: string

Then the database can search efficiently for int and/or string.

Whether you normalize the table doesn't matter half as much as the choice of indexes. Normalization typically slows down searching, because it forces a lookup to a different table.

Andomar
+1  A: 

The table is already normalized.

Changing a string to an int is not further normalization.

This may seem picky, but the word "normalized" is being over used to the point where it's becoming menaingless.

Walter Mitty