SQL Server VARCHAR(2) join performance vs. INT

views:

125

answers:

SQL Server VARCHAR(2) join performance vs. INT

I have a table of q-grams (2 letter strings). I need to join another table to this on the CHAR(2) field.

Would it be faster to convert these 2-grams into a number (e.g. aa=1, ab=2, ac=3) and convert the fields in both this table and the referencing table to an INT and join using an INT rather?

Kind regards

Peter

+1 A:

Probably not. And if you were, it would be better to use a smallint instead of an int. Also, I'm not sure it makes any sense to have a varchar(2) column. There is extra overhead in having varchar columns, and at most you are saving only a single byte.

How many rows of data will you have?

Randy Minder 2009-12-18 13:05:32

I'm talking a table with 3 million rows linked to around 150 million q-grams. So a large data set. – Peter 0 secs ago

Peter 2009-12-18 13:13:32

Randy, I made a mistake -- it is a CHAR(2) not a VARCHAR(2)

Peter 2009-12-18 13:15:08

+1 A:

NO, this will probably not be more performant, and in addition, you will have to have an additional table to store these mappings.

This will be an additional JOIN aswell.

Just remember to apply the correct indexes for your join columns.

astander 2009-12-18 13:07:37

I mean replacing the CHAR(2) column with an INT column, so no mapping table will be needed.

Peter 2009-12-18 13:16:03

NO, I still do not think you need to change it.

astander 2009-12-18 13:26:31

Will that be your bottle neck? Will it be worth the additional complexity?

That said, you might as well make it CHAR(2).

Alex Brasetvik 2009-12-18 13:08:54

Sorry my mistake - it is CHAR(2)

Peter 2009-12-18 13:11:39

I'm talking a table with 3 million rows linked to around 150 million q-grams. So a large data set.

Peter 2009-12-18 13:12:45

In my experience, changing the key for one join from a string (especially CHAR(2)) to an INT will not make a measurable difference. This is because most of your overhead is I/O instead of CPU.

Once you get into longer strings, (especially Unicode strings), you can start measuring a difference. But even then, I think it has more to do with increased I/O due to record size in the data and the indexes.

Darryl Peterson 2009-12-18 13:27:52

ansaurus

tags:

views:

answers:

SQL Server VARCHAR(2) join performance vs. INT

related questions