views:

125

answers:

4

Hi

I have a table of q-grams (2 letter strings). I need to join another table to this on the CHAR(2) field.

Would it be faster to convert these 2-grams into a number (e.g. aa=1, ab=2, ac=3) and convert the fields in both this table and the referencing table to an INT and join using an INT rather?

Kind regards

Peter

+1  A: 

Probably not. And if you were, it would be better to use a smallint instead of an int. Also, I'm not sure it makes any sense to have a varchar(2) column. There is extra overhead in having varchar columns, and at most you are saving only a single byte.

How many rows of data will you have?

Randy Minder
I'm talking a table with 3 million rows linked to around 150 million q-grams. So a large data set. – Peter 0 secs ago
Peter
Randy, I made a mistake -- it is a CHAR(2) not a VARCHAR(2)
Peter
+1  A: 

NO, this will probably not be more performant, and in addition, you will have to have an additional table to store these mappings.

This will be an additional JOIN aswell.

Just remember to apply the correct indexes for your join columns.

astander
I mean replacing the CHAR(2) column with an INT column, so no mapping table will be needed.
Peter
NO, I still do not think you need to change it.
astander
A: 

Will that be your bottle neck? Will it be worth the additional complexity?

That said, you might as well make it CHAR(2).

Alex Brasetvik
Sorry my mistake - it is CHAR(2)
Peter
I'm talking a table with 3 million rows linked to around 150 million q-grams. So a large data set.
Peter
A: 

In my experience, changing the key for one join from a string (especially CHAR(2)) to an INT will not make a measurable difference. This is because most of your overhead is I/O instead of CPU.

Once you get into longer strings, (especially Unicode strings), you can start measuring a difference. But even then, I think it has more to do with increased I/O due to record size in the data and the indexes.

Darryl Peterson