views:

80

answers:

1

I'm wondering about how to store language tag IDs (e.g. en-US).

A quick example:

Table l10n ( l10n_id SMALLINT, code VARCHAR(5) )  
    PK: l10n_id
    Index on code  
Table product ( product_id INT, ..language-neutral columns.. )  
    PK: product_id
Table product_l10n ( product_id INT, l10n_id SMALLINT, ..language-specific columns.. )  
    PK: product_id, l10n_id.  
    Indexes on product_id and l10n_id

If we use varchar IDs:

Table l10n ( l10n_id VARCHAR(5) )  
    PK: l10n_id

1. Using smallint IDs:

It requires to make a join with the language table:

    SELECT *
    FROM product a
        LEFT JOIN product_l10n b ON b.id = a.id
        LEFT JOIN l10n c ON c.id = b.id
    WHERE c.code = 'en-US';

2. Using varchar IDs:

We don't have to use the l10n table in our queries:

    SELECT *
    FROM product a
        LEFT JOIN product_l10n b ON (b.product_id = a.id AND b.l10n_id = 'en-US');

A smallint takes 2 bytes while a varchar(5) takes up to 6 bytes (in PostgreSQL, see the doc).
So I guess that all indexes (l10n pk, product_l10n pk, and product_l10n.l10n_id) will be bigger with varchar IDs. And therefore, the performances will decrease. Is that correct?

3. Conclusion:

Am I right in saying the more localized data we have, the less performant using varchar IDs is?
And therefore, using smallint IDs is a better design.

+1  A: 

When storing VARCHARs, collations come into play, making index comparisons a trifle less efficient (at least in MySQL).

Also, in MySQL an key miss on a VARCHAR is much more expensive than an key hit.

See this article in my blog for performance:

On the other side, storing the codes in a separate table will require an extra JOIN that will kill all performance benefits.

So in MySQL it's probably worth keeping codes as TINYINTs, while in other systems it will be better to keep them as VARCHAR's

Quassnoi
Since language code will not be very dynamic, it is plausible for the SQL to be written without the join; you collect the relevant language code once from the table, and thereafter provide it as a parameter to the queries...well, that avoids the joins. Gut feel: the joins should not be problematic if your DBMS has an optimizer worthy of the name.
Jonathan Leffler