I'm wondering about how to store language tag IDs (e.g. en-US).
A quick example:
Table l10n ( l10n_id SMALLINT, code VARCHAR(5) )
PK: l10n_id
Index on code
Table product ( product_id INT, ..language-neutral columns.. )
PK: product_id
Table product_l10n ( product_id INT, l10n_id SMALLINT, ..language-specific columns.. )
PK: product_id, l10n_id.
Indexes on product_id and l10n_id
If we use varchar IDs:
Table l10n ( l10n_id VARCHAR(5) )
PK: l10n_id
1. Using smallint IDs:
It requires to make a join with the language table:
SELECT *
FROM product a
LEFT JOIN product_l10n b ON b.id = a.id
LEFT JOIN l10n c ON c.id = b.id
WHERE c.code = 'en-US';
2. Using varchar IDs:
We don't have to use the l10n table in our queries:
SELECT *
FROM product a
LEFT JOIN product_l10n b ON (b.product_id = a.id AND b.l10n_id = 'en-US');
A smallint takes 2 bytes while a varchar(5) takes up to 6 bytes (in PostgreSQL, see the doc).
So I guess that all indexes (l10n pk, product_l10n pk, and product_l10n.l10n_id) will be bigger with varchar IDs. And therefore, the performances will decrease. Is that correct?
3. Conclusion:
Am I right in saying the more localized data we have, the less performant using varchar IDs is?
And therefore, using smallint IDs is a better design.