views:

79

answers:

3

Suppose you have a few, quite large (100k+) objects in available and can provide this data (e.g. name) in 20+ languages. What is an efficient way to store/handle this data in a SQL database.

The obvious way to do that looks like this - however, are there other ways which make more sense? I'm a bit worried about performance.

  CREATE TABLE "object" (
      "id" serial NOT NULL PRIMARY KEY
  );                                  
  CREATE TABLE "object_name" (
      "object_id" integer NOT NULL REFERENCES "object" ("id")
      "lang" varchar(5) NOT NULL,
      "name" varchar(50) NOT NULL 
  );

As for usage, the use will only select one language and that will result in potentially large joins over the object_name table.

Premature optimization or not, I'm interested in other approaches, if only gain some peace of mind, that the obvious solution isn't a very stupid one.

To clarify the actual model is way more complicated. That's just the pattern identified so far.

+2  A: 

In my own projects, I don't translate at the DB level. I let the user (or the OS) give me a lang code and then I load all the texts in one go into a hash. The DB then sends me IDs for that hash and I translate the texts the moment I display them somewhere.

Note that my IDs are strings, too. That way, you can see which text you're using (compare "USER" with "136" -- who knows what "136" might mean in the UI without looking into the DB?).

[EDIT] If you can't translate at the UI level, then your DB design is the best you can aim for. It's as small as possible, easy to index and joins don't take a lot.

If you want to take it one step further and you can generate the SQL queries at the app level, you can consider to create views (one per language) and then use the views in the joins which would give you a way to avoid the two-column-join. But I doubt that such a complex approach will have a positive ROI.

Aaron Digulla
That works fine for a UI with a limited number of strings, but he wanted something with 100k+ objects/strings, I wouldn't keep those in memory...
Wim
I think he means a tranlation table as mapping msg_id to translation. It's specifically not something I can use gettext for.
phoku
@Wim: I still feel I'd avoid to do that at the DB level. If there are so many objects, I'd use a cache at the UI level.
Aaron Digulla
It depends on the usage pattern, if there's not much re-use then caches won't help. But its worth to investigate in any case.
Wim
+3  A: 

If you have a combined key on (object_id, lang) there shouldn't be any joins, just an O(1) lookup, right? (Try with EXPLAIN SELECT to be sure)

Wim
Ah, that's right. Good point.
phoku
A: 

Have you considered using multiple tables, one for each language?

It will cost a bit more in terms of coding complexity, but you will be loading/accessing only one table per language, in which metadata will be smaller and therefore more time efficient (possibly also space-wise, as you won't have a "lang" variable for each row)

Also, if you really want one-table-to-rule-them-all, you can create a view and join them :)

ptor
Why not add some fields to the very same table, then? If I'm going to 'hardcode' the languages anyway?
phoku