ansaurus

Answer 1

You have a design decision to make, where the options are:

Use a single index, where each document has one field per language it uses, or
Use M indexes, where M is the number of languages in the corpus.

If you use the multi-index approach, it is easier to restrict a search to a specific language or set of languages: just search the indexes for those languages and skip the others. Sorting by language also becomes easier. Therefore, unless you need an "AND" search that requires keywords from different languages to appear in the same document, I would suggest the M-index approach.
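To make the trade-off concrete, here is a minimal sketch of the M-index layout as a toy in-memory inverted index. It is not tied to any particular search library; the class name `LanguageIndexes`, its methods, and the sample documents are all made up for illustration, and the whitespace tokenizer stands in for whatever per-language analysis you would really use.

```python
from collections import defaultdict


class LanguageIndexes:
    """Toy model of the M-index layout: one inverted index per language.

    Hypothetical helper for illustration only, not a real library API.
    """

    def __init__(self):
        # language code -> term -> set of document ids
        self._indexes = defaultdict(lambda: defaultdict(set))

    def add(self, doc_id, language, text):
        # Naive whitespace tokenization (assumption); a real setup would
        # use a language-specific analyzer here.
        for term in text.lower().split():
            self._indexes[language][term].add(doc_id)

    def search(self, term, languages=None):
        """Return {language: sorted doc ids}, consulting only the chosen indexes."""
        selected = sorted(self._indexes) if languages is None else languages
        hits = {}
        for language in selected:
            index = self._indexes.get(language)
            if index is None:
                continue  # no index exists for this language
            matches = index.get(term.lower())
            if matches:
                hits[language] = sorted(matches)
        return hits


indexes = LanguageIndexes()
indexes.add(1, "en", "red house")
indexes.add(1, "fr", "maison rouge")
indexes.add(2, "fr", "maison bleue")
indexes.add(3, "de", "rotes haus")

# Restrict the search to French: only the "fr" index is touched.
print(indexes.search("maison", languages=["fr"]))
# Searching only English and German finds nothing for a French keyword.
print(indexes.search("maison", languages=["en", "de"]))
```

Because hits come back keyed by language, grouping or sorting by language falls out of the layout. The cost is the one mentioned above: an "AND" across languages would need the per-index result sets to be intersected by document id.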

Based on your example, I assume that the part of each document not specially tagged is in English. If so, you can add the document text to the English index as a separate field. The other indexes then need only store a document ID, which makes them lighter.
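A rough sketch of that split, again as a toy in-memory model rather than any real library's API: the English index keeps the document text as a stored field, while the other indexes keep postings and document IDs only. The class names `IdOnlyIndex` and `EnglishIndex`, the field names, and the sample data are assumptions made up for this example.

```python
class IdOnlyIndex:
    """Index for a non-English language: postings only, no stored text."""

    def __init__(self):
        self.postings = {}  # term -> set of document ids

    def add(self, doc_id, text):
        # Naive whitespace tokenization (assumption, for illustration).
        for term in text.lower().split():
            self.postings.setdefault(term, set()).add(doc_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), ()))


class EnglishIndex(IdOnlyIndex):
    """English index: postings plus the document text as a stored field."""

    def __init__(self):
        super().__init__()
        self.stored_text = {}  # document id -> document text

    def add(self, doc_id, text):
        super().add(doc_id, text)
        self.stored_text[doc_id] = text


english = EnglishIndex()
french = IdOnlyIndex()

english.add(7, "a note about the red house")
french.add(7, "maison rouge")

# A French hit yields only a document id; the text is fetched
# from the English index, which is the only place it is stored.
doc_ids = french.search("maison")
texts = [english.stored_text[doc_id] for doc_id in doc_ids]
print(doc_ids, texts)
```

The point of the design is that the document text is stored once, so adding a language adds only postings and IDs.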

Yuval F 2009-08-18 06:31:49


Searching and sorting by language
