I'm setting up a multi-language site using gettext. Since all the text for the other languages is in the compiled .mo files, how should I approach the site's search function? Any help or pointers would be appreciated. Note that I have not coded a search engine before…

+3  A: 

As I understand it, you would like to provide search over text that is stored in .mo files, which are compiled (binary) catalogs of key-value pairs.

The difficulty will be mapping a key-value pair in a particular .mo file to a particular URI. If you can do this, you can run a script that parses the .mo files and stores the phrases along with the related URI (or other resource identifier) in some kind of data store, such as Apache Solr or a MySQL database (with a FULLTEXT-indexed column).
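A rough sketch of such a script, in Python, might look like the following. It is only an illustration under several assumptions: the catalogs are UTF-8 encoded, the msgid-to-URI mapping (`uri_for_msgid`) is something you maintain yourself, and an in-memory SQLite table with a `LIKE` query stands in for Solr or a MySQL FULLTEXT column. The names `read_mo`, `build_index` and `search` are made up for this example.

```python
import sqlite3
import struct


def read_mo(data: bytes) -> dict:
    """Parse the bytes of a GNU .mo catalog into {msgid: msgstr}."""
    magic = struct.unpack("<I", data[:4])[0]
    if magic == 0x950412DE:
        fmt = "<"          # little-endian catalog
    elif magic == 0xDE120495:
        fmt = ">"          # big-endian catalog
    else:
        raise ValueError("not a .mo file")
    # Header: revision, string count, offset of msgid table, offset of msgstr table.
    _rev, count, orig_off, trans_off = struct.unpack(fmt + "4I", data[4:20])
    catalog = {}
    for i in range(count):
        # Each table entry is a (length, offset) pair of 32-bit integers.
        olen, ooff = struct.unpack_from(fmt + "2I", data, orig_off + 8 * i)
        tlen, toff = struct.unpack_from(fmt + "2I", data, trans_off + 8 * i)
        msgid = data[ooff:ooff + olen].decode("utf-8")    # assumes UTF-8
        msgstr = data[toff:toff + tlen].decode("utf-8")
        if msgid:  # the empty msgid holds catalog metadata, not page text
            catalog[msgid] = msgstr
    return catalog


def build_index(conn, lang, catalog, uri_for_msgid):
    """Store each translated phrase with its language and URI."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS phrases (lang TEXT, uri TEXT, text TEXT)"
    )
    rows = [
        (lang, uri_for_msgid[msgid], text)
        for msgid, text in catalog.items()
        if msgid in uri_for_msgid and text
    ]
    conn.executemany("INSERT INTO phrases VALUES (?, ?, ?)", rows)


def search(conn, lang, term):
    """Return the URIs whose phrases in `lang` contain `term`."""
    cur = conn.execute(
        "SELECT DISTINCT uri FROM phrases "
        "WHERE lang = ? AND text LIKE ? ORDER BY uri",
        (lang, "%" + term + "%"),
    )
    return [row[0] for row in cur]
```

Usage would be along the lines of `build_index(conn, "fr", read_mo(open(path, "rb").read()), uri_for_msgid)` for each language's catalog, followed by `search(conn, "fr", "some word")`. The substring match is crude; a real full-text index would also handle word boundaries and ranking.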

Another option is to use a crawler to fetch all the pages on your site and index them by keyword and language. Here's a list of open-source crawlers:

http://en.wikipedia.org/wiki/Web_crawler#Open-source_crawlers
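To give an idea of what the indexing half of a crawler does, here is a minimal Python sketch that takes the HTML of one already-fetched page and adds its words to an inverted index keyed by (language, word). It assumes each page declares its language in a `<meta http-equiv="Content-Language" content="...">` tag; that convention, and the names `PageText` and `index_page`, are assumptions for this example rather than anything a particular crawler requires. Fetching pages and following links are left out.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class PageText(HTMLParser):
    """Collects a page's words and the language declared in its meta tag."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self.words = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "meta":
            # Assumed convention: language code lives in Content-Language.
            if (attrs.get("http-equiv") or "").lower() == "content-language":
                self.lang = attrs.get("content")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words += re.findall(r"\w+", data.lower())


def index_page(index, uri, html):
    """Add every word of one page to index[(lang, word)] -> set of URIs."""
    parser = PageText()
    parser.feed(html)
    parser.close()
    for word in parser.words:
        index[(parser.lang, word)].add(uri)


# Example: index = defaultdict(set); index_page(index, "/fr/about", html_text)
# A search is then a lookup such as index.get(("fr", "bonjour"), set()).
```

An existing crawler will do all of this (and much more) for you; the sketch is only meant to show why tagging each page with its language makes per-language search straightforward.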

All the best.

Adam
I see. It seems a crawler is a simpler choice. Thank you.
T1000
You're welcome! Yes, my feeling is that it would be simpler to use a crawler. You can embed a language code (and other useful information) in each page's <META> tags. Have a look at http://www.htdig.org/
Adam