I'm setting up a multi-language site using gettext. Since all the text for the other languages is in the compiled .mo files, how should I attack the search function of the site? Any help or pointer in the right direction would be appreciated. Note: I have not coded a search engine before…
A:
As I understand it, you would like to provide search over text that is stored in .mo files: compiled binary catalogs of key-value pairs (message ID and translation), generated from the plain-text .po files.
The problem will be mapping each key-value pair in a particular .mo file to a particular URI. If you can do this, you can run a script that parses the .mo files and stores each phrase along with its related URI (or other resource identifier) in some kind of data store, such as Apache Solr or a MySQL database (with a FULLTEXT-indexed column).
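To make that concrete, here is a rough Python sketch of such a script. Treat it as an illustration only: it assumes UTF-8 catalogs without plural forms, it uses an in-memory SQLite table with a plain LIKE query as a stand-in for Solr or a MySQL FULLTEXT column, and the names read_mo, index_catalog, search and the uri_for mapping are made up for this example. You still have to supply the msgid-to-URI mapping yourself.

```python
import sqlite3
import struct


def read_mo(data):
    """Return {msgid: msgstr} from the raw bytes of a GNU .mo catalog."""
    magic = struct.unpack("<I", data[:4])[0]
    if magic == 0x950412DE:
        order = "<"  # file was written little-endian
    elif magic == 0xDE120495:
        order = ">"  # file was written big-endian
    else:
        raise ValueError("not a .mo file")
    # Header fields after the magic number: revision, number of strings,
    # offset of the msgid table, offset of the msgstr table.
    _rev, count, orig_off, trans_off = struct.unpack(order + "4I", data[4:20])
    catalog = {}
    for i in range(count):
        # Each table entry is a (length, offset) pair of 32-bit integers.
        o_len, o_pos = struct.unpack_from(order + "2I", data, orig_off + 8 * i)
        t_len, t_pos = struct.unpack_from(order + "2I", data, trans_off + 8 * i)
        msgid = data[o_pos:o_pos + o_len].decode("utf-8")  # assumes UTF-8
        msgstr = data[t_pos:t_pos + t_len].decode("utf-8")
        if msgid:  # the empty msgid holds catalog metadata, not page text
            catalog[msgid] = msgstr
    return catalog


def index_catalog(db, lang, catalog, uri_for):
    """Store each translated phrase with its language and page URI.

    uri_for is a hypothetical {msgid: uri} mapping that you maintain.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS phrases "
        "(lang TEXT, uri TEXT, msgid TEXT, text TEXT)"
    )
    db.executemany(
        "INSERT INTO phrases VALUES (?, ?, ?, ?)",
        [(lang, uri_for[m], m, t) for m, t in catalog.items() if m in uri_for],
    )


def search(db, lang, term):
    """Return the URIs whose phrases in the given language contain term."""
    rows = db.execute(
        "SELECT DISTINCT uri FROM phrases "
        "WHERE lang = ? AND text LIKE ? ORDER BY uri",
        (lang, "%" + term + "%"),
    )
    return [row[0] for row in rows]
```

You would call read_mo on the bytes of each language's .mo file, pass the result to index_catalog together with the language code, and then run search(db, "fr", "some word") from your search page. On a real site you would swap the LIKE query for a proper full-text index.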
Another option is to use a crawler to slurp all the pages in your site and index them by keyword and language. Here's a list of open-source crawlers:
http://en.wikipedia.org/wiki/Web_crawler#Open-source_crawlers
All the best.
Adam
2010-07-29 19:16:00
I see. It seems a crawler is a simpler choice. Thank you.
T1000
2010-08-04 09:13:18
You're welcome! Yes, my feeling is that it would be simpler to use a crawler. You can embed a language code (and other useful information) in each page's <META> tags. Have a look at http://www.htdig.org/
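In case it helps, here is a small Python sketch of the indexing half of that idea: read the language code out of a page's <META> tag and file every word on the page under (language, word). It is only an illustration, not how ht://Dig works internally. It assumes each page carries a tag like <meta http-equiv="Content-Language" content="fr">, and PageScanner and index_page are names invented for this example. Fetching the pages is left to whichever crawler you choose.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Collect the declared language and the visible words of one page."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self.words = []
        self._skip = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        http_equiv = (attrs.get("http-equiv") or "").lower()
        if tag == "meta" and http_equiv == "content-language":
            self.lang = attrs.get("content")
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words += re.findall(r"\w+", data.lower())


def index_page(index, url, html):
    """Add one page to index, a mapping of (lang, word) to a set of URLs."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    for word in set(scanner.words):
        index[(scanner.lang, word)].add(url)
```

Start with index = defaultdict(set), call index_page for every page the crawler fetches, and a search for a word in French becomes index.get(("fr", word), set()).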
Adam
2010-08-04 09:28:19