I am working on a website project. We have a MySQL database and a MongoDB database.

  • We want to add a full-text search engine on top of these databases (ideally one that can also be linked with PostgreSQL).

  • These databases contain multilingual text, but we cannot determine which language each text is written in.

I have looked at Solr, ElasticSearch and Sphinx. What would you advise?

  • Solr and Sphinx offer stemming, but I am not sure we can use it without knowing the language of the content.

  • ElasticSearch is JSON-based throughout, which could be an advantage if we rely more and more on MongoDB.

A: 

It doesn't matter which search engine you use: stemming is highly language-dependent. IMHO you'll have to detect the language somehow so that you can feed each text to the proper stemmer.
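To make the "detect, then route" idea concrete, here is a minimal sketch, not a production detector. It guesses the language by counting stopword hits and then picks a per-language index field. The stopword lists are deliberately tiny, and the field names (`text_en`, `text_fr`, `text_de`, `text_general`) are assumptions standing in for fields that would each be configured with their own stemmer in whichever engine you choose.

```python
import re

# Tiny illustrative stopword lists; a real detector needs far more data.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "it"},
    "fr": {"le", "la", "les", "et", "est", "des", "un", "une"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "eine"},
}

# Hypothetical index fields, each assumed to use a language-specific stemmer.
FIELD_FOR_LANGUAGE = {"en": "text_en", "fr": "text_fr", "de": "text_de"}
# Hypothetical fallback field, assumed to apply no stemming at all.
FALLBACK_FIELD = "text_general"


def detect_language(text):
    """Return the language code with the most stopword hits, or None."""
    words = re.findall(r"\w+", text.lower())
    scores = {
        lang: sum(1 for word in words if word in stopwords)
        for lang, stopwords in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def route_document(text):
    """Build a one-field document keyed by the field matching the guess."""
    lang = detect_language(text)
    field = FIELD_FOR_LANGUAGE.get(lang, FALLBACK_FIELD)
    return {field: text}
```

For example, `route_document("Le chat est dans le jardin")` puts the text under `text_fr`, while a text with no recognizable stopwords falls back to the unstemmed field instead of being stemmed with the wrong rules.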

Mauricio Scheffer
A: 

Basis Technology has a product called the Rosette Language Platform that does automatic language detection; it may be worth looking into.

Solr supports JSON for query results (and, in recent versions, for indexing as well) if that is a key integration mechanism. Still, I would put JSON support a bit further down the list of criteria on your scorecard, and focus first on how relevant the results from search engine X will be for your domain.
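As a rough illustration of the JSON side, the sketch below builds a Solr select URL that asks for a JSON response via the `wt=json` parameter and pulls the matching documents out of a response body. The base URL, the core name `mycore`, and the sample document are assumptions made up for the example; no request is actually sent.

```python
import json
from urllib.parse import urlencode


def build_select_url(base_url, query, rows=10):
    """Build a Solr /select URL that requests the response as JSON."""
    params = urlencode({"q": query, "wt": "json", "rows": rows})
    return f"{base_url}/select?{params}"


def extract_docs(response_body):
    """Return the list of matching documents from a Solr JSON response."""
    data = json.loads(response_body)
    return data["response"]["docs"]


# Hand-written sample body in the shape of a Solr JSON response (made-up data).
SAMPLE_RESPONSE = json.dumps({
    "responseHeader": {"status": 0},
    "response": {
        "numFound": 1,
        "start": 0,
        "docs": [{"id": "42", "title": "bonjour"}],
    },
})

# Hypothetical host and core name.
url = build_select_url("http://localhost:8983/solr/mycore", "title:bonjour")
docs = extract_docs(SAMPLE_RESPONSE)
```

Since the results are plain dictionaries once parsed, consuming them from an application that already handles MongoDB documents takes little extra work, which is part of why JSON support alone is a weak deciding factor.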

Eric Pugh