Hi,

I have two Xapian databases, let's call one "EN" and the other "DE", and let's say the former contains some documents in English, and the latter in German.

If I want users to be able to search both at once, I can easily load both of the databases. However, it seems I can only use one stemmer and one set of stop words?

Is there no way to instantiate an English-language stemmer and have it apply only to results that come from the "EN" database? Likewise, is there no way to create a Stopper with English words and have it apply only to results that come from the "EN" database?

Can this be right?

+1  A: 

Hi Sean,

Stemming is only useful if you know the language of the text you're stemming. If you've created your Xapian databases with stemming (i.e., the Xapian databases are storing stemmed forms of the original words), then you will have specified a language at indexing time.

However, at search time you also need to know the language in order to stem correctly. If your users enter a query in English, you must stem it in English before running it against the English database; the same applies for German. If you want to search both databases, consider creating two separate, language-specific queries from each user request: one stemmed for English and run against "EN", the other stemmed for German and run against "DE".
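To make the "two queries per request" idea concrete, here is a minimal sketch in plain Python. It deliberately does not call Xapian: `LanguageConfig`, `strip_suffix`, `CONFIGS` and `build_terms` are hypothetical names invented for illustration, and the suffix-stripping "stemmers" and tiny stop word sets are toy stand-ins. In a real setup you would presumably plug in `xapian.Stem("english")` and `xapian.Stem("german")`, feed the stop words to a `xapian.SimpleStopper`, and run each resulting query against its own database.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class LanguageConfig:
    """Per-database search settings (hypothetical helper, not a Xapian type)."""
    stem: Callable[[str], str]   # stand-in for a language-specific stemmer
    stopwords: FrozenSet[str]    # stand-in for a language-specific stopper


def strip_suffix(suffixes: Tuple[str, ...]) -> Callable[[str], str]:
    """Toy stemmer: drop the first matching suffix, leaving at least 3 letters."""
    def stem(word: str) -> str:
        for suffix in suffixes:
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word
    return stem


# One configuration per database; the keys mirror the "EN" and "DE" databases.
CONFIGS: Dict[str, LanguageConfig] = {
    "EN": LanguageConfig(strip_suffix(("ing", "s")), frozenset({"the", "a", "of"})),
    "DE": LanguageConfig(strip_suffix(("en", "e")), frozenset({"der", "die", "das"})),
}


def build_terms(user_query: str, configs: Dict[str, LanguageConfig]) -> Dict[str, List[str]]:
    """Turn one user request into one stemmed term list per database."""
    words = user_query.lower().split()
    return {
        name: [cfg.stem(word) for word in words if word not in cfg.stopwords]
        for name, cfg in configs.items()
    }
```

Calling `build_terms("the houses", CONFIGS)` yields a different term list for each database, because each language applies its own stop words and stemming rules. Note that if you run the two queries separately, the match weights from the two databases are not necessarily comparable, so merging the two result lists into one ranking needs some care.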

Bear in mind, though, that a query entered in German but stemmed with an English stemmer may produce some odd results. If you have any way of finding out which language your users are writing in at query time, you can use it to apply the correct stemmer.

HTH - by the way, the Xapian-discuss mailing list (see www.xapian.org) is a good place to ask this kind of question.

Charlie