Hello,

I need help indexing and searching English text using Java Lucene on Google App Engine. The only solution I have found so far is the SnowballAnalyzer (in the contrib packages), but the version I found only supports Lucene 3.0, and GAELucene only supports Lucene 2.3.1. Just swapping the jars doesn't work.

Can anyone help me index my text with an English stemmer?

Thanks!

+1  A: 

The SnowballAnalyzer has been with Lucene for a long time now, including 2.x versions (see its entry in the 2.4.1 API docs).

Bizarrely, though, it doesn't come as part of the standard Lucene distribution, even though it is in the documentation. You'll have to hunt down a version of the contrib package built for 2.3.1.

Edit: Looks like there's a copy here.

skaffman
A: 

Various companies also sell more sophisticated and/or speedier alternatives to Porter Stemmers implemented in a Snowball interpreter. If you have needs in that direction, post a comment and I'll elaborate, but I don't want to get accused of unjustified advertising, so I'll leave it there for now.

bmargulies
+1  A: 

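The PorterStemFilter is in the Lucene core, so no contrib jar is needed. It is a token filter rather than a complete analyzer, so to get English stemming you add it to the kind of token chain the StandardAnalyzer builds, typically via a small custom Analyzer. Its input must already be lowercase, so it has to come after a LowerCaseFilter (or a LowerCaseTokenizer).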
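
To make concrete what an English stemmer does to each token before it is indexed, here is a toy sketch of just the first rule group of the Porter algorithm (step 1a, plural stripping). This is not Lucene's PorterStemFilter and the class name `PluralStripper` is made up for illustration; the real filter applies several more steps and skips very short words.

```java
public class PluralStripper {

    // Porter step 1a: the longest matching suffix wins.
    //   sses -> ss, ies -> i, ss -> ss (unchanged), s -> (removed)
    // Assumes the word is already lowercase, as PorterStemFilter also requires.
    public static String step1a(String word) {
        if (word.endsWith("sses")) {
            return word.substring(0, word.length() - 2);
        }
        if (word.endsWith("ies")) {
            return word.substring(0, word.length() - 2);
        }
        if (word.endsWith("ss")) {
            return word;
        }
        if (word.endsWith("s")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    public static void main(String[] args) {
        String[] samples = {"caresses", "ponies", "caress", "cats"};
        for (String w : samples) {
            System.out.println(w + " -> " + step1a(w));
        }
    }
}
```

Because the same rules run at index time and at query time, "cats" and "cat" end up as the same term, which is why the analyzer used for searching has to match the one used for indexing.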

Coady
A: 

You can use lucene-2.3.1.zip or its neighboring files in the Lucene archive. I am unsure, however, how much customization GAELucene allows. It does not appear to accept arbitrary analyzers.

Yuval F