snowball

Stemming algorithm that produces real words

I need to take a paragraph of text and extract from it a list of "tags". Most of this is quite straightforward. However, I now need some help stemming the resulting word list to avoid duplicates. Example: Community / Communities. I've used an implementation of the Porter Stemmer algorithm (I'm writing in PHP, by the way): http://tartarus.or...
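One common way to get real words out of a stemmer is to use the stem only as a grouping key and then display the most frequent original word from each group. The sketch below is in Python rather than PHP. It assumes a hypothetical `toy_stem` function as a stand-in for the Porter implementation mentioned above; that function only handles a couple of plural endings and does not reproduce Porter's output (Porter would give "communiti", for example).

```python
from collections import Counter, defaultdict


def toy_stem(word: str) -> str:
    """Hypothetical, very crude suffix stripper standing in for a real
    Porter/Snowball stemmer; it only handles a few English plural endings."""
    w = word.lower()
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]
    return w


def dedupe_tags(words):
    """Group words by stem and keep the most frequent original (lowercased)
    form in each group, so the tag list contains real words, not stems."""
    groups = defaultdict(Counter)
    for w in words:
        groups[toy_stem(w)][w.lower()] += 1
    # most_common(1) breaks ties by first-encountered order
    return [forms.most_common(1)[0][0] for forms in groups.values()]


print(dedupe_tags(["Community", "communities", "community", "tag", "tags", "tag"]))
# prints ['community', 'tag']
```

Because the stem is never shown to the user, an aggressive stemmer that produces non-words is harmless here. It only has to map related forms to the same key.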

Lucene using Snowball and SpellChecker brings back strange values

I am trying to get SpellChecker set up using Lucene.NET. It all works fine except in situations similar to the following: I have text containing "satellite" in the index, and I analyze it using Snowball. I then create a SpellChecker index and get suggestions from it. The suggestion I get back when passing in "Satalite" is "satellit". I...
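A spell checker can only suggest words that are in its dictionary. If that dictionary is built from terms that have already been stemmed by Snowball, the suggestions will be stems such as "satellit". The Python sketch below illustrates this. It uses `difflib.get_close_matches` as a stand-in for Lucene's SpellChecker, which scores candidates differently, and a hypothetical `toy_stem` that simply drops a trailing "e" in place of the real Snowball stemmer. A commonly suggested remedy is to build the spelling dictionary from a separate field analyzed without stemming.

```python
import difflib


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a Snowball stemmer: drops a trailing 'e',
    which is enough to turn 'satellite' into 'satellit' for this demo."""
    w = word.lower()
    return w[:-1] if w.endswith("e") and len(w) > 4 else w


def suggest(word, vocabulary, n=1):
    """Return the closest entries from the spelling dictionary.
    difflib stands in for Lucene's SpellChecker here."""
    return difflib.get_close_matches(word.lower(), sorted(vocabulary), n=n, cutoff=0.6)


corpus = ["The", "satellite", "launched", "yesterday"]
stemmed_dictionary = {toy_stem(w) for w in corpus}  # built from analyzed (stemmed) terms
surface_dictionary = {w.lower() for w in corpus}    # built from unstemmed terms

print(suggest("Satalite", stemmed_dictionary))  # prints ['satellit']
print(suggest("Satalite", surface_dictionary))  # prints ['satellite']
```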

Porter Stemmer and Weka

I am using Weka with the Porter stemmer provided in the Snowball package. Everything works fine if I run my application within Eclipse, but as soon as I export it as a runnable JAR (with all the libraries included), Weka says: Stemmer 'porter' unknown! How can I fix that? ...

Lucene Standard Analyzer vs Snowball

Just getting started with Lucene.Net. I indexed 100,000 rows using the standard analyzer, ran some test queries, and noticed that plural queries don't return results if the original term was singular. I understand the Snowball analyzer adds stemming support, which sounds nice. However, I'm wondering if there are any drawbacks to going with Snowball...
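The plural/singular mismatch happens because a non-stemming analyzer indexes "apple" and "apples" as different terms. Stemming fixes this only if the same analyzer runs at both index time and query time. It can also conflate unrelated words, which is one of the drawbacks being asked about. The toy inverted index below is a Python sketch, not Lucene.Net code. `standard_analyze` is a crude stand-in for a non-stemming analyzer (whitespace split plus lowercasing only), and `toy_stem` is a hypothetical suffix stripper in place of Snowball.

```python
from collections import defaultdict


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a Snowball stemmer: strips a trailing 's'
    (but not 'ss'). Real Snowball rules are far more involved."""
    w = word.lower()
    return w[:-1] if w.endswith("s") and not w.endswith("ss") and len(w) > 3 else w


def standard_analyze(text):
    # Crude non-stemming analysis: split on whitespace and lowercase.
    return [t.lower() for t in text.split()]


def stemming_analyze(text):
    return [toy_stem(t) for t in standard_analyze(text)]


def build_index(docs, analyze):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query, analyze):
    # The query must go through the same analyzer used at index time.
    hits = [index.get(term, set()) for term in analyze(query)]
    return set.intersection(*hits) if hits else set()


docs = {1: "red apple", 2: "latest news", 3: "new car"}
plain = build_index(docs, standard_analyze)
stemmed = build_index(docs, stemming_analyze)

print(search(plain, "apples", standard_analyze))    # prints set()
print(search(stemmed, "apples", stemming_analyze))  # prints {1}
print(search(stemmed, "new", stemming_analyze))     # prints {2, 3}: "news" became "new"
```

The last query shows the trade-off. Recall improves for plurals, but the toy stemmer also maps "news" to "new", so precision drops. Aggressive suffix stripping in real stemmers can produce similar collisions.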