I need to take a paragraph of text and extract from it a list of "tags". Most of this is quite straightforward. However, I now need some help stemming the resulting word list to avoid duplicates. Example: Community / Communities
I've used an implementation of the Porter Stemmer algorithm (I'm writing in PHP, by the way):
http://tartarus.or...
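To make the duplicate problem concrete, here is a minimal sketch of grouping tags by a stem key and keeping the first surface form seen. It is written in Python rather than PHP, and `crude_stem` and `dedupe_tags` are hypothetical names. `crude_stem` is a deliberately simple plural stripper standing in for a real Porter stemmer, not the Porter algorithm itself.

```python
def crude_stem(word: str) -> str:
    """Rough plural stripping; an assumed stand-in for a real Porter stemmer."""
    w = word.lower()
    # "communities" -> "community"
    if len(w) > 4 and w.endswith("ies"):
        return w[:-3] + "y"
    # "tags" -> "tag", but leave words like "class" alone
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


def dedupe_tags(words):
    """Keep the first surface form seen for each stem key, in input order."""
    seen = {}
    for word in words:
        seen.setdefault(crude_stem(word), word)
    return list(seen.values())


print(dedupe_tags(["Community", "Communities", "tags", "tag"]))
```

The point of the design is that the stem is used only as a lookup key, so the displayed tag stays a real word instead of a truncated stem. Swapping in an actual Porter implementation would only change `crude_stem`.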
I am trying to set up SpellChecker using Lucene.NET. It all works fine except in situations like the following:
I have text containing "satellite" in the index, which I analyze using Snowball.
I then create a SpellChecker index and get suggestions from it. The suggestion returned when I pass in "Satalite" is "satellit".
I...
I am using Weka with the Porter stemmer provided in the SnowBall package. Everything works fine if I run my application within Eclipse, but as soon as I export it as a runnable JAR (with all the libraries included), Weka says:
Stemmer 'porter' unknown!
How could I fix that?
...
Just getting started with Lucene.Net. I indexed 100,000 rows using the standard analyzer, ran some test queries, and noticed that plural queries don't return results if the original term was singular. I understand the Snowball analyzer adds stemming support, which sounds nice. However, I'm wondering if there are any drawbacks to going with Snowball...