I'm using Porter Stemmer to stem the words, and here's a problem I'm running into:
The word "mortgage" is correctly stemmed to "mortgag".
The word "mortgagee" is (arguably incorrectly) stemmed to "mortgage".
There are approximately 100 documents containing the word "mortgage".
There is 1 document containing the word "mortgagee".
When I build an index without putting "mortgagee" in any documents, everything works fine: searching for "mortgage" or "mortgages" or "mortgag" returns all 100 documents.
When I build an index in which one of the documents contains "mortgagee", searching the index for "mortgage" returns only the single document containing "mortgagee" (which was stemmed down to "mortgage"). However, searching for "mortgag" or "mortgages" still returns all 100 documents.
The only logical conclusion I can draw from this is that Lucene first searches for the pre-stemmed word and, only if that finds no results, goes on to search for the stemmed word. Thus, when searching for "mortgage", it first finds the "mortgage" that was stemmed from "mortgagee" and stops searching. Is this the correct behavior, or is it a bug?
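To spell out the behavior I'm guessing at, here is a toy model in Python. It is not Lucene code and says nothing about how Lucene actually works internally: the stem table is hard-coded from the examples above instead of using a real Porter stemmer, the names `stem`, `build_index` and `search_guess` are made up for illustration, and I'm assuming the "mortgagee" document is a separate document from the other 100.

```python
# Toy model only: stems are hard-coded from the examples in this post,
# not produced by a real Porter stemmer.
STEMS = {
    "mortgage": "mortgag",
    "mortgages": "mortgag",
    "mortgag": "mortgag",
    "mortgagee": "mortgage",
}


def stem(word):
    """Return the hard-coded stem, or the word itself if it is unknown."""
    return STEMS.get(word, word)


def build_index(docs):
    """Map each stemmed term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(stem(word), set()).add(doc_id)
    return index


def search_guess(index, query):
    """Hypothesised lookup: try the raw term first, stem only as a fallback."""
    hits = index.get(query)
    if hits:
        return hits
    return index.get(stem(query), set())


# 100 documents containing "mortgage" (ids 0..99).
docs = {i: "mortgage" for i in range(100)}
index_without = build_index(docs)

# Same corpus plus one extra document (id 100) containing "mortgagee".
docs_with = dict(docs)
docs_with[100] = "mortgagee"
index_with = build_index(docs_with)

print(len(search_guess(index_without, "mortgage")))  # 100
print(search_guess(index_with, "mortgage"))          # {100}
print(len(search_guess(index_with, "mortgages")))    # 100
print(len(search_guess(index_with, "mortgag")))      # 100
```

This guessed lookup reproduces all four observations, which is why I suspect something like it is happening, but it is only my hypothesis.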