Given the following code:

Dim stemmer As New Lucene.Net.Analysis.PorterStemmer()
Response.Write(stemmer.Stem("mattress table") & "<br />") ' Outputs: mattress t
Response.Write(stemmer.Stem("mattress") & "<br />") ' Outputs: mattress
Response.Write(stemmer.Stem("table") & "<br />") ' Outputs: tabl

Could someone explain why the PorterStemmer produces different results when there is a space in the word? I was expecting 'mattress table' to be stemmed to 'mattress tabl'.

This is made more confusing by the following code:

Dim parser As Lucene.Net.QueryParsers.QueryParser = New Lucene.Net.QueryParsers.QueryParser("MyField", New PorterStemmerAnalyzer)
Dim q As Lucene.Net.Search.Query = parser.Parse("mattress table")
Response.Write(q.ToString & "<br />") ' Outputs: MyField:mattress MyField:tabl

q = parser.Parse("""mattress table""")
Response.Write(q.ToString & "<br />") ' Outputs: MyField:"mattress tabl"

Could someone explain why I am getting different results from the QueryParser() and the Stem() function for the same word(s) using the same Analyzer?

Thanks, Kyle

+1  A: 

PorterStemmerAnalyzer is composed of a series of tokenizers and filters. PorterStemmer is only one of the filters applied to the generated token stream. To verify this, try changing the case of the query: the QueryParser output will still be lowercase because of the LowerCaseFilter in the token stream.

Some sample code for a custom analyzer can be checked here. It will give you a peek inside an Analyzer.

Shashikant Kore
+2  A: 

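The query parser first tokenizes the input into two tokens, and each token is stemmed separately. Calling Stem() directly skips that step: Porter treats the whole string as one "word", so only the end of the string is stemmed.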
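To make the difference concrete, here is a rough sketch in Python. It is not Lucene.Net's code: it is a toy that implements only two rules from Porter's published algorithm (the step 4 rule that drops "able" when the remaining stem has measure m > 1, and the step 5a rule for a final "e"), and the names toy_stem and stem_tokens are made up for illustration. It also assumes, as an assumption of this sketch, that a space is counted like any other consonant. Under those assumptions it reproduces the outputs in the question: for "mattress table" the stem before "able" is "mattress t", which is long enough for "able" to be dropped, while for "table" alone the stem is just "t", so only the final "e" goes.

```python
VOWELS = set("aeiou")

def is_consonant(word, i):
    """Porter's definition; any non-vowel character (including a space) counts as a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        # 'y' is a consonant at the start of the word or after a vowel
        return i == 0 or not is_consonant(word, i - 1)
    return True

def measure(stem):
    """Porter's m: the number of vowel-run-then-consonant-run sequences in the stem."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m

def ends_cvc(stem):
    """Porter's *o condition: consonant-vowel-consonant ending, last letter not w, x or y."""
    n = len(stem)
    if n < 3:
        return False
    return (is_consonant(stem, n - 3) and not is_consonant(stem, n - 2)
            and is_consonant(stem, n - 1) and stem[-1] not in "wxy")

def toy_stem(text):
    """Apply two Porter rules to the WHOLE string, as if it were one word."""
    word = text.lower()
    # Step 4 rule: (m > 1) ABLE -> ""
    if word.endswith("able") and measure(word[:-4]) > 1:
        word = word[:-4]
    # Step 5a rules: (m > 1) E -> "", (m = 1 and not *o) E -> ""
    if word.endswith("e"):
        stem = word[:-1]
        m = measure(stem)
        if m > 1 or (m == 1 and not ends_cvc(stem)):
            word = stem
    return word

def stem_tokens(text):
    """Tokenize on whitespace first, then stem each token, as an analyzer chain would."""
    return " ".join(toy_stem(token) for token in text.split())

print(toy_stem("mattress table"))     # mattress t
print(toy_stem("table"))              # tabl
print(stem_tokens("mattress table"))  # mattress tabl
```

So if you want Stem() to agree with the QueryParser, split the input into words yourself and stem each word, or run the text through the analyzer instead of calling the stemmer directly.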

Xodarap