In my data source there are a lot of special characters like forward slash, minus, plus, etc. Many of these characters cause problems for Lucene.
That's why I decided to encode all the strings I put in the index.

For example, apple/pear would become apple%2Fpear.
I would expect that searching for that very same string would then return this doc.

But I come back empty-handed. What's going wrong?

--EDIT--
After some fooling around I noticed that the queries I create in Luke with the StandardAnalyzer (with any analyzer, for that matter) turn my %2F into a space. Hence no results. Can I somehow make the query analyzer not convert these? Maybe I should use a different escaping method than %XX?
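
A quick way to see what is going on is to print the tokens the analyzer actually produces for the encoded value. Below is a minimal sketch against the Java Lucene API, assuming a recent version (Lucene.Net mirrors it closely; older versions expose Token/next() instead of the attribute API). The field name fruitName and the value apple%2Fpear are just the example from this question.

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class ShowTokens {
        public static void main(String[] args) throws Exception {
            // Sketch: field name and value are taken from the question's example.
            Analyzer analyzer = new StandardAnalyzer();
            try (TokenStream ts = analyzer.tokenStream("fruitName", "apple%2Fpear")) {
                CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
                ts.reset();
                // Print every token so you can see where the %2F goes.
                while (ts.incrementToken()) {
                    System.out.println(term.toString());
                }
                ts.end();
            }
        }
    }

If the analyzer splits on the % here, the query parser will split your search string in exactly the same way, which would explain the empty result.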

--More Info--
I'm using StandardAnalyzer for both indexing and querying.
I'm not encoding spaces. This is one of the reasons why I quickly rolled my own encoding instead of using the default URL encoder. Turning apple/pear into apple pear would make sense, but in my real data it doesn't always (I'm using the fruit example to protect the innocent), and building in intelligence about when to insert spaces and when not to would carry too many risks. Using Luke I can see my field holds apple%2Fpear. Searching for fruitName:apple works. Searching for fruitName:apple%2Fpear doesn't, and neither does fruitName:apple%2fpear.

A: 

What type of analyzer are you using?

flalar
I'm using StandardAnalyzer. See the OP for more info.
borisCallens
+1  A: 

While querying you need to escape special characters. The forward slash is missing from the list of special characters, so I suppose you don't need to handle that character.

But you need to verify that the tokens created while indexing also contain those special characters, e.g. there should be a single token for "apple/pear".
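
One way to make sure the whole encoded value survives as a single token is to index that field unanalyzed and search it with a TermQuery, which never goes through an analyzer. This is only a sketch against a recent Java Lucene API (StringField and ByteBuffersDirectory are the modern equivalents of a NOT_ANALYZED field and RAMDirectory); fruitName and apple%2Fpear are again the example values from the question.

    import org.apache.lucene.analysis.core.KeywordAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.ByteBuffersDirectory;
    import org.apache.lucene.store.Directory;

    public class ExactTermSketch {
        public static void main(String[] args) throws Exception {
            Directory dir = new ByteBuffersDirectory();

            // Index the encoded value as one unanalyzed term (sketch values from the question).
            try (IndexWriter writer =
                     new IndexWriter(dir, new IndexWriterConfig(new KeywordAnalyzer()))) {
                Document doc = new Document();
                doc.add(new StringField("fruitName", "apple%2Fpear", Field.Store.YES));
                writer.addDocument(doc);
            }

            // A TermQuery bypasses the analyzer, so the exact indexed term matches.
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                TopDocs hits = searcher.search(
                        new TermQuery(new Term("fruitName", "apple%2Fpear")), 10);
                System.out.println("hits: " + hits.totalHits);
            }
        }
    }

The trade-off is that an unanalyzed field only matches the exact value, so you lose partial matches like fruitName:apple on that field.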

Shashikant Kore
I have noticed that the default Escape() method of Lucene.Net is not very complete. The forward slash, for example, is definitely a problem but isn't being escaped.
borisCallens
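
For completeness, this is roughly what escaping looks like on the query side in the Java API; QueryParser.escape backslash-escapes the query-syntax characters, and which characters are on that list depends on the Lucene version (recent versions do include the forward slash). Note that the parser still runs the escaped text through the analyzer afterwards, so escaping alone won't keep the %2F together:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.Query;

    public class EscapeSketch {
        public static void main(String[] args) throws Exception {
            // Sketch: user input taken from the question's example.
            String userInput = "apple%2Fpear";

            // Backslash-escape the characters the classic query parser treats as syntax.
            String escaped = QueryParser.escape(userInput);
            System.out.println("escaped: " + escaped);

            // The parser still sends the escaped text through the analyzer,
            // so print the parsed query to see which terms will actually be searched.
            QueryParser parser = new QueryParser("fruitName", new StandardAnalyzer());
            Query query = parser.parse(escaped);
            System.out.println("parsed: " + query);
        }
    }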