In my data source there are a lot of special characters like forward slashes, minuses, pluses, etc. Many of these characters cause problems for Lucene.
That's why I decided to encode all the strings I put in the index.
For example, apple/pear would become apple%2Fpear.
I would imagine that searching for that very same string would then return this doc.
But I come up empty-handed. What's going wrong?
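Boiled down to a minimal reproduction, this is roughly what I'm doing (field name from my real setup, everything else simplified; assuming a recent Lucene, 5.x or later, so older versions will need Version arguments in the constructors):

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;

    public class EncodedTermRepro {
        public static void main(String[] args) throws Exception {
            Directory dir = new RAMDirectory();          // in-memory index, just for the repro
            Analyzer analyzer = new StandardAnalyzer();  // same analyzer for indexing and querying

            IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer));
            Document doc = new Document();
            // the value has already been through my own %XX-style encoding
            doc.add(new TextField("fruitName", "apple%2Fpear", Field.Store.YES));
            writer.addDocument(doc);
            writer.close();

            IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(dir));
            QueryParser parser = new QueryParser("fruitName", analyzer);
            Query query = parser.parse("apple%2Fpear");

            // I expected 1 hit here, but I get 0
            System.out.println(searcher.search(query, 10).totalHits);
        }
    }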
--EDIT--
After some fooling around I noticed that the queries I create in Luke with the StandardAnalyzer (with any analyzer, for that matter) turn my %2F into a space. Hence no results. Can I somehow make the query analyzer not convert these? Maybe I should use a different escaping method than %XX?
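The same thing can be checked outside Luke by dumping the token stream the analyzer produces for the encoded value (sketch, recent Lucene assumed):

    import java.io.StringReader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class DumpTokens {
        public static void main(String[] args) throws Exception {
            Analyzer analyzer = new StandardAnalyzer();
            TokenStream ts = analyzer.tokenStream("fruitName", new StringReader("apple%2Fpear"));
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                // prints each token; if the tokenizer splits at the %2F, this shows
                // separate tokens instead of a single apple%2Fpear token
                System.out.println(term.toString());
            }
            ts.end();
            ts.close();
        }
    }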
--More Info--
I'm using StandardAnalyzer for both indexing and querying.
I'm not encoding spaces. This is one of the reasons I quickly rolled my own encoding instead of using the default URL encoder.
Turning apple/pear into apple pear would make sense, but in my real data it doesn't always (I'm using the fruit example to protect the innocent), and building in intelligence to decide when to insert spaces and when not to would carry too many risks.
Using Luke I can see that my field holds appel%2Fpear. Searching for fruitName:appel works. Searching for fruitName:appel%2Fpear doesn't, and neither does fruitName:appel%2fpear.
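To check whether the exact term appel%2Fpear is actually in the inverted index (as opposed to just being the stored field value Luke shows), a raw TermQuery, which bypasses the query-time analyzer completely, should tell the difference (sketch; the index path is a placeholder, recent Lucene assumed):

    import java.nio.file.Paths;

    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;

    public class ExactTermCheck {
        public static void main(String[] args) throws Exception {
            IndexSearcher searcher = new IndexSearcher(
                    DirectoryReader.open(FSDirectory.open(Paths.get("/path/to/index"))));

            // TermQuery is not analyzed, so the %2F survives as-is
            Query query = new TermQuery(new Term("fruitName", "appel%2Fpear"));
            TopDocs hits = searcher.search(query, 10);

            // 0 hits would mean the term was never indexed in that exact form,
            // i.e. the analyzer already split it apart at indexing time
            System.out.println(hits.totalHits);
        }
    }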