Custom Analyzer using ASCIIFoldingFilter not replacing diacritics

Hello experts,

We have an issue with a custom Lucene.NET Analyzer which uses ASCIIFoldingFilter and LowerCaseFilter.

While indexing our content, the LowerCaseFilter works and lowercases all terms, but the ASCIIFoldingFilter leaves the diacritics untouched. There are no errors, but characters like őŏő are not replaced with o; they appear unchanged in the index. I would have expected this to either work or fail, not silently do nothing.

The relevant code is like this:

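// Inside the custom Analyzer subclass; Analyzer.TokenStream is abstract, so it must be overridden.
public override TokenStream TokenStream(String fieldName, TextReader reader) {
  Tokenizer tokenizer = new StandardTokenizer(reader);
  TokenStream stream = new StandardFilter(tokenizer);
  // Fold accented characters to their ASCII equivalents, then lowercase.
  stream = new ASCIIFoldingFilter(stream);
  return new LowerCaseFilter(stream);
}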

Are there some additional steps that need to be performed to use the ASCIIFoldingFilter?

Is there some working Java example that I could adapt to Lucene.NET?

Thank you!

EDIT: I managed to fix this. It was a misconfiguration issue: the custom analyzer was not being used at all, and a different analyzer that only lowercases was used instead. The custom analyzer's filter chain is now working correctly. Sorry!
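
For anyone hitting the same symptom, it can help to know what the folded terms should look like before suspecting the filter. The sketch below does not use Lucene.NET, and it is not how ASCIIFoldingFilter works internally (that filter uses its own mapping table). It swaps in a different technique, Unicode decomposition plus removal of combining marks, using only the .NET base class library. The names FoldingCheck and StripDiacritics are made up for this illustration.

```csharp
using System;
using System.Globalization;
using System.Text;

static class FoldingCheck
{
    // Hypothetical helper: approximates diacritic folding with Unicode
    // normalization. This is NOT the ASCIIFoldingFilter algorithm.
    static string StripDiacritics(string input)
    {
        // FormD splits each accented letter into base letter + combining marks.
        string decomposed = input.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Keep everything except the combining (non-spacing) marks.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        // Recompose whatever is left into its canonical form.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    static void Main()
    {
        // Same sample characters as in the question, folded and lowercased.
        Console.WriteLine(StripDiacritics("őŏő").ToLowerInvariant()); // prints "ooo"
    }
}
```

If the terms in the index still contain the accented forms while a check like this yields plain letters, the analyzer that actually ran during indexing is probably not the one you expected, which is what happened in my case.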
