Hi,

I'm running a search application on a FAST ESP server. Now I have this problem with character normalization.

What I want is to search for 'wurth' and get a hit in 'würth'.

I've tried configuring the following in esp/etc/tokenizer/tokenization.xml:

 <normalizationlist name="German to Norwegian">
   <normalization description="German u with diaeresis, to Norwegian u">
      <input>x75</input> 
      <output>xFC</output> 
      <output>x75</output>
   </normalization>
  </normalizationlist>

But of course, this translates every u to ü, which is useless.

How do I configure this the right way?

A: 

The solution is to normalize every "special character" to the same "normal character":

ö -> o, ø -> o, å -> a, ä -> a, æ -> a

This is a bit time-consuming, but it works!
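
A minimal sketch of what such entries might look like for ü, ö and ø, reusing only the elements from the snippet in the question. It assumes that `<input>` is the character to match and `<output>` is what it gets normalized to (which is what the "all u become ü" behaviour in the question suggests), and that the values are Latin-1 hex codes. The list name and descriptions are made up for illustration.

```xml
<normalizationlist name="Accented to base characters">
  <!-- Assumption: input = character to match, output = normalized form.
       This is the reverse direction of the attempt in the question. -->
  <normalization description="u with diaeresis (xFC) to plain u (x75)">
    <input>xFC</input>
    <output>x75</output>
  </normalization>
  <normalization description="o with diaeresis (xF6) to plain o (x6F)">
    <input>xF6</input>
    <output>x6F</output>
  </normalization>
  <normalization description="o with stroke (xF8) to plain o (x6F)">
    <input>xF8</input>
    <output>x6F</output>
  </normalization>
</normalizationlist>
```

For 'wurth' to match 'würth', the same normalization has to be applied to both the indexed text and the query, so content indexed before the change would presumably need to be reprocessed.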

jorgen
A: 

Read the Advanced Linguistics Guide. It contains a chapter on character normalization. If you follow the steps in the guide, all special characters will be treated as normal characters, so searching for über gives the same results as searching for uber.

Edward Smit