Hi!
I am indexing some files written in Spanish in Solr, and sometimes garbled characters like ¿D é .... appear.
I wonder if there is a TokenFilter that avoids these characters when the text contains accents (á, é, í, ó...) or the letter ñ.

Thanks

A: 

I added this to my schema.xml

<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>

which should be the solution, but the characters are still there. Any other ideas? Thanks

Blanca
Where did you add the charFilter? Did you rebuild your index after adding it?
Mauricio Scheffer
I added it where all the other filters are: <fieldType name="textTight" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/> .... <!-- Filter to remove accents and ñ --> <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/> .... </analyzer> </fieldType> Of course I rebuilt my index after that.
Blanca
A: 

I added it where all the other filters are:

<fieldType name="textTight" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    ....
    <!-- Filter to remove accents and ñ -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    ....
  </analyzer>
</fieldType>

Of course I rebuilt my index after that.

(I am adding this as an answer because it wasn't clear enough in the comment.)

Blanca
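
For reference, here is a minimal sketch of how such a field type is usually laid out. It is not a confirmed fix for the problem in this thread. Char filters work on the raw character stream before tokenization, so they are conventionally declared ahead of the tokenizer. The field type name textFolded is hypothetical, the mapping file is the one mentioned above, and the other filters from the original config are left out.

```xml
<!-- Hypothetical field type name; adapt to your own schema.xml -->
<fieldType name="textFolded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Char filters see the raw text first, so declare them before the tokenizer -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Token filters run on the tokens produced above -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Possible alternative (assumption): drop the charFilter and add a token
         filter of class solr.ASCIIFoldingFilterFactory at this position -->
  </analyzer>
</fieldType>
```

Note that analysis only changes the terms written to the index. The stored value returned in search results keeps the original characters, so accents can still show up in the output even when folding works for matching.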