I have Solr with an indexed database, and all of the data in the database is in Latvian. The problem is that I need a search for the word Riga to match the word Rīga. Of course, I can define a synonym (Rīga = Riga), but can I simply declare that the letter ī is the letter i? I read something about solr.ISOLatin1AccentFilterFactory, but as far as I understand, it is not meant for UTF-8 encoded text, right? Any advice?

+2  A: 

I used PatternReplaceFilterFactory in both the index and query analyzers. It seems to be working correctly.
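
For anyone wanting to see the shape of such a setup, here is a minimal sketch of a schema.xml field type that replaces ī with i at both index and query time. The field type name text_lv_folded and the choice of tokenizer are assumptions made for illustration only, not the exact configuration used here.

```xml
<!-- Hypothetical field type: the name and the tokenizer are assumptions -->
<fieldType name="text_lv_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Lowercase first so that Ī becomes ī before the replacement runs -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Replace every ī in each token with a plain i -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="ī" replacement="i" replace="all"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Same replacement on the query side so both spellings meet -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="ī" replacement="i" replace="all"/>
  </analyzer>
</fieldType>
```

Because the same replacement runs on both sides, Rīga and Riga are both reduced to riga and therefore match. The schema file has to be saved as UTF-8 for the pattern to be read correctly, and existing documents need to be reindexed after the index analyzer changes.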

Yurish
A: 

Glad to hear it worked! I wondered about this, but didn't have a real answer!

Eric Pugh
Your reply should have been a comment, not an answer. Some users would downvote for this...
harschware
+1  A: 

ISOLatin1AccentFilterFactory is exactly what you are looking for... as long as the accented character EXISTS in the Latin-1 character set (the first 256 Unicode code points are identical to Latin-1, so these characters are recognized in UTF-8 text as well). The ī that you mentioned doesn't appear to exist in ISO-8859-1, so ISOLatin1AccentFilterFactory won't work in this SPECIFIC case. I would still recommend using ISOLatin1AccentFilterFactory in addition to any exceptions that you take care of with PatternReplaceFilterFactory, as there are probably some Latvian characters that it will help with (I'm assuming here; I don't have experience with Latvian).

FYI, I did actually try this against my Solr setup with ISOLatin1AccentFilterFactory, and it didn't help in this case.
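
The combination recommended above could be sketched roughly as the following filter chain. This is only an illustration under assumptions: the two letters handled by PatternReplaceFilterFactory (ī and ā) are examples of characters outside ISO-8859-1, not a complete list for Latvian.

```xml
<!-- Filter chain fragment; put the same filters in both the index and query analyzers -->
<filter class="solr.LowerCaseFilterFactory"/>
<!-- Strips accents from characters that exist in ISO-8859-1 (for example é or ü) -->
<filter class="solr.ISOLatin1AccentFilterFactory"/>
<!-- Exceptions outside Latin-1, one filter per target letter (example letters only) -->
<filter class="solr.PatternReplaceFilterFactory" pattern="ī" replacement="i" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="ā" replacement="a" replace="all"/>
```

Each PatternReplaceFilterFactory has a single fixed replacement, so every target letter needs its own filter line.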

Trey