I use jquery.autocomplete, which uses a JavaScript regexp to highlight the substrings in the list of suggestions that match the autocomplete key string. So if the user types "Beat" and one of the autocomplete suggestions the server returns is "The Beatles", then the plugin displays that suggestion as "The Beatles" with "Beat" highlighted.

I'm trying to think of ways to make this work with string matching that isn't sensitive to accents, diacritics and the like. So if the user typed "Huske" and the server suggested "Hüsker Dü", then this would be displayed as "Hüsker Dü" with "Hüske" highlighted.

The principle is the same as string comparison with a specified collation, as in MySQL or ICU, or with Oracle's sorts. In SphinxSearch, a charset_table works for this. A collation such as utf8_general_ci would be ideal for my purposes.

+1  A: 

The only thing I can think of is pretty brute-force. If any character in the input string is known to have one or more accented forms, replace it with a character class containing all of the forms when you create the regex. For example, for the input string Huske, the regex might be /H[uùúûü]sk[eèéêë]/.
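
A minimal sketch of that idea follows. The variant table, the function names and the `<strong>` markup are all hypothetical, not part of jquery.autocomplete, and the table covers only a handful of Latin letters. It also assumes the suggestions use precomposed characters (NFC), since a decomposed "u" plus combining diaeresis would not match the character class.

```javascript
// Hypothetical table of accent variants; extend it to cover the collation you need.
const ACCENT_VARIANTS = {
  a: 'aàáâãäå',
  c: 'cç',
  e: 'eèéêë',
  i: 'iìíîï',
  n: 'nñ',
  o: 'oòóôõö',
  u: 'uùúûü',
  y: 'yýÿ'
};

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a case-insensitive regex in which each base letter
// is replaced by a character class of its accented forms.
function accentInsensitiveRegExp(key) {
  const source = Array.from(key).map(function (ch) {
    const variants = ACCENT_VARIANTS[ch.toLowerCase()];
    return variants ? '[' + variants + ']' : escapeRegExp(ch);
  }).join('');
  return new RegExp(source, 'gi');
}

// Wrap every match in <strong> tags (an assumed markup choice).
function highlight(suggestion, key) {
  if (key === '') {
    return suggestion; // an empty pattern would match at every position
  }
  return suggestion.replace(accentInsensitiveRegExp(key), '<strong>$&</strong>');
}
```

With this table, `accentInsensitiveRegExp('Huske').source` is `H[uùúûü]sk[eèéêë]`, the same pattern as in the example above. The matching is one-directional: an unaccented key matches accented text, but a key typed with an accent only matches itself.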

Alan Moore
Hi Alan, yes, I can imagine how that would work. The function that transforms the search key into a regex with character classes (like in your example) would encode the collation. I guess it makes sense when testing many strings against few keys. An alternative would be to normalize both the keys and the subject strings using the same transform; that would yield faster regexes but require a lot of transformations.
fsb
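
The normalization alternative from the comment above could look roughly like the sketch below. The function names and the `<strong>` markup are hypothetical. It relies on the standard `String.prototype.normalize` and folds one UTF-16 code unit at a time, so that positions in the folded string line up with positions in the original. It assumes the input is in precomposed (NFC) form.

```javascript
// Fold one UTF-16 code unit: decompose it (NFD), drop combining marks, lowercase.
function foldChar(ch) {
  const base = ch.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
  const folded = base.toLowerCase();
  // Keep the original unit if folding would change the length,
  // so that indices in the folded string stay aligned with the original.
  return folded.length === 1 ? folded : ch;
}

// Fold a whole string; the result has the same length as the input.
function fold(s) {
  return s.split('').map(foldChar).join('');
}

// Find the first occurrence of the key in the suggestion, comparing folded forms.
// Returns null when there is no match.
function findFolded(suggestion, key) {
  const start = fold(suggestion).indexOf(fold(key));
  return start === -1 ? null : { start: start, end: start + key.length };
}

// Highlight the matched range in the original, unfolded suggestion.
function highlightFolded(suggestion, key) {
  const m = key === '' ? null : findFolded(suggestion, key);
  if (m === null) {
    return suggestion;
  }
  return suggestion.slice(0, m.start) +
    '<strong>' + suggestion.slice(m.start, m.end) + '</strong>' +
    suggestion.slice(m.end);
}
```

Here `fold('Hüsker Dü')` gives `'husker du'`, so the key "huske" is found at index 0 and the highlight is applied to the original characters. Because the key is folded too, a key typed with accents also matches. The folded form of each suggestion could be cached if the same list is searched repeatedly.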