"Lenient" regex matching of similar characters in C#/.Net

Is there a way to get .Net to match strings positively even when some characters are not exactly the same? Examples of characters that should be treated as similar are 'a'/'á' and 'í'/'i'. Chrome's find-as-you-type treats these characters as equivalent.

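Sure, it's possible if you write the algorithm yourself. The only thing close to what you describe among the out-of-the-box Regex.Match() overloads is RegexOptions.CultureInvariant. However, that option only controls which culture's casing rules are used for case-insensitive matching, so unless you are switching cultures it is not going to be of any use, and it never treats 'a' and 'á' as equal.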

P.Brian.Mackey 2010-07-02 15:57:06
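The "write it yourself" route can be sketched roughly as follows, assuming a small, hand-maintained table of equivalent letters. The `LenientPattern` class and its `Groups` table are hypothetical, not part of .NET. Each letter of the search text that belongs to a group is expanded into a character class covering the whole group, and every other character is escaped with `Regex.Escape` so it matches literally.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class LenientPattern
{
    // Hypothetical table of characters to treat as "similar" (lowercase only);
    // extend it for the languages you need.
    private static readonly string[] Groups =
    {
        "aáàâäã", "eéèêë", "iíìîï", "oóòôöõ", "uúùûü", "nñ", "cç"
    };

    // Builds a regex from literal search text: a letter found in a group
    // becomes a character class of the whole group, anything else is
    // escaped so it matches literally.
    public static string Build(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            char lower = char.ToLowerInvariant(c);
            string group = Array.Find(Groups, g => g.IndexOf(lower) >= 0);
            if (group != null)
                sb.Append('[').Append(group).Append(']');
            else
                sb.Append(Regex.Escape(c.ToString()));
        }
        return sb.ToString();
    }
}
```

For example, `Regex.IsMatch("Canción", LenientPattern.Build("cancion"), RegexOptions.IgnoreCase)` should match. Because the table holds lowercase letters only, pass `RegexOptions.IgnoreCase` when case should not matter. The sketch also assumes precomposed characters; text stored in decomposed form (a base letter followed by a combining accent) would not match these classes.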

Maybe you want to look into Soundex/Metaphone functions to normalise the strings first, and then perform your regex operations on the results?

Peter Boughton 2010-07-02 16:04:51
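For the phonetic route, here is a minimal sketch of classic American Soundex, which reduces a word to its first letter plus three digits. `Phonetic.Soundex` is a hypothetical helper, not a .NET API. Note that it only codes the letters A-Z and silently drops everything else, so accented letters are ignored rather than folded to their base letter; it is best combined with a diacritic-stripping step.

```csharp
using System.Linq;
using System.Text;

public static class Phonetic
{
    // Classic American Soundex: first letter + three digits, e.g. "R163".
    public static string Soundex(string word)
    {
        // Keep only A-Z; other characters (including accented letters) are dropped.
        char[] letters = word.ToUpperInvariant()
                             .Where(c => c >= 'A' && c <= 'Z')
                             .ToArray();
        if (letters.Length == 0) return string.Empty;

        var sb = new StringBuilder().Append(letters[0]);
        char last = Code(letters[0]);
        for (int i = 1; i < letters.Length && sb.Length < 4; i++)
        {
            char code = Code(letters[i]);
            // Skip uncoded letters ('0') and repeats of the previous code.
            if (code != '0' && code != last) sb.Append(code);
            // H and W do not separate letters with the same code; vowels do.
            if (letters[i] != 'H' && letters[i] != 'W') last = code;
        }
        return sb.ToString().PadRight(4, '0');
    }

    private static char Code(char c)
    {
        switch (c)
        {
            case 'B': case 'F': case 'P': case 'V': return '1';
            case 'C': case 'G': case 'J': case 'K':
            case 'Q': case 'S': case 'X': case 'Z': return '2';
            case 'D': case 'T': return '3';
            case 'L': return '4';
            case 'M': case 'N': return '5';
            case 'R': return '6';
            default: return '0'; // A, E, I, O, U, Y, H, W
        }
    }
}
```

With this sketch, "Robert" and "Rupert" both reduce to "R163", so comparing codes rather than raw strings treats them as the same word. Soundex works per word, so for substring searches you would typically tokenise the text first.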

Take a look at this blog post by Michael Kaplan. The code there uses standard .NET class library methods for:

Normalising Unicode strings, in this case using a "decomposed" normalisation form (NormalizationForm.FormD), which ensures that a character like á is represented by separate code points for a and its diacritic(s);
Identifying the diacritics using classes that expose databases of information about Unicode characters, and stripping them out.

shambulator 2010-07-02 16:11:28
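A minimal sketch of the normalise-and-strip approach described in this answer, assuming the goal is simply to fold both the input and the pattern before handing them to `Regex`. This is not Kaplan's original code; the `DiacriticFolding` class and its method names are hypothetical. It uses `string.Normalize`, `CharUnicodeInfo.GetUnicodeCategory` and `UnicodeCategory.NonSpacingMark` from the .NET class library.

```csharp
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class DiacriticFolding
{
    // Decompose (FormD) so "á" becomes "a" + U+0301, drop the combining
    // marks, then recompose (FormC) whatever remains.
    public static string RemoveDiacritics(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    // Folds both the input and the pattern, then runs an ordinary regex.
    // Match positions refer to the folded string, not the original input.
    public static bool LenientIsMatch(string input, string pattern,
                                      RegexOptions options = RegexOptions.None)
    {
        return Regex.IsMatch(RemoveDiacritics(input), RemoveDiacritics(pattern), options);
    }
}
```

With this sketch, `DiacriticFolding.LenientIsMatch("Canción", "cancion", RegexOptions.IgnoreCase)` should return true. Two caveats: letters that have no canonical decomposition (for example 'ø') are left unchanged, and because stripping marks can shorten the string, any `Match.Index` you compute on the folded text may not line up with the original.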
