I am trying to find the index of a substring within a string that matches another string under a specific culture (supplied as a System.Globalization.CultureInfo).

For example, the string "ass" matches the substring "aß" in "straße" under a German culture.

I can find the index of the start of the match using

culture.CompareInfo.IndexOf(value, substring);

but, without resorting to brute force, is there an easy way to tell that two characters were matched rather than three?
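To make the problem concrete, here is a rough sketch of the brute-force fallback I would like to avoid. It assumes the match length can be recovered by trying each candidate length at the found index and asking the same CompareInfo whether that slice compares equal to the search text. CultureMatch and its IndexOf helper are names made up for this sketch, not framework APIs; only CompareInfo.IndexOf and CompareInfo.Compare are real calls.

```csharp
using System;
using System.Globalization;

static class CultureMatch
{
    // Hypothetical helper (not part of the framework): returns the start index of
    // the match and reports how many characters of `source` the match covers.
    public static int IndexOf(CompareInfo compareInfo, string source, string value,
                              CompareOptions options, out int matchLength)
    {
        matchLength = 0;
        int index = compareInfo.IndexOf(source, value, options);
        if (index < 0)
            return -1;

        // Brute force: grow the candidate slice until it compares equal to `value`.
        for (int length = 0; index + length <= source.Length; length++)
        {
            if (compareInfo.Compare(source, index, length,
                                    value, 0, value.Length, options) == 0)
            {
                matchLength = length;
                return index;
            }
        }

        // No slice compared equal; matchLength stays 0 to signal "length unknown".
        return index;
    }
}

class Demo
{
    static void Main()
    {
        CompareInfo german = new CultureInfo("de-DE").CompareInfo;
        int length;
        int index = CultureMatch.IndexOf(german, "straße", "ass", CompareOptions.None, out length);
        // Whether "ß" is treated as equal to "ss" depends on the runtime's collation data.
        Console.WriteLine("index={0}, length={1}", index, length);
    }
}
```

This returns the shortest slice that compares equal, and it costs one extra comparison per candidate length, which is exactly the overhead I am hoping to sidestep.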

A: 

Do regular expressions handle the distinction between "ss" and "ß"?

Lasse V. Karlsen
+2  A: 

If you use a capture group, you can capture the exact match that was found, and from that you can determine how many characters were matched.

I'm a bit pressed for time right now to give an example, so I hope you can figure it out from my description.

Perhaps I'll amend my answer later.
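For reference, a minimal sketch of the capture-group idea. It only shows how to read the position and length of what group 1 actually matched, using a plain "strasse" input; whether the regex engine treats "ß" as equal to "ss" is a separate question that this sketch does not settle. The class name is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class CaptureLengthDemo
{
    static void Main()
    {
        string input = "strasse";

        // Regex.Escape keeps the search text literal; the parentheses form capture group 1.
        string pattern = "(" + Regex.Escape("ASS") + ")";
        Match match = Regex.Match(input, pattern, RegexOptions.IgnoreCase);

        if (match.Success)
        {
            Group group = match.Groups[1];
            // group.Length is the number of characters of `input` that were matched.
            Console.WriteLine("index={0}, length={1}", group.Index, group.Length);
            // prints "index=3, length=3"
        }
    }
}
```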

Dave

Dave Van den Eynde
I was not aware that a regular expression could be run under a particular culture - how is this done?
Oliver Hallam
The documentation states that case-insensitive operations are culture-sensitive by default, and that Thread.CurrentCulture is used for them. But apparently (under .NET 2.0) it doesn't match "ß" with "ss", even though they are the same under that culture. So my answer doesn't help you.
Dave Van den Eynde
I know I have experienced issues with Microsoft's regex implementation before. Its case-insensitive matching fails to match a lowercase k with the Kelvin sign, for example (despite them both having the same uppercase form), and it fails to deal with multibyte characters (which is another requirement here).
Oliver Hallam