I would like to know whether the result of Regex.IsMatch is affected by encoding.

I am checking if a string is contained within another one by using a regular expression pattern.

I am 99.9% sure the pattern is correct, so my question is...

Is the matching test in Regex.IsMatch applied at the "byte level" or at the "string level"?

UPDATE:

This one is the Output... TEΣT

This one is the Word to match... ΤΕΣΤ

and here is the pattern...

If Regex.IsMatch(Output, "(?<=^|\b|\s)" & Regex.Escape(Word) & "(?=\s|\b|$)") Then
    ' do something with the match
End If
+2  A: 

All the Regex functions in .NET work on strings, not on encoded bytes.

If you are having problems, it might be because your string was decoded incorrectly, so some of its characters are not the ones you expect. If you can post your string and the regular expression, we might be able to explain why it doesn't match.
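One way to check whether two strings really contain the same characters is to print their UTF-16 code units and compare them. The following is only a minimal sketch: DescribeChars is a hypothetical helper name, and the sample values (Latin "A" and Greek capital alpha, which look alike on screen) are chosen purely for illustration.

```vb
Imports System
Imports System.Linq

Module CodeUnitDump
    ' Hypothetical helper: lists each UTF-16 code unit of a string as U+XXXX.
    Function DescribeChars(ByVal text As String) As String
        Return String.Join(" ", text.Select(Function(c) "U+" & AscW(c).ToString("X4")))
    End Function

    Sub Main()
        ' Illustrative samples: two letters that render almost identically.
        Console.WriteLine(DescribeChars("A"))                      ' Latin capital A: U+0041
        Console.WriteLine(DescribeChars(ChrW(&H391).ToString()))   ' Greek capital alpha: U+0391
    End Sub
End Module
```

If the dumps of the input and of the word to match differ, the regular expression is comparing different characters, however similar they look.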

Mark Byers
I guess, then, I am 99.9% wrong :)
Chocol8
I updated the question as requested
Chocol8
+1  A: 

Regular expressions are culture-sensitive: the current culture is used, for example, to decide how case-insensitive matching is handled.

There is an option, RegexOptions.CultureInvariant, that you can use to turn this off. It makes the regex use the invariant culture instead. The regular expression still works at the character level, though: a character is a 16-bit UTF-16 code unit, not a byte.
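As a rough sketch of the difference, the snippet below runs the same case-insensitive match with and without RegexOptions.CultureInvariant. The Turkish culture ("tr-TR") and the "FILE"/"file" strings are assumptions picked for illustration, because Turkish casing rules for "i" and "I" differ from the invariant ones. The result of the first call depends on the culture and runtime in use.

```vb
Imports System
Imports System.Globalization
Imports System.Text.RegularExpressions
Imports System.Threading

Module CultureDemo
    Sub Main()
        ' Assumption for illustration: run the thread under the Turkish culture.
        Thread.CurrentThread.CurrentCulture = New CultureInfo("tr-TR")

        ' Culture-sensitive, case-insensitive match: uses the current culture's casing rules.
        Console.WriteLine(Regex.IsMatch("FILE", "file", RegexOptions.IgnoreCase))

        ' Same match using the invariant culture's casing rules: prints True.
        Console.WriteLine(Regex.IsMatch("FILE", "file",
                                        RegexOptions.IgnoreCase Or RegexOptions.CultureInvariant))
    End Sub
End Module
```

Note that the culture only affects case-insensitive matching. Without RegexOptions.IgnoreCase, the two calls compare the same characters in the same way.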

Guffa