Hi folks,

In my C# program I have a regular-expression text parser that finds all occurrences of words surrounded by double square brackets. For instance, [[anything]] would yield the word anything.

In a second step, I want to count how often the found word (in my example: anything) appears in the whole text. To do this, I try to create a regex that contains the found word and count how many matches I get. The problem is that the found word can also contain special characters, and the following regex:
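For illustration only, the first step could look roughly like the sketch below. The pattern \[\[(.+?)\]\] and the names BracketParser and FindBracketedWords are assumptions made for this example, not the actual parser from my program.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BracketParser
{
    // Assumed pattern: matches [[...]] and captures the text in between.
    // The lazy quantifier keeps "[[a]] and [[b]]" from becoming one long match.
    private static readonly Regex Bracketed = new Regex(@"\[\[(.+?)\]\]");

    // Returns the captured text of every [[...]] occurrence, in order.
    public static List<string> FindBracketedWords(string text)
    {
        var words = new List<string>();
        foreach (Match m in Bracketed.Matches(text))
        {
            words.Add(m.Groups[1].Value);
        }
        return words;
    }
}
```

With this sketch, a found word such as (anything or 2009 comes back unchanged, which is exactly why the second step has to treat it as a literal.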

string foundWord = "(anything";
Regex countOccurences = new Regex(foundWord);

will obviously fail when the variable contains special characters like '('. For matching whole expressions, Expresso suggests the following construct:

Regex countOccurences = new Regex("(?(" + foundWord + ")Yes|No)");

But in this scenario, when foundWord is a number such as '2009', the regex tries to interpret it as a reference to a group (which is obviously not defined). My text can contain any combination of normal characters, special characters, numbers, etc.

How can I tell the regex to treat the given string as a literal only?

Thanks in advance, Frank

+5  A: 

You should escape the literal before building the regular expression with it, using Regex.Escape.

Something like:

Regex countOccurances = new Regex(Regex.Escape(foundWord));

However, since all you're doing is counting occurrences, a better option is to avoid a regular expression for the second search altogether. Since you don't want any character treated specially, a plain text search would be easier.
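If you do stay with a regex, a minimal sketch of the counting step might look like this. The class and method names are made up for the example; Regex.Escape and MatchCollection.Count are the standard .NET members.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralRegexCount
{
    // Counts non-overlapping occurrences of foundWord in text,
    // treating every character of foundWord as a literal.
    public static int CountOccurrences(string text, string foundWord)
    {
        // Regex.Escape turns "(anything" into "\(anything", so the
        // parenthesis no longer opens a group.
        var pattern = new Regex(Regex.Escape(foundWord));
        return pattern.Matches(text).Count;
    }
}
```

Called as LiteralRegexCount.CountOccurrences(text, "(anything") or with "2009", the pattern is matched character for character, with no group or back-reference interpretation.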

Eddie Sullivan
Oh well, sometimes it is so easy. Thanks for pointing this out to me!
Aaginor
+1  A: 

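If you're just trying to count the number of occurrences of a string, why use a regex at all? Just use the basic string methods, Contains(), IndexOf(), whatever makes the most sense in C#. If you don't need the fancy functionality of a regex, there is no reason to use one. I think something like

// text is the full input, foundString the word to count
int count = 0;
int position = text.IndexOf(foundString, StringComparison.Ordinal);
while (position != -1)
{
    count++;
    // continue one character past the last hit, otherwise the loop never ends
    position = text.IndexOf(foundString, position + 1, StringComparison.Ordinal);
}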

would accomplish it without regexes.
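One detail worth knowing: stepping by one character counts overlapping hits, while a regex match count does not. A rough sketch of a helper that makes this explicit follows; the names PlainTextCount, CountOccurrences and allowOverlap are hypothetical and chosen only for this example.

```csharp
using System;

public static class PlainTextCount
{
    // Counts occurrences of word in text using plain ordinal string search.
    // allowOverlap = true advances by one character after each hit;
    // false advances by the whole word, like a regex match count would.
    public static int CountOccurrences(string text, string word, bool allowOverlap = true)
    {
        // An empty search string would otherwise be "found" at every position.
        if (string.IsNullOrEmpty(word)) return 0;

        int step = allowOverlap ? 1 : word.Length;
        int count = 0;
        int position = text.IndexOf(word, StringComparison.Ordinal);
        while (position != -1)
        {
            count++;
            position = text.IndexOf(word, position + step, StringComparison.Ordinal);
        }
        return count;
    }
}
```

For the text "aaa" and the word "aa", the overlapping variant finds two hits and the non-overlapping variant one. For ordinary words the two modes give the same result.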

Brian Schroth
Thanks for that tip! It works well once the call inside the loop starts searching at position + 1; otherwise you get an endless loop.
Aaginor
good catch, edited to fix
Brian Schroth