I am a novice at using regular expressions in C#. I want a regex that finds the next keyword from a given list, but only a keyword that is not surrounded by quotes.

For example, if I have code that looks like this:

            while (t < 10)
            {
                string s = "get if stmt";
                u = GetVal(t, s);
                for(;u<8;u++)
                {
                    t++;
                }

            }

I tried using the regex @"(.*?)\s(FOR|WHILE|IF)\s", but it gives me "if" as the next keyword. After while, I want the next keyword to be "for", not "if", which is surrounded by quotes.

Can this be done at all using a regex, or will I have to use conventional programming?

A: 

You could try backreferencing, which would let you match the string, but since you want the exact opposite, you'd be better off getting the strings out of the way instead; that's actually really easy.

Either write a regex that matches strings and replaces them with nothing, or run through the text, skipping quoted strings and looking for keywords as you go. I reckon the latter will be more efficient.
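Below is a minimal C# sketch of both options. It assumes C# 9 or later (top-level statements) and treats only unescaped "..." and '...' as strings. KeywordsAfterStripping and KeywordsByScanning are hypothetical helper names. Keywords are compared case-insensitively because the question's pattern used FOR|WHILE|IF. There is one small deviation from "replace them with nothing": the first helper replaces each string with spaces of the same length.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sample from the question as a C# string literal; the "if" sits inside a string.
string sampleCode = "while (t < 10)\n{\n    string s = \"get if stmt\";\n    u = GetVal(t, s);\n    for(;u<8;u++)\n    {\n        t++;\n    }\n}";
string[] keywordList = { "for", "while", "if" };

Console.WriteLine(string.Join(", ", KeywordsAfterStripping(sampleCode, keywordList))); // while, for
Console.WriteLine(string.Join(", ", KeywordsByScanning(sampleCode, keywordList)));     // while, for

// Option 1: blank out "..." and '...' literals with a regex, then search what is left.
// Each literal becomes spaces of the same length (instead of nothing) so the text on
// either side cannot fuse into a false keyword, e.g. i"x"f would otherwise become if.
static List<string> KeywordsAfterStripping(string code, string[] keywords)
{
    string stripped = Regex.Replace(code, "\"[^\"]*\"|'[^']*'", lit => new string(' ', lit.Length));
    string pattern = @"\b(?:" + string.Join("|", Array.ConvertAll(keywords, k => Regex.Escape(k))) + @")\b";
    var found = new List<string>();
    foreach (Match match in Regex.Matches(stripped, pattern, RegexOptions.IgnoreCase))
        found.Add(match.Value);
    return found;
}

// Option 2: one pass over the text, jumping over quoted literals and comparing each word.
static List<string> KeywordsByScanning(string code, string[] keywords)
{
    var found = new List<string>();
    int i = 0;
    while (i < code.Length)
    {
        char c = code[i];
        if (c == '"' || c == '\'')
        {
            // Jump past the matching closing quote (or to the end if it is unterminated).
            int close = code.IndexOf(c, i + 1);
            i = close < 0 ? code.Length : close + 1;
        }
        else if (char.IsLetterOrDigit(c) || c == '_')
        {
            // Read one identifier-like word and compare it with the keyword list.
            int start = i;
            while (i < code.Length && (char.IsLetterOrDigit(code[i]) || code[i] == '_'))
                i++;
            string word = code.Substring(start, i - start);
            if (Array.Exists(keywords, k => string.Equals(k, word, StringComparison.OrdinalIgnoreCase)))
                found.Add(word);
        }
        else
        {
            i++;
        }
    }
    return found;
}
```

Both helpers return while, for for the sample in the question. The second entry is the "next keyword after while" that the question asks for.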

John Leidegren
Thanks for the response. Yes, with the latter approach I would have to search for a quoted string or a keyword, whichever comes first. But I thought using a regex would reduce the code length, so I wanted to find out.
Archie
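The "quoted string or keyword, whichever comes first" search can itself be written as one regex. The pattern has alternatives for a double-quoted literal, a single-quoted literal, and a keyword, and only the matches where the keyword alternative won are kept. This is a minimal sketch, assuming C# 9 or later. KeywordsOutsideQuotes is a hypothetical name. Escaped quotes (\") and comments are not handled, and the keyword list can be swapped for any other.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string code = "while (t < 10)\n{\n    string s = \"get if stmt\";\n    u = GetVal(t, s);\n    for(;u<8;u++)\n    {\n        t++;\n    }\n}";

// Prints "while" and then "for"; the "if" inside the string literal is skipped.
foreach (string kw in KeywordsOutsideQuotes(code, new[] { "FOR", "WHILE", "IF" }))
    Console.WriteLine(kw);

// One regex with three alternatives: a "..." literal, a '...' literal, or a keyword.
// Regex.Matches scans left to right, so an opening quote is reached before any word
// inside that literal; the whole literal is consumed as one match and the scan resumes
// after its closing quote. Only matches in which the "kw" group took part are keywords.
static string[] KeywordsOutsideQuotes(string text, string[] keywords)
{
    string alternatives = string.Join("|", Array.ConvertAll(keywords, k => Regex.Escape(k)));
    string pattern = "\"[^\"]*\"|'[^']*'|\\b(?<kw>" + alternatives + ")\\b";
    var result = new List<string>();
    foreach (Match match in Regex.Matches(text, pattern, RegexOptions.IgnoreCase))
    {
        if (match.Groups["kw"].Success)
            result.Add(match.Groups["kw"].Value);
    }
    return result.ToArray();
}
```

An unterminated quote simply fails to match as a string. The scan then continues into the text after it, so that text is searched for keywords as usual.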
A: 

I suppose a regex cannot readily understand C# keywords. I would suggest using Microsoft.CSharp.CSharpCodeProvider, which Visual Studio uses to manage C# code.

nils_gate
I don't intend to use the code only for C#; I may also use it for other languages. Also, I don't want to find all the keywords, only a few specific ones.
Archie
+2  A: 

Try the following RegEx (Edit: fixed).

(?:[^\"]|(?:(?:.*?\"){2})*?)(?: |^)(?<kw>for|while|if)[ (]

Note: Because this RegEx literal includes quotes, you can't simply put the @ sign before the string as it is written here. Remember that if you add any RegEx special characters to the string, you'll need to double-escape them appropriately (e.g. write \\w to get \w). Ensure that you also specify the Multiline option (RegexOptions.Multiline) when matching with the RegEx, so the caret (^) is treated as the start of a new line.

This hasn't been tested, but it should do the job. Let me know if there are any problems. Also, depending on what else you want to do here, I might recommend standard (non-RegEx) text parsing, which quickly becomes more readable as the amount of data you want to extract from the code grows. Hope that helps anyway.

Edit: Here's some example code, which I've tested and am pretty confident works as intended.

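using System.Linq;                     // Cast<T>(), Select(), ToArray() (.NET 3.5 / C# 3.0)
using System.Text.RegularExpressions;  // Regex, Match, MatchCollection

var input = "while t < 10 loop\n s => 'this is if stmt'; for u in 8..12 loop \n}";
var pattern = "(?:[^\"]|(?:(?:.*?\"){2})*?)(?: |^)(?<kw>for|while|if)[ (]";
// Multiline lets ^ match at the start of every line, as noted above.
var matches = Regex.Matches(input, pattern, RegexOptions.Multiline);
// The named group "kw" holds the keyword itself; guard against input with no keywords.
var firstKeyword = matches.Count > 0 ? matches[0].Groups["kw"].Value : null;
// The following line is a one-line solution for .NET 3.5/C# 3.0 to get an array of all found keywords.
var keywords = matches.Cast<Match>().Select(match => match.Groups["kw"].Value).ToArray();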

Hopefully this should be your complete solution now...

Noldorin
You can use a verbatim string (@), but you'll need to use "" instead of \"
Richard Szalay
@Richard: Good point. I guess it's just a matter of personal preference. Either way, it's important to be aware that \" is *not* a RegEx escape sequence in this case, and likewise "" is only a *single* double quote.
Noldorin
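Here is a small check (assuming C# 9 or later for top-level statements) that the two spellings produce the same pattern text. It also confirms that the regex engine never sees a backslash in it, which is the point of the comment above. The \w+ pair illustrates the double-escaping mentioned in the answer.

```csharp
using System;

// The answer's pattern written two ways. In a regular literal \" is a C# escape for one
// double quote; in a verbatim (@) literal "" is one double quote and \ is taken literally.
string regular  = "(?:[^\"]|(?:(?:.*?\"){2})*?)(?: |^)(?<kw>for|while|if)[ (]";
string verbatim = @"(?:[^""]|(?:(?:.*?""){2})*?)(?: |^)(?<kw>for|while|if)[ (]";
Console.WriteLine(regular == verbatim);         // True
Console.WriteLine(regular.IndexOf('\\') < 0);   // True: no backslash reaches the regex engine

// A regex escape such as \w needs a doubled backslash only in a regular literal.
string wordRegular  = "\\w+";
string wordVerbatim = @"\w+";
Console.WriteLine(wordRegular == wordVerbatim); // True
```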
Thanks, it's working fine. But what changes should I make to the regex to get the first keyword, i.e. while in this case?
Archie
I presume you're calling the Matches method of the Regex class and just want to extract the keyword text. If 'matches' is the MatchCollection returned, then matches[0].Groups["kw"].Value should give you the first keyword, matches[1].Groups["kw"].Value the second keyword, and so on. (Groups[0] is the whole match, including the surrounding characters; with an unnamed keyword group, use Groups[1].)
Noldorin
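Below is a minimal sketch (C# 9 or later, top-level statements) of what the different Groups indexes return for one match. The input string is made up for illustration. Groups[0] is always the entire match, so it includes any delimiters the pattern consumed. The named group kw holds just the keyword, and because it is the only capturing group here, it is also group number 1.

```csharp
using System;
using System.Text.RegularExpressions;

Match m = Regex.Match("x; for (", @"(?: |^)(?<kw>for|while|if)[ (]");
Console.WriteLine("[" + m.Groups[0].Value + "]");    // [ for ]  whole match, delimiters included
Console.WriteLine("[" + m.Groups["kw"].Value + "]"); // [for]    named group only
Console.WriteLine("[" + m.Groups[1].Value + "]");    // [for]    kw is also group 1 here
```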
I tried the following code: string ip = "while t < 10 loop\n s => 'this is if stmt'; for u in 8..12 loop \n}"; string pattern = @"(?:[^']|(?:(?:.*?'){2})*?)[ ^](for|while|if)[ ]"; but it gives 3 matches: while (with a space before it), if and for, whereas it should have been only while and for.
Archie
Sorry, it seems there were indeed a few minor problems with the RegEx. The post has been updated with a tested, working version.
Noldorin
Thanks a lot, but I'm using version 2.0, and if I try it on the input " t < 10 loop\n s => 'this is if stmt'; for u in 8..12 loop \n}" or " ' while ' t < 10 loop\n s => 'this is if stmt'; for u in 8..12 loop \n}", it gives me if (first input) and while (second input) as keywords.
Archie
Yeah, so the problem here is that it doesn't recognise single quotes. If it were purely .NET code, you could never enclose a keyword name in single quotes, so there would be no problem.
Noldorin
(contd.) To accept single quotes too, change the pattern to this (written as a regular C# string literal): (?:[^\"']|(?:.*?(?<q>[\"']).*?\\k<q>)*?)(?: |^)(?<kw>for|while|if)[ (]
Noldorin
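Run with Regex.Matches on the sample input, the answer's pattern and the single-quote variant in the previous comment also report the "if" inside 'this is if stmt'. The reason is that Matches retries at every position, and the leading group can match a single non-quote character or nothing at all. It therefore never forces the engine to skip over a quoted section.

One alternative, shown here as a minimal sketch, uses .NET's variable-length lookbehind: a keyword only counts if every quote before it is closed. This assumes C# 9 or later, and Keywords is a hypothetical helper. The regex constructs themselves should also exist in older .NET versions, but the surrounding code does not. The lookbehind rescans from the start of the input for each candidate, so it may be slow on large files, and it does not understand escaped quotes or comments.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "while t < 10 loop\n s => 'this is if stmt'; for u in 8..12 loop \n}";

// Pattern from the answer above (regular C# string literal).
string answerPattern = "(?:[^\"]|(?:(?:.*?\"){2})*?)(?: |^)(?<kw>for|while|if)[ (]";

// Lookbehind sketch: the text from the start of the input up to the keyword must consist
// only of non-quote characters and complete "..." or '...' strings, i.e. every quote
// before the keyword is closed. .NET allows variable-length lookbehind, which this needs.
string lookbehindPattern = @"(?<=^[^""']*(?:(?:""[^""]*""|'[^']*')[^""']*)*)\b(?<kw>for|while|if)\b";

Console.WriteLine(string.Join(", ", Keywords(input, answerPattern)));     // while, if, for
Console.WriteLine(string.Join(", ", Keywords(input, lookbehindPattern))); // while, for

// Collects the "kw" group of every match, like the one-liner in the answer.
static string[] Keywords(string text, string pattern) =>
    Regex.Matches(text, pattern).Cast<Match>().Select(m => m.Groups["kw"].Value).ToArray();
```

On the two inputs from the comment above, the lookbehind version returns only for in both cases.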
+1  A: 

If you decide to go the Regex route, you can use this site to test your regular expression.

Draco
Also check this site for Regex testing: http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx. I found it more user-friendly.
Archie
A: 

Can it be done in anyway using Regex?

In the general case, no. The syntax of C# is not amenable to regex parsing.

Consider these corner cases:

method("xxx\"); while (\"xxx");

method(@"xxx \"); while (...);

// while

/* while */

/* xxx
// xxx */ while

/* xxx " xxx */ while ("...

Languages as complex as C# need dedicated parsers.
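To make the point concrete, here is a minimal sketch that runs the naive "quoted string or keyword, whichever comes first" regex over four of the cases above. It assumes C# 9 or later, and Found is a hypothetical helper; the naive pattern has no escape or comment handling. For the first three cases it reports while, although real C# has no keyword outside strings and comments there. For the fourth case it reports nothing, although that while is real code. Fixing each case means adding another alternative to the pattern (escapes, verbatim strings, two comment styles).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] cases =
{
    @"method(""xxx\""); while (\""xxx"");", // real C#: "while" is inside one string literal
    "// while",                             // real C#: line comment
    "/* while */",                          // real C#: block comment
    @"/* xxx "" xxx */ while (""...",       // real C#: the quote is inside a comment, so "while" is code
};

// Prints [while], [while], [while] and [] in turn.
foreach (string c in cases)
    Console.WriteLine(c + "  ->  [" + string.Join(", ", Found(c)) + "]");

// Naive rule: a string is "..." or '...' without escapes, and comments are not recognised.
static string[] Found(string code) =>
    Regex.Matches(code, "\"[^\"]*\"|'[^']*'|\\b(?<kw>for|while|if)\\b")
         .Cast<Match>()
         .Where(m => m.Groups["kw"].Success)
         .Select(m => m.Groups["kw"].Value)
         .ToArray();
```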

bobince
Well, as I said in one of the comments on this post, I won't be using it only for C#. I want a regex where I can just change the list of keywords and get the next keyword in the input string.
Archie