I'm working on a routine to strip block or line comments from some C# code. I have looked at the other examples on the site, but haven't found the exact answer that I'm looking for.
I can match block comments (/* comment */) in their entirety using this regular expression with RegexOptions.Singleline:
(/\*[\w\W]*\*/)
And I can match line comments (// comment) in their entirety using this regular expression with RegexOptions.Multiline:
(//((?!\*/).)*)(?!\*/)[^\r\n]
Note: I'm using [^\r\n] instead of $ because $ is including \r in the match, too.
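For reference, here is a minimal sketch of the $ behavior I mean, assuming the input uses Windows-style \r\n line endings. The class name, the FirstMatch helper, and the sample string are made up for this illustration, and the patterns are simplified versions rather than my real expression.

```csharp
using System;
using System.Text.RegularExpressions;

static class DollarAnchorDemo
{
    // Hypothetical helper: first match of 'pattern' in 'input' under RegexOptions.Multiline.
    public static string FirstMatch(string input, string pattern) =>
        Regex.Match(input, pattern, RegexOptions.Multiline).Value;

    static void Main()
    {
        // Assumed sample input with \r\n line endings.
        string input = "int x = 1; // first\r\nint y = 2;";

        // With Multiline, $ matches just before \n, and . also matches \r,
        // so the carriage return ends up inside the match.
        string withDollar = FirstMatch(input, @"//.*$");

        // A negated character class stops before either line-ending character.
        string withClass = FirstMatch(input, @"//[^\r\n]*");

        Console.WriteLine(withDollar == "// first\r"); // True
        Console.WriteLine(withClass == "// first");    // True
    }
}
```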
However, this doesn't quite work the way I want it to.
Here is my test code that I'm matching against:
// remove whole line comments
bool broken = false; // remove partial line comments
if (broken == true)
{
return "BROKEN";
}
/* remove block comments
else
{
return "FIXED";
} // do not remove nested comments */ bool working = !broken;
return "NO COMMENT";
The block expression matches
/* remove block comments
else
{
return "FIXED";
} // do not remove nested comments */
which is fine and good, but the line expression matches
// remove whole line comments
// remove partial line comments
and
// do not remove nested comments
Also, if I do not have the */ negative lookahead in the line expression twice, it matches
// do not remove nested comments *
which I really don't want.
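A minimal repro of the two results, run against just the last line of the block comment from my test code. The class and member names are only for this sketch. The match is printed in brackets so the trailing character is visible.

```csharp
using System;
using System.Text.RegularExpressions;

static class LineCommentRepro
{
    // The line of the test code that closes the block comment.
    public const string Line =
        "} // do not remove nested comments */ bool working = !broken;";

    // Line expression with the lookahead only inside the repeated group.
    public const string OneLookahead = @"(//((?!\*/).)*)[^\r\n]";

    // Line expression as posted, with the lookahead repeated before [^\r\n].
    public const string TwoLookaheads = @"(//((?!\*/).)*)(?!\*/)[^\r\n]";

    // Hypothetical helper: first match of 'pattern' in Line under RegexOptions.Multiline.
    public static string FirstMatch(string pattern) =>
        Regex.Match(Line, pattern, RegexOptions.Multiline).Value;

    static void Main()
    {
        // Prints [// do not remove nested comments *]
        Console.WriteLine("[" + FirstMatch(OneLookahead) + "]");

        // Prints [// do not remove nested comments ] (ends with a space, no *)
        Console.WriteLine("[" + FirstMatch(TwoLookaheads) + "]");
    }
}
```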
What I want is an expression that will match characters, starting with //, to the end of the line, but only if there is no */ between the // and the end of the line.
Also, just to satisfy my curiosity, can anyone explain why I need the lookahead twice? (//((?!\*/).)*)[^\r\n] and (//(.)*)(?!\*/)[^\r\n] will both include the *, but (//((?!\*/).)*)(?!\*/)[^\r\n] and (//((?!\*/).)*(?!\*/))[^\r\n] won't.