I'm trying to remove JavaScript comments with a regex in C# and have gotten stuck. I want to remove every occurrence of a double-slash (//) style comment.

My current regex is (?<!:)//[^\r\n]* which catches all comments and avoids matching http://. However, the negative lookbehind was a lazy shortcut, and of course it came back to bite me in the following test case:

var XSLPath = "//" + Node;
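
To make the failure concrete, here is a minimal reproduction. It is only a sketch, written in Python rather than C# (the pattern itself is unchanged); the helper name `strip_naive` and the example.com URL are made up for illustration.

```python
import re

# The pattern from the question: "//" not preceded by ":", up to end of line.
NAIVE_COMMENT = re.compile(r'(?<!:)//[^\r\n]*')

def strip_naive(source: str) -> str:
    """Remove every match of the naive pattern (hypothetical helper)."""
    return NAIVE_COMMENT.sub('', source)

# A real comment is removed (the space before it is left behind).
print(repr(strip_naive('var a = 1; // a comment')))

# The ":" lookbehind protects the URL, so this line is unchanged.
print(repr(strip_naive('var u = "http://example.com";')))

# The "//" inside the string literal is treated as a comment,
# so everything from it to the end of the line is cut.
print(repr(strip_naive('var XSLPath = "//" + Node;')))
```

The last call returns `var XSLPath = "`, which is the broken output I'm trying to avoid.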

So I'm looking for a regular expression that will perform a lookbehind to see if an even number of double quotes (") occurs before the match. I'm not sure if this is possible. Or maybe there's a better way to do this?

+2  A: 

(Updated based on comments)

It looks like this works pretty well:

(?<=".*".*)//.*$|(?<!".*)//.*$

My test cases in Regex Hero show that it matches comments (almost) the way I think it should.

For instance, it'll completely ignore this line:

var XSLPath = "//" + Node;

But it's smart enough to match the comment at the end of this line:

var XSLPath = "//"; // stuff to remove

However, it's not smart enough to know how to deal with 3 or more quotation marks before the comment. I'm not entirely sure how to solve that problem without hard-coding it. You need some way to allow an even number of quotes.
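
One way to get the "even number of quotes" behaviour without any lookbehind is to let the regex consume whole string literals first, so a // inside quotes is never considered a comment start. Below is a rough sketch of that idea, not a tested C# solution: it is written in Python for brevity, `TOKEN` and `strip_line_comments` are names I made up, and it only knows about double-quoted strings (single-quoted strings, regex literals, and /* */ blocks are not handled).

```python
import re

# Alternation: try a complete double-quoted string first, then a // comment.
# Strings are matched as a unit (including escaped quotes), so a "//" inside
# one never reaches the comment branch. Group 1 captures only the comment.
TOKEN = re.compile(r'"(?:\\.|[^"\\\r\n])*"|(//[^\r\n]*)')

def strip_line_comments(source: str) -> str:
    """Drop // comments but keep double-quoted strings intact (sketch)."""
    def keep_strings(match):
        # Group 1 is set only when the comment branch matched.
        return '' if match.group(1) is not None else match.group(0)
    return TOKEN.sub(keep_strings, source)

# String containing "//" is left alone.
print(repr(strip_line_comments('var XSLPath = "//" + Node;')))

# Trailing comment is removed, the string before it is kept.
print(repr(strip_line_comments('var XSLPath = "//"; // stuff to remove')))

# Any number of string literals before the comment is fine.
print(repr(strip_line_comments('var s = "a" + "b" + "//c"; // x')))
```

Because the string branch is tried first at every position, the number of quotes before the comment no longer matters. The same alternation should carry over to .NET by passing a MatchEvaluator to Regex.Replace, though I have not tested that here.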

Steve Wortham
Fails on `var s=http://www.website.com;`
Gavin Miller
However, that's easily changed via `(?<![":])//.*(?!")$`
Gavin Miller
Fails on something like this: "text // more text"
Joel
@LFSR Consulting -- Nicely done, your modification seems to solve that problem. Hope you don't mind but I updated my answer with your fix.
Steve Wortham
Joel, it seems you'd need some way to allow an even number of quotes before the slashes, but not an odd number. Hmm...
Steve Wortham
I updated my regex again. It now matches a comment even if two quotes appear before it, but not if only one quote does.
Steve Wortham
The regex looks good - thanks for all the effort, it's much appreciated!
Gavin Miller
You're welcome. It's still not perfect, but I'm glad it works for you.
Steve Wortham