I want to match multiline comments that contain a specific word, say findthis. The first pattern that comes to mind is \/\*.*?findthis.*?\*\/ (using DOTALL). The problem with this pattern, however, is that a string like this:

/* this is a comment */
this is some text
/* this is a findthis comment */

will be matched in its entirety. In a bigger file, the first match would span everything from the start of the first comment to the end of the first comment containing findthis. How can I prevent this?
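The behaviour is easy to reproduce. Here is a minimal sketch in Python (chosen only for illustration, since the question does not name a language), using re.DOTALL so that . also matches newlines:

```python
import re

text = (
    "/* this is a comment */\n"
    "this is some text\n"
    "/* this is a findthis comment */\n"
)

# The pattern from the question: lazy quantifiers plus DOTALL.
naive = re.compile(r"/\*.*?findthis.*?\*/", re.DOTALL)

match = naive.search(text)
# The match begins at the very first "/*" and runs to the end of the
# second comment, swallowing the plain text in between.
print(repr(match.group(0)))
```

A lazy quantifier only keeps the match short for a given starting position. The engine still prefers the leftmost start, and nothing stops .*? from stepping over the first */ on its way to findthis.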

+2  A: 

Well, you could change the regex to something like \/\*([^*]|\*+[^/*])*findthis([^*]|\*+[^/*])*\*+\/ but...

To get this exactly right, you would have to fully tokenize the source code. Otherwise your regex will be fooled by comment-like content inside strings (among other bizarre corner cases).

(Explanation of crazy regex: ([^*]|\*+[^/*]) matches a little bit of the inside of a comment, but never matches all or part of */.)
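To see the difference, here is a small sketch that runs this pattern on the example from the question. Python is assumed for illustration, and the names BODY and comment_with_word are made up for this example. The groups are written as non-capturing (?:...) groups, which does not change what is matched.

```python
import re

# Comment interior: any non-star character, or a run of stars followed by
# something that is neither "/" nor "*". It can never step over "*/".
BODY = r"(?:[^*]|\*+[^/*])*"
comment_with_word = re.compile(r"/\*" + BODY + r"findthis" + BODY + r"\*+/")

text = (
    "/* this is a comment */\n"
    "this is some text\n"
    "/* this is a findthis comment */\n"
)

# Only the second comment is reported.
matches = [m.group(0) for m in comment_with_word.finditer(text)]
print(matches)

# The caveat about strings: comment-like text inside a string literal
# is still matched, because the regex knows nothing about strings.
fooled = comment_with_word.search('x = "/* findthis */";')
print(fooled.group(0))

# Corner case: a star directly before the word is consumed together with
# the first letter of the word, so this comment is not matched.
missed = comment_with_word.search("/**findthis*/")
print(missed)
```

No DOTALL flag is needed here, because a negated character class such as [^*] already matches newlines. The last two checks are worth keeping in mind before relying on the pattern: it is fooled by string literals, and it misses a comment in which a * immediately precedes the word.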

Jason Orendorff
Very imaginative. Thanks!
Felix
A: 

I think this should do the trick:

/\/\*.*?findthis.*?\*\//. The ? in .*? makes the quantifier non-greedy (lazy). This way the comment can contain * and / characters, but not */ (the end of the comment).

VDVLeon
That's exactly the same pattern that I posted (with two additional slashes at the beginning and end - because you are probably a PHP user). Have you tried this pattern on the example I provided? It will not work.
Felix
Sorry, I didn't look closely. Strange that it doesn't work.
VDVLeon