I've been trying to match PHP comments using regex.
//([^<]+)\r\n
That's what I've got, but it doesn't really work.
I've also tried
//([^<]+)\r
//([^<]+)\n
//([^<]+)
...to no avail.
In what program are you coding this regex? Your final example is a good sanity check if you're worried that the newline chars aren't working. (I have no idea why you don't allow less-than in your comment; I'm assuming that's specific to your application.)
Try
//[^<]+
and see if that works. As Draemon says, you might have to escape the diagonals. You might also have to escape the parentheses. I can't tell if you know this, but parentheses are often used to enclose capturing groups. Finally, check whether there is indeed at least one character after the double slashes.
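If the regex ends up in PHP's preg_* functions, the escaping point could be sketched like this. The sample string and variable names are made up for illustration; the pattern is the one from the question, once with '/' as the delimiter and once with '#'.

```php
<?php
// Hypothetical sample input; not taken from the question.
$code = "\$a = 1; // set a\n\$b = 2;\n";

// With '/' as the delimiter, the two slashes have to be escaped.
preg_match('/\/\/([^<]+)/', $code, $escaped);

// With '#' as the delimiter, they can be written literally.
preg_match('#//([^<]+)#', $code, $unescaped);

// Both patterns capture the same text.
var_dump($escaped[1] === $unescaped[1]);

// [^<]+ also matches newlines, so the capture runs past the end of
// the comment line and swallows the following code as well.
var_dump($unescaped[1]);
```

The second dump is probably why the original pattern "doesn't really work": the character class excludes only '<', so nothing stops it at the end of the line.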
To match comments, keep in mind that there are two types of comments in PHP 5:
- // runs to the end of the line
- /* runs to the closing */
Assuming you start with these two lines:
$filePath = '/home/squale/developpement/astralblog/website/library/HTMLPurifier.php';
$str = file_get_contents($filePath);
You could match the first ones with :
$matches_slashslash = array();
if (preg_match_all('#//(.*)$#m', $str, $matches_slashslash)) {
    var_dump($matches_slashslash[1]);
}
And the second ones with :
$matches_slashstar = array();
if (preg_match_all('#/\*(.*?)\*/#sm', $str, $matches_slashstar)) {
    var_dump($matches_slashstar[1]);
}
But you will probably run into trouble with '//' in the middle of a string (and what about heredoc syntax, by the way, did you think about that one?), or with "toggle comments" like this:
/*
echo 'a';
/*/
echo 'b';
//*/
(Just add a slash at the beginning to "toggle" the two blocks, if you don't know the trick.)
So... Quite hard to detect comments with only regex...
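To make the string problem concrete, here is a small sketch. The sample code in $str is hypothetical; the pattern is the '//' one from above.

```php
<?php
// Hypothetical snippet: the only real comment is on the second line.
$str = "\$url = 'http://example.com/';\necho \$url; // print the URL\n";

$matches = array();
preg_match_all('#//(.*)$#m', $str, $matches);

// The regex reports two "comments"; the first one is really the tail
// of the string literal on the first line.
var_dump($matches[1]);
```

The regex has no notion of being inside a quoted string, so the URL is indistinguishable from a comment.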
Another way would be to use the PHP Tokenizer, which, obviously, knows how to parse PHP code and comments.
With that, you would have to use the tokenizer on your string of PHP code, iterate on all the tokens you get as a result, and detect which ones are comments.
Something like this would probably do :
$tokens = token_get_all($str);
foreach ($tokens as $token) {
    // Single-character tokens such as ';' are plain strings, so skip them.
    if (is_array($token)
        && ($token[0] == T_COMMENT || $token[0] == T_DOC_COMMENT)) {
        // This is a comment ;-)
        var_dump($token);
    }
}
And, as output, you'll get a list of stuff like this :
array
  0 => int 366
  1 => string '/** Version of HTML Purifier */' (length=31)
  2 => int 57
or this :
array
  0 => int 365
  1 => string '// :TODO: make the config merge in, instead of replace
' (length=55)
  2 => int 117
(You might "just" need to strip the // and /* */ markers, but that's up to you; at least you have extracted the comments ^^ )
If you really want to detect comments without any kind of strange error due to "strange" syntax, I suppose this would be the way to go ;-)
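For the stripping step mentioned above, a minimal sketch might look like this. The function name extract_comment_texts and the sample $source are made up for illustration, it uses PHP 7 type declarations, and it assumes every block comment is properly closed.

```php
<?php
/**
 * Hypothetical helper: returns the text of every comment found in
 * $source, with the comment delimiters removed.
 */
function extract_comment_texts(string $source): array
{
    $texts = array();
    foreach (token_get_all($source) as $token) {
        // Single-character tokens are plain strings, not arrays.
        if (!is_array($token)) {
            continue;
        }
        if ($token[0] !== T_COMMENT && $token[0] !== T_DOC_COMMENT) {
            continue;
        }
        $text = $token[1];
        if (strncmp($text, '/*', 2) === 0) {
            // Block comment: drop the opening and closing two characters,
            // then any extra stars left over from a doc comment opener.
            $text = ltrim(substr($text, 2, -2), '*');
        } else {
            // Line comment: drop the leading "//" or "#".
            $text = preg_replace('~^(?://|#)~', '', $text);
        }
        // trim() also removes the trailing newline that older PHP
        // versions include in line-comment tokens.
        $texts[] = trim($text);
    }
    return $texts;
}

// Hypothetical input: note the "//" inside the string literal.
$source = "<?php\n/** Version */\n\$v = 'http://x'; // real comment\n/* block */\n";
print_r(extract_comment_texts($source));
```

Because the tokenizer knows that 'http://x' is a string, only the three real comments come back, which is exactly what the regex approach could not guarantee.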