I am working on a preprocessor that is analyzing a DSL. My goal is to remove the comments. The block comment facility is demarcated by %% before and after. I do not have to worry about %% being in strings, by the definition of the language.

I am using this s/// regex. Unfortunately, it seems to match everything and wipe it out:

#Remove multiline comments.
$text_string =~ s/%%.*%%//msg;

What am I doing wrong?

+9  A: 

The first thing you can do is make the quantifier non-greedy:

.*?

Otherwise, the greedy .* runs from the first %% all the way to the last %%, so

%% some text %%

real content

%% other text %%

will all be wiped out.
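To see the difference side by side, here is a minimal sketch. The sample text and the variable names ($text, $greedy, $lazy) are made up for illustration and are not from the question:

```perl
use strict;
use warnings;

# Hypothetical sample input: two block comments around real content.
# The second comment spans two lines.
my $text = "%% some text %%\nreal content\n%% other\ntext %%\n";

# Greedy: .* extends to the LAST %%, so the real content is removed too.
(my $greedy = $text) =~ s/%%.*%%//sg;

# Non-greedy: .*? stops at the NEXT %%, so each comment is removed on its own.
(my $lazy = $text) =~ s/%%.*?%%//sg;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
```

Note that /s is the modifier that lets . match a newline, which is what a multi-line comment needs. /m only changes how ^ and $ behave, so it has no effect on this pattern.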

動靜能量
+1  A: 

From perlfaq6: What does it mean that regexes are greedy? How can I get around it?


Most people mean that greedy regexes match as much as they can. Technically speaking, it's actually the quantifiers (?, *, +, {}) that are greedy rather than the whole pattern; Perl prefers local greed and immediate gratification to overall greed. To get non-greedy versions of the same quantifiers, use (??, *?, +?, {}?).

An example:

$s1 = $s2 = "I am very very cold";
$s1 =~ s/ve.*y //;      # I am cold
$s2 =~ s/ve.*?y //;     # I am very cold

Notice how the second substitution stopped matching as soon as it encountered "y ". The *? quantifier effectively tells the regular expression engine to find a match as quickly as possible and pass control on to whatever is next in line, like you would if you were playing hot potato.

brian d foy
A: 

Assuming that you have read the entire code into the variable $str, and that a single % can never occur between the opening %% and the closing %%, you could use this:

$str =~ s/%%([^%]+)%%//g;
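The assumption about a lone % matters. As a rough sketch, with made-up sample strings and variable names, this compares the character-class pattern with a non-greedy .*? on a comment that contains a single %, and on an empty comment:

```perl
use strict;
use warnings;

# Hypothetical input: the second comment contains a lone % sign.
my $str = "a %% note %% b %% 50% done %% c";

# Character class: [^%]+ cannot step over the lone %, so the
# second comment is left in place.
(my $by_class = $str) =~ s/%%([^%]+)%%//g;

# Non-greedy dot: .*? only stops at a full %% pair, so both comments go.
(my $by_lazy = $str) =~ s/%%.*?%%//sg;

# An empty comment: + needs at least one character between the markers.
(my $empty_class = "x%%%%y") =~ s/%%([^%]+)%%//g;
(my $empty_lazy  = "x%%%%y") =~ s/%%.*?%%//sg;

print "class: [$by_class]\n";
print "lazy:  [$by_lazy]\n";
print "empty, class: [$empty_class]\n";
print "empty, lazy:  [$empty_lazy]\n";
```

If the DSL really does guarantee that % never appears inside a comment, both patterns remove the same text for non-empty comments; the character class just encodes that guarantee in the pattern itself.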