I have a regular expression match in Perl. The sentence I want to match spreads over more than one line.

I realize that I must write the regular expression on a single line; if I spread it over multiple lines, the match fails:

$array_11 =~ m{By Steve (.*), MarketWatch LONDON (.*) -- Shares of Anglo American rallied on Monday morning as (.*) bet that the mining group will reject a (.*)};

If I write it across multiple lines, it won't be able to match this string.

+8  A: 

You may be looking for the /x modifier.

From perldoc perlre:

x Extend your pattern's legibility by permitting whitespace and comments.
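
As a minimal sketch of what that means in practice (the sample string and the variable names are made up for illustration, not taken from the question): under /x, line breaks, indentation and # comments inside the pattern are ignored, so any space that must match literally has to be written as \s, [ ] or similar.

```perl
use strict;
use warnings;

# Hypothetical sample text, not the string from the question.
my $text = 'Shares of Anglo American rallied';

# With /x the layout below is not part of the pattern.
my ($company) = $text =~ m{
    Shares \s+ of \s+     # literal words separated by whitespace
    (\w+ [ ] \w+)         # two words with one literal space between them
    \s+ rallied
}x;

print "$company\n";   # prints "Anglo American"
```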

Alan Haggai Alavi
+13  A: 

As mentioned previously, it looks like you are looking for the x modifier. That modifier makes the regexp engine ignore whitespace in the pattern (unless it is escaped or inside a character class) and allows comments (starting with #).

In your case it's a bit ugly though, because you then have to replace every space that you do want to match with [ ], \s or \s+:

$array_11 =~ m{By \s+ Steve \s+ (.*), \s+
               MarketWatch \s+ LONDON \s+ (.*) \s+
               -- \s+ Shares \s+ of \s+ Anglo \s+ American \s+ 
               rallied \s+ on \s+ Monday \s+ morning \s+ as \s+ 
               (.*) \s+ bet \s+ that \s+ the \s+ mining \s+ 
               group \s+ will \s+ reject \s+ a \s+ (.*)
              }x;

So in fact I would probably write something like this:

my $sentence= q{By Steve (.*), MarketWatch LONDON (.*) }
            . q{-- Shares of Anglo American rallied on Monday morning as (.*) }
            . q{bet that the mining group will reject a (.*)}
            ;
$array_11 =~ m{$sentence};

A last comment: $array_11 has a strong code smell. If it's an array, then make it an array, not several scalar variables.

mirod
Thank you very much. This is what I meant.
You're both getting a ridiculous amount of backtracking by using the greedy .* quantifier. Each time you use .* you swallow up all the characters left and then backtrack until you can complete the next part. The non-greedy .*? will at least watch for the next sequence. And I don't expect that you're expecting "Steve MarketWatch, MarketWatch LONDON", so the .*? makes it *explicit* that you want to look out for the rest of the characters.
Axeman
premature optimization... OK, you're right, I wasn't really paying attention to the content of the regexp, maybe I should have! Replacing every single .* by .*? will be a lot more efficient. Thanks.
mirod
You should decide on .* vs .*? based on what you want to match in ambiguous cases, not on one being faster.
ysth
@ysth: I think I made that point as well when I said I doubt he expects the sequence "Steve MarketWatch". It's not just about speed. It's a sanity check; it just contains a speed concern. Kind of like: "select * from terabyte_table" is a design issue as well as an optimization issue.
Axeman
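
To make the ambiguity discussed in these comments concrete, here is a small sketch. The input string is hypothetical and was constructed so that ", MarketWatch" occurs twice; it is not the OP's data.

```perl
use strict;
use warnings;

# Hypothetical input: the text following the capture appears twice.
my $text = 'By Steve Example, MarketWatch LONDON, MarketWatch NEW YORK';

# Greedy: .* grabs as much as possible, then backtracks to the LAST ", MarketWatch".
my ($greedy) = $text =~ m{By Steve (.*), MarketWatch};

# Non-greedy: .*? stops at the FIRST ", MarketWatch".
my ($lazy) = $text =~ m{By Steve (.*?), MarketWatch};

print "greedy: $greedy\n";   # greedy: Example, MarketWatch LONDON
print "lazy:   $lazy\n";     # lazy:   Example
```

Both patterns match; they only differ in what ends up in $1, which is why the choice should follow from what you want captured.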
+1  A: 

All the escaped spaces are really ugly and distracting. So, here is an alternative:

my ($pattern) = map { qr/$_/ } join q{ }, split q{ }, <<'EOP';
    Steve (.*), MarketWatch LONDON (.*) --
    Shares of Anglo American rallied on Monday morning
    as (.*) bet that the mining group will reject
    a (.*)
EOP

$text =~ $pattern;

NB: I left the (.*) in because I did not know what the OP wants, but see Axeman's comment on mirod's answer.
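
For anyone wondering what the split/join step does, here is a rough sketch with a shortened, made-up pattern (the names $layout, $normalized and $day are only for illustration). Splitting on q{ } uses Perl's awk-style whitespace splitting, so indentation and newlines collapse into single spaces before the string is compiled:

```perl
use strict;
use warnings;

# A shortened, hypothetical pattern laid out over several indented lines.
my $layout = <<'EOP';
    Shares of Anglo American
    rallied on (.*) morning
EOP

# split q{ } drops leading whitespace and splits on runs of whitespace;
# join q{ } puts exactly one space back between the pieces.
my $normalized = join q{ }, split q{ }, $layout;
print "$normalized\n";   # Shares of Anglo American rallied on (.*) morning

my $pattern = qr/$normalized/;
my ($day) = 'Shares of Anglo American rallied on Monday morning' =~ $pattern;
print "$day\n";          # Monday
```

Note that this only normalizes whitespace; a single space in the pattern still matches exactly one space in the text.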

Sinan Ünür