G'day,

I am using the following Perl fragment to extract output from a Solaris cluster command.

open(CL,"$clrg status |");
my @clrg= grep /^[[:lower:][:space:]]+/,<CL>;
close(CL);

Printing the elements of the array @clrg gives the following (the "=>" and "<=" line delimiters are inserted by my print statement):

=><=
=>nas-rg             mcs0.cwwtf.bbc.co.uk   No          Online<=
=>                   mcs1.cwwtf.bbc.co.uk   No          Offline<=
=><=
=>apache-rg          mcs0.cwwtf.bbc.co.uk   No          Online<=
=>                   mcs1.cwwtf.bbc.co.uk   No          Offline<=
=><=

When I replace it with the following Perl fragment the blank lines are not matched.

open(CL,"$clrg status |");
my @clrg= grep /^[[:lower:][:space:]]{3,}/,<CL>;
close(CL);

And I get the following:

=>nas-rg             mcs0.cwwtf.bbc.co.uk   No          Online<=
=>                   mcs1.cwwtf.bbc.co.uk   No          Offline<=
=>apache-rg          mcs0.cwwtf.bbc.co.uk   No          Online<=
=>                   mcs1.cwwtf.bbc.co.uk   No          Offline<=

The simple question is: why?

BTW, using {1,} in the second Perl fragment also matches blank lines!

Any suggestions gratefully received!

cheers,

+9  A: 

That'll be because [:space:] matches newlines and carriage returns as well.

So [[:space:]]+ would match \n, \r\n, or \n\n.

But [[:space:]]{3,} would require three characters, and an empty line is just a \n.

{1,} and + mean the same thing: match the preceding group one or more times.

P.S. A typical newline is \n on Unix and \r\n on Windows.
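
A minimal sketch of the difference, using a few hard-coded lines in place of the real command output (the @lines data is made up for illustration and is not the asker's actual pipe):

```perl
use strict;
use warnings;

# Sample lines standing in for the command output (illustrative data only).
my @lines = (
    "\n",
    "nas-rg             mcs0.cwwtf.bbc.co.uk   No          Online\n",
    "                   mcs1.cwwtf.bbc.co.uk   No          Offline\n",
    "\n",
);

# "+" needs only one character from the class, and "\n" is in [:space:],
# so the blank lines match as well.
my @plus = grep { /^[[:lower:][:space:]]+/ } @lines;

# "{3,}" needs at least three characters from the class,
# so a line consisting of a lone "\n" cannot match.
my @three = grep { /^[[:lower:][:space:]]{3,}/ } @lines;

printf "+    matched %d of %d lines\n", scalar @plus,  scalar @lines;
printf "{3,} matched %d of %d lines\n", scalar @three, scalar @lines;
```

With this sample data the first grep should keep all four lines and the second only the two non-blank ones.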

Andomar
@Andomar, those POSIX character classes can't stand alone; they have to be used inside another pair of square brackets, i.e., [[:space:]].
Alan Moore
Thanks, corrected in the answer
Andomar
+1  A: 

Hm. According to the Perl regular expression documentation, the [:space:] character class should not include newlines, as it is supposed to be the equivalent of \s (except that it recognizes an additional character, vertical-tab, to maintain POSIX compliance).

However, having just tested this on 5.10.0, I can verify that it is matching newlines as well. Whether this qualifies as a bug in Perl or in the documentation, I'll leave for the Perl maintainers. But to avoid the immediate problem, use the previous answerer's solution and just use \s instead of the POSIX class.

rjray
There is no bug; \s is indeed supposed to match linefeeds and carriage returns, and so is [:space:]. Maybe you're thinking of [:blank:], which only matches spaces and tabs.
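
To illustrate the [:space:] versus [:blank:] distinction, here is a small sketch; the sample strings and the @lines data are made up, and swapping [:blank:] into the original pattern is only one possible way to skip blank lines, not necessarily what the asker wants:

```perl
use strict;
use warnings;

# Count how many of newline, tab, and space each class accepts.
# grep in scalar context returns the number of matching elements.
my $space_hits = grep { /\A[[:space:]]\z/ } ("\n", "\t", " ");
my $blank_hits = grep { /\A[[:blank:]]\z/ } ("\n", "\t", " ");

# Illustrative input: two blank lines around two data lines.
my @lines = ("\n", "nas-rg   Online\n", "         Offline\n", "\n");

# With [:blank:] a lone "\n" no longer satisfies the class,
# so blank lines are dropped even with the "+" quantifier.
my @kept = grep { /^[[:lower:][:blank:]]+/ } @lines;

print "[:space:] accepted $space_hits of 3, [:blank:] accepted $blank_hits of 3\n";
print "kept: $_" for @kept;
```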
Alan Moore
\s stands for "whitespace character". In all regex flavors it includes [ \t\r\n]. http://www.regular-expressions.info/charclass.html#shorthand
Andomar
My bad. I was confusing this with the fact that "." in a regex only matches newlines when the /s flag is present.
rjray