
What does the following syntax mean in Perl?

$line =~ /([^:]+):/;

and

$line =~ s/([^:]+):/$replace/;
+2  A: 

The first one matches one or more characters that are anything but : followed by a :, capturing the non-colon part. The second one does the same match but replaces the whole matched text (the colon included) with $replace.
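A minimal sketch of both statements on a made-up colon-separated string (the sample values 'user:x:1000' and 'nobody' are hypothetical). In list context the match returns the captured text. The substitution here works on a copy so the original stays intact.

```perl
use strict;
use warnings;

# Hypothetical sample input and replacement.
my $line    = 'user:x:1000';
my $replace = 'nobody';

# List context: the match returns its captures, so $before receives $1.
my ($before) = $line =~ /([^:]+):/;

# Copy $line into $edited, then substitute on the copy; $line is unchanged.
(my $edited = $line) =~ s/([^:]+):/$replace/;

print "before=$before\n";   # before=user
print "edited=$edited\n";   # edited=nobodyx:1000 (the colon went too)
```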

Copas
+11  A: 

See perldoc perlreref

[^:]

is a character class that matches any character other than ':'.

[^:]+

means match one or more of such characters.

I am not sure the capturing parentheses are needed. In any case,

([^:]+):

captures a sequence of one or more non-colon characters followed by a colon.
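On whether the capturing parentheses are needed: a small sketch (the sample string is made up) showing that the pattern matches the same way with or without them; the parentheses only decide whether the non-colon run is saved in $1.

```perl
use strict;
use warnings;

my $line = 'name:value';   # hypothetical sample

# Without parentheses the match still succeeds; nothing is captured.
my $matched = $line =~ /[^:]+:/ ? 1 : 0;

# With parentheses the run of non-colons is also saved in $1.
my $captured = $line =~ /([^:]+):/ ? $1 : undef;

print "matched=$matched captured=$captured\n";   # matched=1 captured=name
```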

Sinan Ünür
I thought the first one was a chicken. The second is a chicken with bad hair, and the third is a frowning person holding a chicken with bad hair. Frowning people tend to mess up data sets, especially when they have chickens--so it's a good thing to be able to find them and replace them with something less disruptive!
daotoad
I'd say the last one is more like a guy eating a chicken with bad hair. ( insertfoodhere ):
+3  A: 

The first one captures the part in front of a colon from a line, such as "abc" in the string "abc:foo". More precisely, it matches at least one non-colon character (though as many as possible) directly before a colon and puts them into a capture group.

The second one replaces said part, this time including the colon, with the contents of the variable $replace.
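To illustrate "at least one non-colon character directly before a colon", here is a sketch on two made-up strings. The match stops at the first colon, and because [^:]+ needs at least one character, a leading colon is skipped and the match starts later. $-[0] is Perl's offset of where the last successful match began.

```perl
use strict;
use warnings;

my @results;
for my $s ('abc:foo:bar', ':abc:') {   # hypothetical samples
    if ($s =~ /([^:]+):/) {
        # Record the string, the capture, and the match's start offset.
        push @results, [ $s, $1, $-[0] ];
    }
}

for my $r (@results) {
    printf "%-12s captured=%s starts at %d\n", @$r;
}
# abc:foo:bar  captured=abc starts at 0
# :abc:        captured=abc starts at 1
```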

Joey
The colon is included in the match, so it gets replaced as well.
Michael Carman
Ah, sorry. My bad. I was still focused on that capture group.
Joey
A: 
$line =~ /([^:]+):/;

Matches one or more characters that are not : followed by a :.

If $line = "http://www.google.com", it will match http: (and the variable $1 will contain http).
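One practical habit when pulling the scheme out this way: use $1 only after checking that the match succeeded, because a failed match does not reset $1 (it can still hold a value from an earlier successful match). A sketch, with the second URL made up to have no colon:

```perl
use strict;
use warnings;

my @urls = ('http://www.google.com', 'www.example.com');   # second has no colon
my @scheme;

for my $line (@urls) {
    # Only read $1 inside the success branch of the match.
    if ($line =~ /([^:]+):/) {
        push @scheme, $1;
    }
    else {
        push @scheme, undef;   # no colon, so no scheme
    }
}

print defined $_ ? "$_\n" : "(no match)\n" for @scheme;
```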

$line =~ s/([^:]+):/$replace/;

This time, the matched text (including the colon) is replaced with the contents of the variable $replace.

Julien
+3  A: 

I may be misunderstanding some of the previous answers, but I think that there's a confusion about the second example. It will not replace only the captured item (i.e., one or more non-colons up until a colon) with $replace. It will replace all of ([^:]+): with $replace, the colon as well. (The substitution operates on the match, not just the capture.)

This means if you don't include a colon in $replace (and you want one), you will get bitten:

my $line = 'http://www.example.com/';
my $replace = 'ftp';
$line =~ s/([^:]+):/$replace/;
print "Here's \$line now: $line\n";

Output:

Here's $line now: ftp//www.example.com/ # Damn, no colon!

I'm not sure if you are just looking at example code, but unless you plan to use the capture, I'm not sure you really want it in these examples.
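Two ways to avoid losing the colon, sketched on the same URL: drop the capture and put the colon back in the replacement, or use a lookahead (?=:) so the colon is required but not part of the match. Both are standard Perl regex features; which one reads better is a matter of taste.

```perl
use strict;
use warnings;

my $replace = 'ftp';

# Option 1: no capture group; the colon is re-added in the replacement.
(my $with_colon = 'http://www.example.com/') =~ s/[^:]+:/$replace:/;

# Option 2: the lookahead checks for a colon without consuming it.
(my $with_lookahead = 'http://www.example.com/') =~ s/[^:]+(?=:)/$replace/;

print "$with_colon\n";       # ftp://www.example.com/
print "$with_lookahead\n";   # ftp://www.example.com/
```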

If you are very unfamiliar with regular expressions (or Perl), you should look at perldoc perlrequick before trying perldoc perlre or perldoc perlretut.

Telemachus
+3  A: 
$line =~ /([^:]+):/;

The =~ operator is called the binding operator; it runs a regex or substitution against a scalar value (in this case $line). As for the regex itself, () specify a capture. Captures place the text that matches them in special global variables. These variables are numbered starting from one and correspond to the order the parentheses show up in, so given

"abc" =~ /(.)(.)(.)/;

the $1 variable will contain "a", the $2 variable will contain "b", and the $3 variable will contain "c" (if you haven't guessed yet, . matches one character*). [] specifies a character class. A character class matches exactly one character from the set listed in it, so /[abc]/ will match one character if it is "a", "b", or "c". Character classes can be negated by starting them with ^. A negated character class matches one character that is not listed in it, so [^abc] will match one character that is not "a", "b", or "c" (for instance, "d" will match). The + is called a quantifier. Quantifiers tell you how many times the preceding pattern must match. + requires the pattern to match one or more times (the * quantifier allows it to match zero or more times). The : has no special meaning to the regex engine, so it just means a literal :.
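Each of the pieces described above can be checked on its own. This is a small sketch on made-up inputs covering numbered captures, a character class and its negation, and the + versus * quantifiers.

```perl
use strict;
use warnings;

# Each (.) captures one character; captures are numbered left to right.
my @letters = "abc" =~ /(.)(.)(.)/;          # ("a", "b", "c") = ($1, $2, $3)

# [abc] matches one of a, b, c; [^abc] matches one character that is none of them.
my $in_class     = "d" =~ /[abc]/  ? 1 : 0;  # 0
my $not_in_class = "d" =~ /[^abc]/ ? 1 : 0;  # 1

# + needs at least one match of x; * is satisfied by zero.
my $plus = "" =~ /^x+$/ ? 1 : 0;             # 0
my $star = "" =~ /^x*$/ ? 1 : 0;             # 1

print "@letters $in_class $not_in_class $plus $star\n";   # a b c 0 1 0 1
```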

So, putting that information together we can see that the regex will match one or more non-colon characters (saving this part to $1) followed by a colon.

$line =~ s/([^:]+):/$replace/;

This is a substitution. Substitutions have two parts: the regex and the replacement string. The regex part follows all of the same rules as normal regexes. The replacement part is treated like a double-quoted string. The substitution replaces the first match of the regex with the replacement, so given the following code

my $line    = "key: value";
my $replace = "option";

$line =~ s/([^:]+):/$replace/;

The $line variable will hold the string "option value".
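Because the replacement is treated like a double-quoted string, it can interpolate the capture from the left-hand side. The sketch below also shows that s/// changes only the first match unless the standard /g flag is added. The sample strings are made up.

```perl
use strict;
use warnings;

my $line = "key: value";

# $1 interpolates in the replacement: keep the key, turn ":" into " =".
(my $swapped = $line) =~ s/([^:]+):/$1 =/;     # "key = value"

# Without /g only the first match is replaced; with /g every match is.
(my $once = "a:b:c") =~ s/([^:]+):/[$1]/;      # "[a]b:c"
(my $all  = "a:b:c") =~ s/([^:]+):/[$1]/g;     # "[a][b]c"

print "$swapped\n$once\n$all\n";
```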

You may find it useful to read perldoc perlretut.

* except newline, unless the /s option is used, in which case it matches any character
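A quick sketch of the footnote on a made-up two-line string. It is the /s modifier, not /m, that lets . match a newline. A negated class such as [^:] already matches newlines without any modifier.

```perl
use strict;
use warnings;

my $text = "a\nb";

my $without_s = $text =~ /^a.b$/  ? 1 : 0;   # 0: . does not match "\n"
my $with_s    = $text =~ /^a.b$/s ? 1 : 0;   # 1: /s lets . match "\n"

# [^:] matches any character except ':', newline included.
my ($run) = "a\nb:c" =~ /([^:]+):/;          # "a\nb"

print "$without_s $with_s\n";                # 0 1
```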

Chas. Owens
+1  A: 

perl -MYAPE::Regex::Explain -e "print YAPE::Regex::Explain->new('([^:]+):')->explain"

The regular expression:

(?-imsx:([^:]+):)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [^:]+                    any character except: ':' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  :                        ':'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
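As the explanation notes, the outer (?-imsx: ... ) wrapper only groups without capturing, so the inner parentheses are still \1 ($1). A small check on a made-up string:

```perl
use strict;
use warnings;

my $line = 'abc:def';   # hypothetical sample

# Plain pattern versus the same pattern inside a non-capturing (?-imsx:...) group.
my ($plain)   = $line =~ /([^:]+):/;
my ($wrapped) = $line =~ /(?-imsx:([^:]+):)/;

print "$plain $wrapped\n";   # abc abc
```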
Philip Durbin