In Perl, how can I replace a pattern from the current position (the position of the last replacement) until the end of a line?

I have made all of these replacements on a single line:

...
s/\[//;
s/(\/\w\w\w\/)/ getMonth $1 /e;
s/:/ /;
s/\s\+\d\d\d\d\]//;
#NOW: replace all blanks with a plus sign from this position until the end of this line.
A: 

Since Perl 5.6, the offsets of the ends of the last successful match and of its capture groups are stored in the @+ array. The offset of the end of the entire match is $+[0].
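A small illustration of @- and @+, using a made-up string: element 0 describes the whole match, and element N describes capture group N. The arrays are copied inside the if block, where the match is known to have succeeded.

```perl
use strict;
use warnings;

my $s = "2003-Mar-01";
my ( @starts, @ends );

if ( $s =~ /(\d+)-(\w+)/ ) {
    @starts = @-;    # start offsets: whole match, group 1, group 2
    @ends   = @+;    # end offsets, in the same order
}

print "whole match: $starts[0]..$ends[0]\n";            # 0..8
print "group 2:     $starts[2]..$ends[2]\n";            # 5..8
print "after match: ", substr( $s, $ends[0] ), "\n";    # -01
```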

You can use this to split the string into two parts and do a replacement on only the latter part:

my $base = " pears apples bananas coconuts ";
$base =~ s/apples/oranges/;
my $firstpart = substr($base, 0, $+[0]);    # up to the end of the match
my $secondpart = substr($base, $+[0]);      # from the end of the match on
$secondpart =~ s/ /+/g;                     # blanks -> plus signs
print '"' . $firstpart . $secondpart . "\"\n";

Which will print:

" pears oranges+bananas+coconuts+"

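One problem with this approach is that $+[0] holds the end offset of the match in the string as it was before the replacement. Because "oranges" is one character longer than "apples", the split above actually falls inside "oranges"; the output is only right because there is no blank at that spot. So perhaps there is a better way :)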
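A sketch of one way around this, assuming the replacement text is available in a variable (the hypothetical $replacement below): the text before the match is not changed by s///, so the new text ends at $-[0] plus the length of the replacement. The sketch uses "figs", which is shorter than "apples", because then splitting at $+[0] would start past the blank after "figs" and that blank would not become a plus sign.

```perl
use strict;
use warnings;

my $base        = " pears apples bananas coconuts ";
my $replacement = "figs";    # deliberately shorter than "apples"

if ( $base =~ s/apples/$replacement/ ) {
    # $-[0] is where the match started; everything before it is unchanged,
    # so the replacement text ends at $-[0] + length($replacement).
    my $end = $-[0] + length($replacement);

    # substr() returns an lvalue, so tr/// edits $base in place.
    substr( $base, $end ) =~ tr/ /+/;
}

print qq{"$base"\n};    # " pears figs+bananas+coconuts+"
```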

Andomar
It would be far better to replace the OP's sequence of `s///` operations (which seem to say "replace month+year in square brackets with something that `getMonth` returns") with a single, simpler operation that is more concise and allows the rest of the requirements to be satisfied. But that requires cooperation from the OP.
Sinan Ünür
**`$+`** is **not** an array. **`@+`** is. I corrected your mistake and linked to the correct location in the documentation. Rolling back those factual corrections (which you can easily verify) is not right. http://perldoc.perl.org/perlvar.html#%40%2b
Sinan Ünür
@Sinan Ünür: If you add a comment, I can edit my answer if I agree (done here)
Andomar
@Andomar: *If you are not comfortable with the idea of your questions and answers being edited by other trusted users, this may not be the site for you.* See http://stackoverflow.com/faq Trying to impose a requirement that others get approval from you before fixing your errors is contrary to SO's spirit.
Sinan Ünür
Thanks Andomar, you really understood what I was asking.
Lucia
That's a lot of work to avoid substr() as an lvalue.
brian d foy
@Andomar: do you seriously prefer *perl* to *Perl* when describing the language, or has this just become a pissing contest for you?
Telemachus
I agree with brian's comment above - `substr($base, $+[0]) =~ s/\s/\+/g;` can be used instead of lines 3, 4 and 5. Besides, `substr()` as an lvalue is utterly cool.
Leonardo Herrera
+8  A: 

I see you have accepted an answer. However, for the task at hand, it would have been more appropriate to use Apache::ParseLog or maybe Apache::LogRegex:

Apache::LogRegex - Parse a line from an Apache logfile into a hash

It looks to me like you are trying to write a log file analyzer from scratch and this is your way of grouping log file entries by month. If that is the case, please stop re-inventing square wheels.

Even if you do not want to use a dedicated log-parsing module, you can simplify the task by dividing and conquering with split:

#!/usr/bin/perl

use strict; use warnings;
use Carp;
use Regex::PreSuf;    # CPAN module: presuf() builds one regex from a word list

my @months = qw(jan feb mar apr may jun jul aug sep oct nov dec);
my %months = map { $months[$_] => sprintf '%02d', $_ + 1 } 0 .. 11;
my $months_re = presuf( @months );

# wrapped for formatting, does not make any difference
my $str = q{62.174.188.166 - - [01/Mar/2003:00:00:00 +0100] "GET
/puntos/img/ganar.gif HTTP/1.1" 200 1551
"http://www.universia.com/puntos/index.jsp";
"Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt; Hotbar 2.0)"};

chomp($str);

my @parts = split qr{\s\[|\]\s}, $str;

if ( $parts[1] =~ m! / ($months_re) / !ix ) {
    $parts[1] = $1;
}

$parts[2] =~ s/\s/+/g;

print join(' ', @parts), "\n";

Output:

62.174.188.166 - - Mar "GET+/puntos/img/ganar.gif+HTTP/1.1"+200+1551+"http://www.universia.com/puntos/index.jsp";+"Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+98;+DigExt;+Hotbar+2.0)"
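The code above still loads Regex::PreSuf, which comes from CPAN rather than core Perl. If the goal is to avoid non-core modules entirely, a plain join '|' alternation is enough for twelve month names. A minimal sketch on a shortened, single-line version of the sample entry; the limit of 3 passed to split is an addition here, so that a later "] " in the line cannot produce extra fields.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Plain alternation instead of Regex::PreSuf's presuf().
my @months    = qw(jan feb mar apr may jun jul aug sep oct nov dec);
my $months_re = join '|', @months;

# Shortened sample line (no wrapping).
my $str = q{62.174.188.166 - - [01/Mar/2003:00:00:00 +0100] "GET /puntos/img/ganar.gif HTTP/1.1" 200 1551};

# At most three fields: prefix, bracketed timestamp, everything after it.
my @parts = split qr{\s\[|\]\s}, $str, 3;

# Keep only the month name from the timestamp.
if ( $parts[1] =~ m{ / ($months_re) / }ix ) {
    $parts[1] = $1;
}

# Blanks become plus signs, but only after the timestamp.
$parts[2] =~ s/\s/+/g;

my $out = join ' ', @parts;
print "$out\n";
```

This prints 62.174.188.166 - - Mar "GET+/puntos/img/ganar.gif+HTTP/1.1"+200+1551.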

Sinan Ünür
+2  A: 

From your wording, you seem to imagine that your sequence of substitutions works forward through the string, each substitution picking up where the last one left off. In fact, each substitution applies to the entire string, scanning again from the beginning.
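A tiny demonstration with a made-up string: the second s/// changes text that lies before the spot where the first one made its change, because each substitution starts scanning at the beginning of the string.

```perl
use strict;
use warnings;

my $s = "a b a b";
$s =~ s/b/X/;    # changes the first "b", at offset 2
$s =~ s/a/Y/;    # scans from offset 0 again and changes the first "a"
print "$s\n";    # Y X a b
```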

When you say "the position of the last replacement", what should happen if the previous substitution found nothing?

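In a script, you can just do:

if ( s/\s\+\d\d\d\d\]// ) { substr($_, length $`) =~ s/ /+/g }

($' is read-only, so it cannot be the target of s/// itself. Since the replacement here is empty, the text that followed the match now starts at offset length($`).)

but use of $` (like $& and $') should be avoided in reusable code, since it can impact performance of other regular expressions. There, you'd need to do

if ( s/\s\+\d\d\d\d\]// ) { substr($_, $-[0]) =~ s/ /+/g }

(@- and @+ describe the string as it was before the substitution, so after a deletion the remaining text begins at $-[0]; $+[0] would skip ahead by the length of the deleted text.)

but in either case, you need to make sure that the match or substitution you expect to have set $` or @- actually succeeded.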
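To see why that check matters: a failed match does not reset @- and @+ (or $1 and friends), so they keep describing the last successful match, possibly against a different string. A minimal sketch with made-up strings, using the timestamp pattern from the question:

```perl
use strict;
use warnings;

my $ok = "a +0100] b";
$ok =~ /\s\+\d\d\d\d\]/ or die "expected a match";
my $first_end = $+[0];    # 8: end of " +0100]" in $ok

my $bad     = "no timestamp here";
my $matched = $bad =~ /\s\+\d\d\d\d\]/;    # fails, returns false
my $stale   = $+[0];                       # still 8, left over from $ok

print "first: $first_end, after failed match: $stale\n";
```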

ysth
"this" is the position of the last replacement, where `\s\+\d\d\d\d` matches. If you know a better way than `$+[0]` please post :)
Andomar
@Andomar: sorry, didn't read the question well enough; completely replaced my answer
ysth
+1 The `s/ .*/+/` should probably be `s/ /+/g`, but nice to see that a replace on a substr changes the original string
Andomar
ah, misread you again. fixed. y/ /+/ would also work, then.
ysth