I'm doing a simple search-and-replace in Perl, but I need some help. These are lines in a file:

1001(separator could be "anything")john-1001(separator could be "anything")mark
1001(separator could be "anything")mark-1001(separator could be "anything")john

I wanna assign a new userID for john, like 2001. So this is the result I want:

2001($1)john-1001-mark
1001-mark-2001($1)john

My regex works fine when john is first, but when mark is first, it gets messed up.

Thanks! =)

+3  A: 

It's almost impossible to answer this without having some idea of what the separator can be -- which characters, how many characters, etc. A non-greedy arbitrary separator would look like this:

s/\b1001\b(?=.*?\bjohn\b)/2001/

This replaces "1001" when followed by "john" while matching the minimum number of intermediate characters. .*? is the non-greedy version of .*. However, a regex always matches if it possibly can, so this would still replace the first "1001" (mark's) in

1001-mark-1001-john

In other words, it's not just a greediness problem. We need to define at least one of three things:

  • The characters the separator can contain.
  • The characters the separator cannot contain.
  • The number of characters in the separator.

If we assume that the separator cannot contain "word" characters (letters, digits, and underscore) we can get something workable:

s/\b1001\b(?=\W+?\bjohn\b)/2001/

The known parts ("1001" and "john") are bounded to prevent them from matching other strings with these substrings. (Thanks to Chas for noticing that edge case.)
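
To make the difference concrete, here is a minimal sketch that runs both substitutions from this answer on a few sample lines. The helper names any_separator and nonword_separator and the " :: " separator are made up for illustration; the question never says what the real separators look like.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the first pattern, where the separator can be anything.
sub any_separator {
    my ($s) = @_;
    $s =~ s/\b1001\b(?=.*?\bjohn\b)/2001/;
    return $s;
}

# Hypothetical helper: the second pattern, where the separator
# is assumed to contain only non-word characters.
sub nonword_separator {
    my ($s) = @_;
    $s =~ s/\b1001\b(?=\W+?\bjohn\b)/2001/;
    return $s;
}

print any_separator('1001-mark-1001-john'), "\n";      # 2001-mark-1001-john (mark's ID changed)
print nonword_separator('1001-mark-1001-john'), "\n";  # 1001-mark-2001-john
print nonword_separator('1001-john-1001-mark'), "\n";  # 2001-john-1001-mark
print nonword_separator('1001 :: john'), "\n";         # 2001 :: john
print nonword_separator('11001-john'), "\n";           # 11001-john (unchanged)
print nonword_separator('1001-johnny'), "\n";          # 1001-johnny (unchanged)
```

The last two lines show what the \b anchors are for: neither a longer ID nor a longer name is touched.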

Michael Carman
My problem is that I used (.*) to fetch anything between the userID and "john", because it varies a lot. But then when "mark" was first, it naturally got messed up. So how can I get around this?
+3  A: 

Try this:

#!/usr/bin/perl

use strict;
use warnings;

while (<DATA>) {
    s/\b1001-john\b/2001-john/;
    print;
}

__DATA__
1001-john-1001-mark
1001-mark-1001-john
11001-john
1001-johnny

The \b assertions prevent it from matching inside longer strings such as "11001-john" or "1001-johnny"; only "1001-john" itself matches. See the "Assertions" section of perldoc perlre for more information.


Hmmm, it sounds like you need a sexeger:

#!/usr/bin/perl

use strict;
use warnings;

while (<DATA>) {
    my $s = reverse;
    # Everything is written backwards here: "nhoj" is "john" and "1002" is "2001".
    $s =~ s/\bnhoj(.*?)1001\b/nhoj${1}1002/;
    $s = reverse $s;
    print $s;
}

__DATA__
1001-john-1001-mark
1001-mark-1001-john
11001-john
1001-johnny

The basic idea of a sexeger is to reverse the string, use a reversed regex, and then reverse the result. The problem it works around is that .*? gives you the shortest string from the first position where a match can start, not the shortest possible string. Of course, this will still have a problem with "1001-mark-2001-john", as the .*? will match "-mark-2001-" (seen in reverse). It is probably better to determine what the file format is and parse it rather than try to use a regex.
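
As a rough illustration of that caveat, the sketch below wraps the same substitution in a sub (the name sexeger_sub is made up here) and runs it on the normal case and on the problem case where john already has a different ID.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around the reversed-regex substitution shown above.
sub sexeger_sub {
    my ($line) = @_;
    my $s = reverse $line;                    # scalar context: reverses the string
    $s =~ s/\bnhoj(.*?)1001\b/nhoj${1}1002/;  # "1002" is "2001" reversed
    return scalar reverse $s;                 # flip it back
}

print sexeger_sub('1001-mark-1001-john'), "\n";  # 1001-mark-2001-john
print sexeger_sub('1001-mark-2001-john'), "\n";  # 2001-mark-2001-john (mark's ID changed)
```

In the second call there is no "1001" next to john, so the .*? keeps growing until it reaches mark's ID and rewrites that one instead.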

Chas. Owens
A: 

It can be something like this:

$s = '1001-mark-1001-john';
$s =~ s/(\d+)(-john)/2001$2/i;
print $s;
動靜能量
He edited the question, so my solution, which matched his original problem, cannot keep up with his future edits.
動靜能量
A: 

I'm guessing from your comments that the separator is not always a hyphen, and can in fact be more than one character.

For this case, try:

s/\d+([^\d]*)john/2001$1john/

This will keep the separator between "1001" and "john" intact during the replacement. Note that no digits are permitted in the separator, so this will work even when "john" appears after "mark" (because "-mark-1001-" is not a valid separator).
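
Here is a small sketch of how that behaves, assuming separators like "-" and " => " (both invented for the example, as is the helper name digit_free_separator). The replacement is written with ${1} instead of $1, which should be equivalent but makes the boundary with "john" explicit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution above.
sub digit_free_separator {
    my ($s) = @_;
    $s =~ s/\d+([^\d]*)john/2001${1}john/;
    return $s;
}

print digit_free_separator('1001-mark-1001-john'), "\n";        # 1001-mark-2001-john
print digit_free_separator('1001 => john-1001 => mark'), "\n";  # 2001 => john-1001 => mark
print digit_free_separator('1001-johnny'), "\n";                # 2001-johnny
```

The last line shows one thing to watch for: the pattern has no \b anchors and \d+ accepts any run of digits, so it also rewrites IDs in front of names that merely start with "john".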

Michael Myers
Is this answer wrong? If you're going to downvote it, please tell me what I could do better; I'm not a regex guru (as you may have guessed).
Michael Myers