ansaurus

Question

How do I use Perl to intersperse characters between consecutive matches with a regex substitution?

Answer 1

+3 A:

EDIT: Note that you could open a filehandle to the data string and let readline deal with line endings:

#!/usr/bin/perl

use strict; use warnings;
use autodie;

my $str = <<EO_DATA;
2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,
EO_DATA

open my $str_h, '<', \$str;

while(my $row = <$str_h>) {
    chomp $row;
    print join(',',
        map { length $_ ? $_ : 'N/A'} split /,/, $row, -1
    ), "\n";
}

Output:

E:\Home> t.pl
2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,N/A,Clear
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,N/A,N/A,N/A,N/A

You can also use:

pos $str -= 1 while $str =~ s{,(,|\n)}{,N/A$1}g;

Explanation: When s///g matches ,, and replaces it with ,N/A, the second comma has already been consumed, so the search resumes at the character after it. It will therefore miss some consecutive commas if you only use

$str =~ s{,(,|\n)}{,N/A$1}g;

Therefore, I used a loop to move pos $str back by a character after each successful substitution.

Now, as @ysth shows:

$str =~ s!,(?=[,\n])!,N/A!g;

would make the while unnecessary.
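To see the difference concretely, here is a minimal sketch that runs both single-pass substitutions on a short made-up row. The helper names fill_consuming and fill_lookahead and the sample row are assumptions for illustration only; they are not part of the original data.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; the names are not from the thread.

# Single pass where the pattern consumes the character after the comma.
sub fill_consuming {
    my ($s) = @_;
    $s =~ s{,(,|\n)}{,N/A$1}g;
    return $s;
}

# Single pass where the following character is only inspected by a lookahead.
sub fill_lookahead {
    my ($s) = @_;
    $s =~ s{,(?=[,\n])}{,N/A}g;
    return $s;
}

my $row = "a,,,,\n";           # made-up row with four empty trailing fields
print fill_consuming($row);    # a,N/A,,N/A,
print fill_lookahead($row);    # a,N/A,N/A,N/A,N/A
```

The consuming version leaves every second empty field untouched in one pass, while the lookahead version fills all four.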

Sinan Ünür 2009-10-29 19:54:05

Nice. Good example that while regular expressions are frequently used in Perl, they're not always the best solution.

jamessan 2009-10-29 19:56:41

@Sinan: I'd rather not deal with filehandles. The data is already loaded into a string with `\n`s. Is what I want possible with one regex `s///`?

Zaid 2009-10-29 19:56:53

@Sinan: Evidently I have much to learn about Perl. That's a wonderful one-liner, which does exactly what I need it to do. Absolutely stunning.

Zaid 2009-10-29 20:13:16

decrement works too: `--pos $str`

ysth 2009-10-30 02:11:06

Answer 2

+1 A:

The quick and dirty hack version:

my $rawData = "2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,\n";
while ($rawData =~ s/,,/,N\/A,/g) {};
print $rawData;

Not the fastest code, but the shortest. The substitution should need at most two passes.
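As a rough check of the pass count, the sketch below wraps the same loop in a hypothetical helper, fill_by_looping (a name introduced here, not from the answer), that also counts how many passes actually changed the string. The input is a made-up row.

```perl
use strict;
use warnings;

# Hypothetical helper: repeat the substitution until it no longer matches,
# counting the passes that performed at least one replacement.
sub fill_by_looping {
    my ($s) = @_;
    my $passes = 0;
    $passes++ while $s =~ s/,,/,N\/A,/g;
    return ($s, $passes);
}

my ($filled, $passes) = fill_by_looping("a,,,,,b");
print "$filled ($passes passes)\n";    # a,N/A,N/A,N/A,N/A,b (2 passes)
```

Note that this pattern only looks for two adjacent commas, so an empty field sitting directly before a newline is left as it is.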

Jack M. 2009-10-29 20:10:57

Concise, but like you said, quick and dirty.

Zaid 2009-10-29 20:19:21

Answer 3

+2 A:

I couldn't quite make out what you were trying to do in your lookbehind example, but I suspect you are suffering from a precedence error there, and that everything after the lookbehind should be enclosed in a (?: ... ) so that the second branch of the | alternation doesn't skip the lookbehind.

Starting from scratch, what you are trying to do sounds pretty simple: place N/A after a comma if it is followed by another comma or a newline:

s!,(?=[,\n])!,N/A!g;

Example:

my $rawData = "2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear\n2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,\n";

use Data::Dumper;
$Data::Dumper::Useqq = $Data::Dumper::Terse = 1;
print Dumper($rawData);
$rawData =~ s!,(?=[,\n])!,N/A!g;
print Dumper($rawData);

Output:

"2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear\n2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,\n"
"2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,N/A,Clear\n2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,N/A,N/A,N/A,N/A\n"

ysth 2009-10-29 20:12:40

@ysth: Agreed. This definitely works. Is this because the lookahead assertion is non-capturing?

Zaid 2009-10-29 20:28:51

+1 I don't know why but I avoid assertions. In this case, I could not see the obvious because of my aversion.

Sinan Ünür 2009-10-29 20:30:49

Funny how simple these regex solutions tend to be....

Zaid 2009-10-29 20:50:05

@Zaid: non-capturing isn't good enough (`(?: )` wouldn't work). What matters is how much of the string has matched. The lookahead part is not included in what s/// considers to have matched, so the next iteration of the substitution matching starts looking for a match right after the new N/A.
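One way to look at "how much of the string has matched" is to record the start and end offsets of each match using the built-in @- and @+ arrays. The sketch below is only an illustration; match_spans is a hypothetical helper and the input string is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: return "start-end" offsets for every /g match of $re in $s.
sub match_spans {
    my ($re, $s) = @_;
    my @spans;
    while ($s =~ /$re/g) {
        # $-[0] is the offset where the match starts, $+[0] where it ends.
        push @spans, sprintf "%d-%d", $-[0], $+[0];
    }
    return @spans;
}

my $s = "a,,,b";
print join(' ', match_spans(qr/,(?:,|\n)/,  $s)), "\n";    # 1-3
print join(' ', match_spans(qr/,(?=[,\n])/, $s)), "\n";    # 1-2 2-3
```

The non-capturing group still consumes the second comma, so only one match is found; with the lookahead each match is a single comma and the next search starts at the following comma.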

ysth 2009-10-29 20:52:41

Answer 4

+2 A:

You could search for

(?<=,)(?=,|$)

and replace that with N/A.

This regex matches the (empty) space between two commas or between a comma and end of line.
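For a string that holds several lines, what $ matches depends on the /m modifier: without it, $ only matches at the very end of the string (or just before a final newline). The sketch below compares the two on made-up data; the helper names are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; the names are not from the thread.

# Without /m, $ matches only at the end of the whole string.
sub fill_gaps_plain {
    my ($s) = @_;
    $s =~ s/(?<=,)(?=,|$)/N\/A/g;
    return $s;
}

# With /m, $ also matches just before every embedded newline.
sub fill_gaps_multiline {
    my ($s) = @_;
    $s =~ s/(?<=,)(?=,|$)/N\/A/mg;
    return $s;
}

my $data = "a,,\nb,,\n";              # made-up two-line input
print fill_gaps_plain($data);         # first line keeps its empty last field
print fill_gaps_multiline($data);     # every empty field is filled
```

Matching a literal \n in the lookahead instead of $ is another way to get the per-line behaviour without /m.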

Tim Pietzcker 2009-10-29 20:13:01

+1 but it would have to be `s!(?<=,)(?=,|\n)!N/A!g;` to catch an empty field at the end of a line.

Sinan Ünür 2009-10-29 20:39:12

Yeah, I had just noticed that, too.

Tim Pietzcker 2009-10-29 20:40:22

Answer 5

+1 A:

Not a regex, but not too complicated either:

$string = join ",", map{$_ eq "" ? "N/A" : $_} split (/,/, $string,-1);

The ,-1 is needed at the end to force split to include any empty fields at the end of the string.
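A small sketch of what the -1 limit changes, and of one possible way to apply the same idea to a multi-line string by splitting on newlines first. The helpers fill_line and fill_text are hypothetical names introduced here, and the sample data is made up.

```perl
use strict;
use warnings;

# Without a limit, split drops trailing empty fields; with -1 it keeps them.
my @dropped = split /,/, "a,,b,,";        # 3 fields: "a", "", "b"
my @kept    = split /,/, "a,,b,,", -1;    # 5 fields: "a", "", "b", "", ""
print scalar(@dropped), " vs ", scalar(@kept), "\n";

# Hypothetical helper: fill the empty fields of one line (no trailing newline).
sub fill_line {
    my ($line) = @_;
    return join ',', map { length($_) ? $_ : 'N/A' } split /,/, $line, -1;
}

# Hypothetical helper: apply fill_line to each line of a multi-line string.
# Splitting on "\n" first keeps the newline out of the last field.
sub fill_text {
    my ($text) = @_;
    return join "\n", map { fill_line($_) } split /\n/, $text, -1;
}

print fill_text("a,,b,,\nc,,\n");
```

Because splitting an empty string yields no fields, the empty piece after the final newline stays empty and the trailing newline is preserved.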

mobrule 2009-10-29 20:16:46

This would fail for an empty field at the end of the line, because that field would contain `"\n"`. That is why I `chomp` first in my `split` example.

Sinan Ünür 2009-10-29 20:43:23

@SU - good catch. Best to use this on chomped input.

mobrule 2009-10-29 20:52:30
