I am using Perl to extract sizes from a string. What regex could I use to accomplish this?

Example data:
Sleepwell Mattress (Twin)
Magic Nite (Flip Free design) Mattress (Full XL)

Result:
Twin
Full XL

I know that I need to start at the end of the string and parse out the first set of parentheses, but I am not sure how to do it.

#!/usr/bin/perl

$file = 'input.csv';

open (F, $file) || die ("Could not open $file!");

while ($line = <F>)
{
  ($field1,$field2,$field3,$field4,$field5,$field6,$field7, $field8, $field9) = split ',', $line;
  if ( $field1 =~ /^.*\((.*)\)/ ) {
  print $1;
}


#print "$field1,$field2,$field3,$field4,$field5,$field6,$field7, $field8, $field9, $1\n";
}

close (F);

Not getting any results. Maybe I am not doing this right.

A: 

Assuming your data arrives line by line, and you are only interested in the contents of the last set of parens:

if ( $string =~ /^.*\((.*)\)/ ) {
  print $1;
}
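
To see why the leading .* matters, here is a minimal sketch comparing the pattern above with one that omits it. The sample string comes from the question; the variable names $last and $span are only illustrative.

```perl
use strict;
use warnings;

my $string = 'Magic Nite (Flip Free design) Mattress (Full XL)';

# Greedy .* first consumes the whole string, then backtracks to the
# last "(", so the capture group holds the final parenthesized part.
my ($last) = $string =~ /.*\((.*)\)/;

# Without the leading .* the match starts at the first "(" and the
# greedy capture runs through to the last ")".
my ($span) = $string =~ /\((.*)\)/;

print "$last\n";   # Full XL
print "$span\n";   # Flip Free design) Mattress (Full XL
```
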
Daren Schwenke
Although it doesn't hurt, the ^ is useless if you start your regex with .*
brian d foy
+5  A: 

The answer depends on whether the size information you are looking for always appears within parentheses at the end of the string. If it does, your task is simple:

#!/usr/bin/perl

use strict;
use warnings;

while (<DATA> ) {
    last unless /\S/;
    my ($size) = /\( ( [^)]+ ) \)$/x;
    print "$size\n";
}

__DATA__
Sleepwell Mattress (Twin)
Magic Nite (Flip Free design) Mattress (Full XL)

Output:

C:\Temp> xxl
Twin
Full XL

Note that the code you posted can be better written as:

#!/usr/bin/perl

use strict;
use warnings;

my ($input_file) = @ARGV;

open my $input, '<', $input_file
    or die "Could not open '$input_file': $!";

while (my $line = <$input>) {
    chomp $line;
    my @fields = split /,/, $line;
    if ($fields[0] =~ /\( ( [^)]+ ) \)$/x ) {
        print $1;
    }
    print join('|', @fields), "\n";
}

close $input;

Also, you should consider using Text::xSV or Text::CSV_XS to process CSV files.
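
As a rough sketch of what that could look like with Text::CSV_XS (a CPAN module that has to be installed separately), the example below parses a row whose first field contains a comma, which a plain split on ',' would cut in two. The sample row and the @sizes array are made up for illustration, and the in-memory filehandle stands in for the real input file.

```perl
use strict;
use warnings;
use Text::CSV_XS;   # CPAN module, not part of core Perl

# Hypothetical sample row: a quoted first field that contains a comma.
my $data = qq{"Magic Nite (Flip Free design) Mattress, Deluxe (Full XL)",199.99,in stock\n};

my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });

# Read from an in-memory filehandle so the sketch is self-contained.
open my $fh, '<', \$data
    or die "Could not open in-memory file: $!";

my @sizes;
while (my $row = $csv->getline($fh)) {
    # $row is an array reference holding the parsed fields.
    if ($row->[0] =~ /\( ( [^)]+ ) \)$/x) {
        push @sizes, $1;
    }
}
close $fh;

print "$_\n" for @sizes;
```
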

Sinan Ünür
A: 

This is the answer as expressed in Perl 5:

my $str = "Magic Nite (Flip Free design) Mattress (Full XL)";
$str =~ m/.*\((.*)\)/;
print "$1\r\n";
Jez
Why the `"\r\n"`?
Sinan Ünür
Because I'm in Windows.
Jez
+1  A: 

The following regular expression will match the content at the end of the string:

m/\(([^)]+)\)$/m

The m modifier at the end treats the string as multiple lines: it changes $ to match at the end of each line, not only at the end of the string.

[edited to add the bit about multi-line strings]
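
A small sketch of the difference, assuming both sample lines from the question sit in one string. The /g modifier and the array names are additions for the sake of the demonstration.

```perl
use strict;
use warnings;

# Both sample lines from the question held in a single string.
my $text = "Sleepwell Mattress (Twin)\n"
         . "Magic Nite (Flip Free design) Mattress (Full XL)\n";

# With /m, $ matches before every newline; /g in list context
# returns the capture from each successful match.
my @with_m = $text =~ /\(([^)]+)\)$/mg;

# Without /m, $ matches only at the end of the string (or before a
# final newline), so only the last line can match.
my @without_m = $text =~ /\(([^)]+)\)$/g;

print join(', ', @with_m), "\n";      # Twin, Full XL
print join(', ', @without_m), "\n";   # Full XL
```
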

Stoo
A: 

A fancy regex is not really necessary here; make it easier on yourself. You can split on "[space](" and take the last element. Of course, this only works when the data you want is always at the end and enclosed in parentheses.

while(<>){
    @a = split / \(/, $_;
    print $a[-1]; # get the last element. do your own trimming
}
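
For the trimming step, one possible sketch follows. The @lines array stands in for the real input, and the substitution that strips the closing parenthesis is just one way to do it. A line with no " (" at all comes back whole, so this assumes every line ends with a parenthesized size.

```perl
use strict;
use warnings;

# Sample lines from the question, standing in for the real input.
my @lines = (
    "Sleepwell Mattress (Twin)\n",
    "Magic Nite (Flip Free design) Mattress (Full XL)\n",
);

my @sizes;
for my $line (@lines) {
    chomp(my $copy = $line);            # drop the trailing newline
    my @parts = split / \(/, $copy;     # split on space + "("
    my $size  = $parts[-1];             # last piece, e.g. "Full XL)"
    $size =~ s/\)\s*$//;                # trim the closing parenthesis
    push @sizes, $size;
}

print "$_\n" for @sizes;   # Twin, then Full XL
```
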
ghostdog74