ansaurus

Question

Answer 1

+1 A:

Looks like you always want the very last element in the result of split(). Or you can go with m/(\S+)$/.
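
A minimal sketch of both options, for comparison. The helper names `last_by_split` and `last_by_regex` are made up for illustration, and the regex is relaxed slightly to tolerate trailing whitespace. Both assume the city is a single word, so a multi-word city only yields its last word.

```perl
use strict;
use warnings;

# Hypothetical helper: last whitespace-separated token via split.
sub last_by_split {
    my ($line) = @_;
    my @fields = split ' ', $line;    # ' ' also skips leading whitespace
    return $fields[-1];               # undef when the line is empty
}

# Hypothetical helper: last token via a regex anchored at the end of the line.
sub last_by_regex {
    my ($line) = @_;
    return $line =~ /(\S+)\s*$/ ? $1 : undef;
}

for my $line ('10 NE HARRISBURG', 'PROVO', '2 SE SAN FRANCISCO') {
    printf "%-20s split=%s regex=%s\n",
        $line, last_by_split($line), last_by_regex($line);
}
```

The last test line shows the limitation: both helpers return only `FRANCISCO`.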

Arkadiy 2010-07-23 14:54:31

I think it is always the last element, but you never know when someone else is inputting the data.

shinjuo 2010-07-23 14:59:09

@shinjuo: If you cannot define your input, how do you expect to define your output? You need at least *some* specification. Even if it's broad, you still need to say that you will reject data that doesn't conform.

Daenyth 2010-07-23 15:46:40

That is what I will have to do. I wasn't saying I wanted to keep screwed-up data; I am just saying there is a chance something could be tacked on the end, and that may change how it is written.

shinjuo 2010-07-23 16:40:16

Answer 2

+3 A:

Try this:

for my $s (@strings) {
    my @fields = split /\s+/, $s, 3;
    my $city = $fields[-1];
}

You can test the array size to determine the number of fields:

my $n = @fields;
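
One way the field count could drive the parsing, as a rough sketch. The helper name `parse_fields` and the direction check are assumptions, not part of the answer above, and the sketch assumes that any line with more than one field starts with a distance.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (distance, direction, city).
# Assumes any line with more than one field starts with a distance.
sub parse_fields {
    my ($line) = @_;
    my @fields = split /\s+/, $line, 3;
    my $n = @fields;    # number of fields actually found

    # One field: the line holds only a city.
    return (undef, undef, $fields[0]) if $n == 1;

    # Three fields whose middle one is a compass direction.
    return @fields if $n == 3 && $fields[1] =~ /^(?:[NS]|[NS]?[EW])$/;

    # Otherwise treat everything after the first field as the city.
    my ($dist, $city) = split /\s+/, $line, 2;
    return ($dist, undef, $city);
}

for my $line ('10 NE HARRISBURG', 'PROVO', '32 SAN FRANCISCO') {
    my ($dist, $dir, $city) = parse_fields($line);
    print join('|', map { defined $_ ? $_ : '' } $dist, $dir, $city), "\n";
}
```

Missing parts come back as undef, so the caller has to check with `defined` before printing them.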

eugene y 2010-07-23 14:55:23

I'd add a limit to the number of fields that can be split, otherwise you'll get a surprise when you try to parse `2 SE SAN FRANCISCO`.

Ether 2010-07-23 15:00:40

@Ether: Thanks, corrected.

eugene y 2010-07-23 15:03:07

Answer 3

+1 A:

Can't we assume there is always a city name and that it appears last on a line? If that's the case, split the line and keep the last portion of it. Here's a one-liner for the command line:

perl -lne '@_ = split; print $_[-1]' input.txt

Output:

HARRISBURG
HASWELL
OAKLEY
REDBIRD
PROVO
EADS
HARRISON

Update 1

This solution won't work if you have multi-word city names like SAN FRANCISCO (a case spotted in a comment on another answer).

Where is your input data coming from? If you have generated it yourself, you should add delimiters. If someone generated it for you, ask them to regenerate it with delimiters. Parsing it will then become child's play.

# replace ";" with your delimiter
perl -lne '@_ = split ";"; print $_[-1]' input.txt

Philippe A. 2010-07-23 14:58:52

I want to keep the front portion also.

shinjuo 2010-07-23 14:59:55

@Philippe: You can probably reduce that to `perl -anE 'say $F[-1]' input.txt` if you're using whitespace as the delimiter.

Daenyth 2010-07-23 15:48:32

I am not making it nor can I ask them to adjust it.

shinjuo 2010-07-23 18:27:24

@Daenyth: good to know. Thanks!

Philippe A. 2010-07-26 14:10:36

Answer 4

+3 A:

my @l = (
'10 NE HARRISBURG',
'4 E HASWELL',
'2 SE OAKLEY',
'6 SE REDBIRD',
'PROVO',
'6 W EADS',
'21 N HARRISON',
);

foreach (@l) {
    # according to hoobs i changed the regex
    my($beg, $rest) = ($_ =~ /^(\d*\s(?:[NS]|[NS]?[EW])*)?(.*)$/);
    print "beg=$beg \trest=$rest\n";
}

output:

beg=10 NE   rest=HARRISBURG
beg=4 E     rest=HASWELL
beg=2 SE    rest=OAKLEY
beg=6 SE    rest=REDBIRD
beg=    rest=PROVO
beg=6 W     rest=EADS
beg=21 N    rest=HARRISON

For shinjuo: if you want to process only one string, you can do:

  my($beg, $rest) = ($l[3] =~ /^(\d*\s(?:[NS]|[NS]?[EW])*)?(.*)$/);
  print "beg=$beg \trest=$rest\n";

To avoid a warning about an uninitialized value, test whether $beg is defined:

print defined $beg ? "beg=$beg\t" : "", "rest=$rest\n";
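
For handling one string at a time, the match could be wrapped in a sub. This is only a sketch: `parse_line` is a hypothetical name, the layout "[distance [direction]] city" is assumed, and the direction part is stricter than the regex above, requiring whitespace after the direction so that a city like NEW ABILENE stays whole.

```perl
use strict;
use warnings;

# Hypothetical helper. Assumes the layout "[distance [direction]] city".
# Returns (distance, direction, city), or an empty list if nothing matches.
sub parse_line {
    my ($line) = @_;
    my ($dist, $dir, $city) = $line =~ /
        ^ \s*
        (?: (\d+) \s+                       # optional distance
            (?: ([NS]?[EW]|[NS]) \s+ )?     # optional direction, then whitespace
        )?
        (.+?) \s* \z                        # city: everything that remains
    /x or return;
    return ($dist, $dir, $city);
}

for my $line ('10 NE HARRISBURG', 'PROVO', '5 NEW ABILENE') {
    my ($dist, $dir, $city) = parse_line($line) or next;
    # Missing parts are undef, so substitute a placeholder before printing.
    print join("\t", map { defined $_ ? $_ : '-' } $dist, $dir, $city), "\n";
}
```

Checking `defined` on each returned value, as in the loop, avoids the uninitialized-value warning when the distance or direction is absent.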

M42 2010-07-23 14:59:07

Awesome this looks like it will work well

shinjuo 2010-07-23 15:01:00

@M42, @shinjuo: I think the regular expression fails on the second-to-last record. It should be: beg= 6 W rest= EADS.

Nikhil Jain 2010-07-23 16:04:12

I did not notice that, but you are correct, thanks.

shinjuo 2010-07-23 16:39:12

You're right. I've corrected the regex.

M42 2010-07-23 17:26:54

I like this one because it's using a feature of the data that the others don't. You could probably extend it even a little more, in that a direction isn't just `/[NSEW]*/`; it's `/[NS]|[NS]?[EW]/` (that is, it's either N, S, E, or W alone, or it's one of N/S followed by one of E/W). The number and the order aren't arbitrary. That might save you some day if the city happens to be `NEW ABILENE` :)

hobbs 2010-07-24 00:45:12

How can I make it so that it runs them one at a time instead of over an array of them? Instead of foreach I tried using this: `for($fields[3]) { ($beg, $rest) = ($_ =~ /^(\d*\s[NSEW]*)?(.*)$/); print $beg; }`

shinjuo 2010-07-24 04:06:17

But it gives me an uninitialized variable error on $beg.

shinjuo 2010-07-24 04:06:50

@hoobs thanks, updated regex

M42 2010-07-24 08:38:39

Is hoobs, hobbs?

Armando 2010-07-24 23:33:10

@Armando they're similar, but not the same ;)

hobbs 2010-07-25 07:29:15

@hobbs: sorry for the misspelling.

M42 2010-07-25 08:02:39

@hobbs/hoobs: =]

Armando 2010-07-30 03:01:49

Answer 5

+1 A:

Regex Solution

Solution 1: Keep everything (vol7ron's emailed solution)

#!/usr/bin/perl -w    

use strict; 
use Data::Dumper;   

   sub main{    
      my @strings = (    
                      '10 NE HARRISBURG'    
                    , '4 E HASWELL'    
                    , '2 SE OAKLEY'    
                    , '6 SE REDBIRD'    
                    , 'PROVO'    
                    , '6 W EADS'    
                    , '21 N HARRISON'    
                    , '32 SAN FRANCISCO' 
                    , ''   
                    , '15 NEW YORK'    
                    , '15 NNW NEW YORK'    
                    , '15 NW NEW YORK'     
                    , 'NW NEW YORK'    
                    );       

      my %hash;
      my $count=0;
      for (@strings){    
         if (/^\d*\s*[NS]{0,2}[EW]{0,1}\s+/){
            # if there was a speed / direction
            $hash{$count}{wind} = $&;
            $hash{$count}{city} = $';
         } else {
            # if there was only a city
            $hash{$count}{city} = $_;
         }
         $count++;
      }    

      print Dumper(\%hash);  
   }    

   main();

Solution 2: Strip off what you don't need

#!/usr/bin/perl -w    

use strict;    

   sub main{    
      my @strings = (    
                      '10 NE HARRISBURG'    
                    , '4 E HASWELL'    
                    , '2 SE OAKLEY'    
                    , '6 SE REDBIRD'    
                    , 'PROVO'    
                    , '6 W EADS'    
                    , '21 N HARRISON'    
                    , '32 SAN FRANCISCO'    
                    , '15 NEW YORK'    
                    , '15 NNW NEW YORK'    
                    , '15 NW NEW YORK'     
                    , 'NW NEW YORK'     
                    );    

      for my $elem (@strings){    
         $elem =~ s/^\d*\s*[NS]{0,2}[EW]{0,1}\s+(\w*)/$1/;
      }    

      $"="\n";    
      print "@strings\n";        
   }    

   main();

Update:

Making the changes with vol7ron's suggestion and example, using the repetition operator worked. This will strip off leading digits and the direction and won't break if the digits or direction (or both) are missing.
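
As a variation on Solution 1, the same quantifier pattern could be written with named captures (Perl 5.10 or later) instead of `$&` and `$'`. This is a sketch only: `split_wind_city` is a hypothetical name, and the pattern is anchored at the start of the line, which the code above does not do.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref with a 'city' key and,
# when a distance or direction prefix was found, a 'wind' key.
sub split_wind_city {
    my ($line) = @_;
    if ($line =~ /^(?<wind>\d*\s*[NS]{0,2}[EW]{0,1})\s+(?<city>.*)$/) {
        return { wind => $+{wind}, city => $+{city} };
    }
    # No whitespace-separated prefix: the whole line is the city.
    return { city => $line };
}

for my $line ('10 NE HARRISBURG', 'PROVO', '15 NEW YORK', 'NW NEW YORK') {
    my $r = split_wind_city($line);
    printf "wind=%-6s city=%s\n",
        exists $r->{wind} ? $r->{wind} : '', $r->{city};
}
```

Because the prefix must be followed by whitespace, `NEW` in `15 NEW YORK` is left in the city rather than read as a direction.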

Armando 2010-07-23 23:59:24

Looks good, but instead of `\w+` you might want to use `\w{1,2}`, since the direction seems to be at most 2 chars. If the OP uses 3-char directions (e.g. `NNE`, `NSW`), then you'd change the 2 to a 3.

vol7ron 2010-07-24 23:00:31

Instead of `\w` you might also want to use char selection (`[NSEW]{0,3}`). That way if something like `2 SAN FRANCISCO` comes along it won't chop off the `SAN`.

vol7ron 2010-07-24 23:03:17

I haven't tried any of these suggestions out, but perhaps `[NS]{0,2}[EW]{0,1}` would be what you want, since it would take care of `N,S,NE,SE,NW,SW,NNE,NNW,NSE,NSW,SSE,SSW,SNE,SNW`, which wouldn't fail on `NEW` as Hobbs pointed out might happen.

vol7ron 2010-07-24 23:15:06

Splitting a changing string with perl
