Using Ruby (I'm a newbie) and regex, I'm trying to strip the street number from a street address so that only the street name is left. The easy cases work, but I need some help with ones like this:

'6223 1/2 S FIGUEROA ST' ==> 'S FIGUEROA ST'

Thanks for the help!!

UPDATE(s):

'6223 1/2 2ND ST' ==> '2ND ST'

and from @pesto '221B Baker Street' ==> 'Baker Street'

+2  A: 

Group matching:

.*\d\s(.*)

If you need to also take into account apartment numbers:

.*\d.*?\s(.*)

This takes care of addresses like 123A Street Name.

Either pattern should strip the numbers at the front (and the space), so long as there are no other numbers in the string. Just capture the first group, (.*).
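
In Ruby, one way to read that first group is String#[] with a regex and a group index. A minimal sketch, assuming two made-up helper names (strip_number and strip_number_apt) that wrap the two patterns above:

```ruby
# Hypothetical helpers, named here only for illustration.
# String#[] with a regex and a group index returns that capture,
# or nil when the pattern does not match.
def strip_number(address)
  address[/.*\d\s(.*)/, 1]
end

def strip_number_apt(address)
  address[/.*\d.*?\s(.*)/, 1]
end

puts strip_number('6223 1/2 S FIGUEROA ST')  # S FIGUEROA ST
puts strip_number('6223 1/2 2ND ST')         # 2ND ST
puts strip_number_apt('221B Baker Street')   # Baker Street
puts strip_number_apt('123 2nd ST')          # ST (the "other numbers" caveat)
```

The first pattern returns nil for '221B Baker Street', because no digit there is followed directly by whitespace.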

Bryan Denny
123 2nd ST is a problem then.
kenny
A: 

/[^\d]+$/ will also match the same thing, except without using a capture group.
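
A quick check in Ruby, as a sketch only (the constant name is an assumption): the match starts right after the last digit in the string, so it keeps the leading space and cuts into names such as 2ND.

```ruby
# Matches the trailing run of non-digit characters.
TRAILING_NON_DIGITS = /[^\d]+$/

p '6223 1/2 S FIGUEROA ST'[TRAILING_NON_DIGITS]        # " S FIGUEROA ST"
p '6223 1/2 S FIGUEROA ST'[TRAILING_NON_DIGITS].strip  # "S FIGUEROA ST"
p '6223 1/2 2ND ST'[TRAILING_NON_DIGITS]               # "ND ST"
```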

Ben Hughes
A: 

For future reference, a great tool for testing Ruby regexes is http://www.rubular.com/

Andrew Austin
+1  A: 

This will strip anything at the front of the string until it hits a letter:

street_name = address.gsub(/^[^a-zA-Z]*/, '')

If it's possible to have something like "221B Baker Street", then you have to use something more complex. This should work:

street_name = address.gsub(/^((\d[a-zA-Z])|[^a-zA-Z])*/, '')
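
One caveat: the second pattern also consumes a digit plus letter at the start of an ordinal street name, so '6223 1/2 2ND ST' comes out as 'D ST'. A possible token-based variant is sketched below. It assumes the number part consists only of whitespace-separated tokens that are digits with an optional single letter suffix, or a fraction. The names LEADING_NUMBER_TOKENS and strip_leading_tokens are made up for illustration.

```ruby
# Leading tokens to drop: a house number with an optional one-letter suffix
# ("6223", "221B") or a fraction ("1/2"), each followed by whitespace.
LEADING_NUMBER_TOKENS = /\A(?:\d+[a-zA-Z]?\s+|\d+\/\d+\s+)*/

# Hypothetical helper: removes the leading number tokens, keeps the rest.
def strip_leading_tokens(address)
  address.sub(LEADING_NUMBER_TOKENS, '')
end

# The pattern from the answer, for comparison:
p '6223 1/2 2ND ST'.gsub(/^((\d[a-zA-Z])|[^a-zA-Z])*/, '')  # "D ST"

puts strip_leading_tokens('6223 1/2 S FIGUEROA ST')  # S FIGUEROA ST
puts strip_leading_tokens('6223 1/2 2ND ST')         # 2ND ST
puts strip_leading_tokens('221B Baker Street')       # Baker Street
```

A purely numeric street name is still ambiguous: '1234 45 ST' becomes 'ST' with this sketch.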
Pesto
good point, but thanks
kenny
@Kenny: I updated it with a regex that will resolve '221B' type stuff, while still handling '1/2'.
Pesto
2nd try works well.
kenny
+1  A: 

There's another set of answers on Stack Overflow: http://stackoverflow.com/questions/16413/parse-usable-street-address-city-state-zip-from-a-string

I think the google/yahoo decoder approach is best, but it depends on how many addresses you have and how often you need to parse them - otherwise the selected answer there would probably be the best.

meade
+1  A: 

Can street names be numbers as well? E.g.

1234 45TH ST

or even

1234 45 ST

You could deal with the first case above, but the second is difficult.

I would split the address on spaces, skip any leading components that do not contain a letter and then join the remainder. I do not know Ruby, but here is a Perl example which also highlights the problem with my approach:

#!/usr/bin/perl

use strict;
use warnings;

my @addrs = (
    '6223 1/2 S FIGUEROA ST',
    '1234 45TH ST',
    '1234 45 ST',
);

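for my $addr ( @addrs ) {
    # Break the address into space-separated components.
    my @parts = split / /, $addr;

    # Discard leading components until one contains a letter.
    while ( @parts ) {
        my $part = shift @parts;
        if ( $part =~ /[A-Z]/ ) {
            # Print that component and everything after it.
            print join(' ', $part, @parts), "\n";
            last;
        }
    }
}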

C:\Temp> skip
S FIGUEROA ST
45TH ST
ST
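
For the Ruby side of the question, a rough equivalent of the Perl above might look like this. It is a sketch, not a tested port: skip_numeric_parts is a made-up name, it splits on any whitespace rather than single spaces, and it accepts lowercase letters too.

```ruby
# Split on whitespace, skip leading components that contain no letter,
# then join whatever is left.
def skip_numeric_parts(address)
  address.split.drop_while { |part| part !~ /[A-Za-z]/ }.join(' ')
end

['6223 1/2 S FIGUEROA ST', '1234 45TH ST', '1234 45 ST'].each do |addr|
  puts skip_numeric_parts(addr)
end
# Prints:
# S FIGUEROA ST
# 45TH ST
# ST
```

As in the Perl version, '1234 45 ST' loses its street number. A house number with a letter suffix, such as '221B Baker Street', is returned unchanged because its first component already contains a letter.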
Sinan Ünür