ansaurus

Question

Best way to split an address line into two fields

Answer 1

What I did (though I doubt it is the most performant solution) was to reverse the address and take everything from the start up to and including the first run of digits, i.e. the lazy regex ^.*?\d+ on the reversed address, then reverse that part back to get the house number. Because this picks up the last number in the original line, it also handles streets whose names contain a digit.
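A minimal C# sketch of this reverse-and-match approach. The class and method names are hypothetical, and it assumes the house number (with any letter suffix) comes last on the line, as in "2e Oosterparkstraat 12a". A line with no digit at all returns null.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the reverse-then-regex idea.
public static class ReversedAddressSplitter
{
    // Reverses a string character by character (fine for plain Latin-script addresses).
    private static string Reverse(string s)
    {
        char[] chars = s.ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }

    // On the reversed line, the lazy ^.*?\d+ matches everything up to and including
    // the first run of digits, which is the last number in the original line.
    public static (string Street, string Number)? Split(string addressLine)
    {
        string reversed = Reverse(addressLine.Trim());
        Match m = Regex.Match(reversed, @"^.*?\d+");
        if (!m.Success) return null; // no digit anywhere: cannot split

        string number = Reverse(m.Value).Trim();
        string street = Reverse(reversed.Substring(m.Length)).Trim();
        return (street, number);
    }
}
```

For "2e Oosterparkstraat 12a" this yields the street "2e Oosterparkstraat" and the number "12a". The lazy .*? is what keeps the "2" of "2e" in the street name; a greedy .* would run on to the last digit of the reversed string and swallow the whole line.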

Ruben 2009-06-29 17:46:03

Answer 2

Could you split on spaces and then check whether the first character of one of the interior tokens is a digit?

Something like:

 // Split on spaces, dropping empty entries caused by repeated spaces
 string[] split = addressLine.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
 int splitLoc = -1;
 // Start at 1 to skip a leading number in street names such as "2e ..."
 for (int i = 1; i < split.Length; i++){
     // The first later token that starts with a digit is taken as the house number
     if (char.IsDigit(split[i][0])){
         splitLoc = i;
         break;
     }
 }
 if (splitLoc < 0) return; // no house number found

 // field1: everything before the number (the street name)
 // field2: the number and everything after it
 string field1 = string.Join(" ", split, 0, splitLoc);
 string field2 = string.Join(" ", split, splitLoc, split.Length - splitLoc);

It depends on what you mean by 'clean', but this should work as long as every address follows the format you described.

mmr 2009-06-29 17:46:40

Answer 3

For data correctness, the best solution would be to check the existing database against an established address API that can do this split for you. Otherwise you are only making a best guess, and some, if not all, of the data should be reviewed manually.

Greg 2009-06-29 18:04:06

Answer 4

There are too many different ways someone could enter this data. I often write my address as:

123 Foo Street Apt#3

i.e. with the house number and the apartment number on either side of the street name.

If this were my problem, I would write a regex that handles the "easy" cases and flags the complicated ones for human review.
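A minimal C# sketch of that triage, assuming the "easy" lines are a street name followed by a house number and an optional letter suffix (for example the hypothetical "Kerkstraat 12a"). AddressTriage and TrySplit are illustrative names, and the pattern is an assumption you would tune to your own data.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: split the easy cases, flag everything else.
public static class AddressTriage
{
    // Assumed "easy" format: street, whitespace, digits, optional space and letter suffix.
    // The lazy street group lets a leading number such as "2e" stay in the street name.
    private static readonly Regex EasyAddress =
        new Regex(@"^(?<street>.+?)\s+(?<number>\d+\s?[A-Za-z]?)$", RegexOptions.Compiled);

    // Returns true and fills street/number for easy lines;
    // returns false when the line should be queued for human review.
    public static bool TrySplit(string addressLine, out string street, out string number)
    {
        Match m = EasyAddress.Match(addressLine.Trim());
        if (m.Success)
        {
            street = m.Groups["street"].Value;
            number = m.Groups["number"].Value;
            return true;
        }
        street = null;
        number = null;
        return false;
    }
}
```

A line like "123 Foo Street Apt#3" does not end in whitespace plus a number, so TrySplit returns false and it lands in the review pile instead of being split wrongly.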

The US Census Bureau publishes a list of US street names, but it is buried inside a very large data file.

Autodidact 2009-06-29 18:25:54