I'm looking for a good tool that can take a full mailing address, formatted for display or use with a mailing label, and convert it into a structured object.

So for instance:

// Start with a formatted address in a single string
string f = "18698 E. Main Street\r\nBig Town, AZ, 86011";

// Parse into address
Address addr = new Address(f);

addr.Street; // 18698 E. Main Street
addr.Locality; // Big Town
addr.Region; // AZ
addr.PostalCode; // 86011

I could do this with a regex, but the tricky part is keeping it general enough to handle any address in the world!

I'm sure there has to be something out there that can do it.

If anyone noticed, this is actually the format of the opensocial.address object.

A: 

Dupe of this:

http://stackoverflow.com/questions/16413/parse-usable-street-address-city-state-zip-from-a-string

I assume you mean U.S. addresses.

danieltalsky
+3  A: 

This is a difficult problem when you bring international addresses into the mix. I know that Japanese addresses don't follow the street1/street2/city/state/zip model that you presented. They go down to the street, block, and building in a way that's different from typical US addresses. Other addresses in Europe are different as well.

That regex had better be Unicode, because our alphabet won't be sufficient.
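
To illustrate the Unicode point, here is a minimal Python sketch. The pattern names and the sample place names are my own, chosen only for illustration. It compares an ASCII-only character class with a Unicode-aware one:

```python
import re

# ASCII-only class: rejects any name that contains non-ASCII letters.
ASCII_NAME = re.compile(r"[A-Za-z]+(?: [A-Za-z]+)*")

# Unicode-aware class: for str patterns in Python 3, \w matches Unicode word
# characters, so [^\W\d_] is roughly "a letter in any script".
UNICODE_NAME = re.compile(r"[^\W\d_]+(?: [^\W\d_]+)*")

def is_name(pattern, text):
    """Return True if the whole string matches the given pattern."""
    return pattern.fullmatch(text) is not None

# Hypothetical sample localities in three scripts.
for locality in ["Big Town", "München", "東京都"]:
    print(locality, is_name(ASCII_NAME, locality), is_name(UNICODE_NAME, locality))
```

This only covers the character set. It does nothing about the differing field order and structure of, say, Japanese addresses.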

Not an easy problem, IMO.

duffymo
A: 

You could try QAS. It has its issues, but it pretty much works as advertised.

Steve B.
+1  A: 

As @duffymo said, there is no trivial solution, so the next best thing might be to reconsider the design. If it's a user form, compromise and let the user fill in the separate fields. If you are retroactively parsing data, use a very strict regex to parse the addresses that meet some criterion (for example, the country is US), then make a second pass over the ones that are left, and so on. I have taken this approach, and it's the only one I have found reliable.

Another design problem with a generic regex approach is that it will generate false positives for bad addresses. If you are sending snail mail to these people, it will end up bouncing, and you'll have more work on your hands sorting out which ones came back, or you'll keep sending mail to erroneous addresses.
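
A rough Python sketch of the multi-pass idea described above. The `Address` class, the `US_ADDRESS` pattern and the function names are all hypothetical, and the pattern only accepts the two-line "street / City, ST, ZIP" layout from the question. Anything it rejects is set aside for a later pass instead of being guessed at:

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Address:
    street: str
    locality: str
    region: str
    postal_code: str

# Deliberately strict: a street line starting with a house number, then
# "City, ST, 12345" (the comma after the state and a ZIP+4 suffix are optional).
US_ADDRESS = re.compile(
    r"(?P<street>\d+ [^\r\n,]+)\r?\n"
    r"(?P<locality>[^,\r\n]+), ?(?P<region>[A-Z]{2}),? (?P<postal>\d{5}(?:-\d{4})?)"
)

def parse_us(text: str) -> Optional[Address]:
    """Return an Address only if the whole string fits the strict US pattern."""
    m = US_ADDRESS.fullmatch(text.strip())
    if m is None:
        return None
    return Address(m["street"], m["locality"].strip(), m["region"], m["postal"])

def first_pass(raw_addresses: List[str]) -> Tuple[List[Address], List[str]]:
    """Split the input into confidently parsed addresses and leftovers."""
    parsed, leftovers = [], []
    for raw in raw_addresses:
        addr = parse_us(raw)
        if addr is None:
            leftovers.append(raw)  # handled by a later, differently tuned pass
        else:
            parsed.append(addr)
    return parsed, leftovers
```

The leftovers list can then be fed to another strict parser (a different country or a different layout), and whatever survives every pass goes to manual review.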

aleemb
+6  A: 

The Google Maps API works pretty well for this. E.g., suppose you are given the string "120 w 45 st nyc". Pass it into the Google Maps API like so: http://maps.google.com/maps/geo?q=120+w+45+st+nyc and you get this response:

{
  "name": "120 w 45 st nyc",
  "Status": {
    "code": 200,
    "request": "geocode"
  },
  "Placemark": [ {
    "id": "p1",
    "address": "120 W 45th St, New York, NY 10036, USA",
    "AddressDetails": {
      "Country": {
        "CountryNameCode": "US",
        "CountryName": "USA",
        "AdministrativeArea": {
          "AdministrativeAreaName": "NY",
          "Locality": {
            "LocalityName": "New York",
            "Thoroughfare": {"ThoroughfareName": "120 W 45th St"},
            "PostalCode": {"PostalCodeNumber": "10036"}
          }
        }
      },
      "Accuracy": 8
    },
    "ExtendedData": {
      "LatLonBox": {
        "north": 40.7603883,
        "south": 40.7540931,
        "east": -73.9807141,
        "west": -73.9870093
      }
    },
    "Point": {
      "coordinates": [ -73.9838617, 40.7572407, 0 ]
    }
  } ]
}
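
To turn a response like this into the structured fields asked for in the question, something along these lines might work. It is only a sketch in Python: `SAMPLE` is a trimmed copy of the response above, `find_key` and `to_address` are hypothetical helpers, and the recursive lookup is there because I am assuming the nesting under `AddressDetails` may differ from one address to another:

```python
import json

# Trimmed copy of the response shown above (no network call is made here).
SAMPLE = """
{
  "Status": {"code": 200, "request": "geocode"},
  "Placemark": [{
    "address": "120 W 45th St, New York, NY 10036, USA",
    "AddressDetails": {"Country": {"CountryNameCode": "US", "CountryName": "USA",
      "AdministrativeArea": {"AdministrativeAreaName": "NY",
        "Locality": {"LocalityName": "New York",
          "Thoroughfare": {"ThoroughfareName": "120 W 45th St"},
          "PostalCode": {"PostalCodeNumber": "10036"}}}}, "Accuracy": 8}
  }]
}
"""

def find_key(node, key):
    """Depth-first search for the first value stored under `key`, else None."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None

def to_address(response):
    """Map the first placemark to the fields used in the question, or None."""
    placemarks = response.get("Placemark") or []
    if response.get("Status", {}).get("code") != 200 or not placemarks:
        return None
    details = placemarks[0].get("AddressDetails", {})
    return {
        "street": find_key(details, "ThoroughfareName"),
        "locality": find_key(details, "LocalityName"),
        "region": find_key(details, "AdministrativeAreaName"),
        "postal_code": find_key(details, "PostalCodeNumber"),
        "country": find_key(details, "CountryNameCode"),
    }

print(to_address(json.loads(SAMPLE)))
```

Fields that are missing from a given response simply come back as `None`, so the caller can decide whether a partial result is good enough.
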
Horace Loeb
I guess I would like to know how Google does it.
+1  A: 

I tried RecogniContact recently. It is a Windows COM component that parses US and European addresses. You can test it from the website.

http://www.loquisoft.com/index.php?page=8