Does anyone have a PHP class or regex to parse an address into components? At a minimum, it should break the address into these components: street info, state, ZIP, country.

+4  A: 

A library- and language-agnostic solution would be to use Google's geocoder for this. It can return detailed, broken-down information about a given address.

http://code.google.com/apis/maps/documentation/services.html#Geocoding%5FStructured

Stuart Branham
+1 Looks like a good resource. Keep in mind, though, that it may accept -partial- addresses: if you give it only a country and a state, it may be fine with that. If you're building an app that needs to use mailing addresses down the line, that kind of open-ended allowance can come back to bite you. *shrugs*
Tchalvak
Couldn't you just validate against the results of the parse? That is, if you don't mind potentially hitting Google a lot.
Stuart Branham
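
A minimal sketch of the "validate against the results of the parse" idea from the comment above, written in Python rather than PHP. The component names (`street`, `state`, `zip`, `country`) and the dictionary shape are hypothetical; a real geocoder response uses its own keys and would need to be mapped onto them first.

```python
# Hypothetical component names; map your geocoder's own keys onto these.
REQUIRED = ("street", "state", "zip", "country")


def missing_components(parsed, required=REQUIRED):
    """Return the required components that are absent or blank in `parsed`."""
    return [name for name in required if not str(parsed.get(name) or "").strip()]


# A partial address (state and country only) is flagged instead of accepted.
partial = {"state": "CA", "country": "USA"}
print(missing_components(partial))  # ['street', 'zip']
```

An empty list means every required component came back, so the address could be treated as complete for mailing purposes.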
A: 

You will have a hard time finding such a class.

What you are asking for is very ambiguous, and even if such a class existed, it wouldn't work very well.

Why? Think about addresses in terms of grammar.

A: 

If you're talking about pre-existing data, good luck to ye. If this is something that you have control over the input for, I recommend creating separation of the different parts of the address at the input level. Jus' a suggestion.

Tchalvak
I tried to keep the interface as simple as possible. I wanted to give the users the ability to simply enter the address in a textarea and I would later try to parse it.
Andres
Keep in mind that by providing only an open-ended field, you're actually making it -more- complex, because you aren't specifying any strong requirements for the input, so the users -will- put in information in formats that you -won't- be able to parse. Better to provide separated fields and then provide an "other address info" field for addresses that might not fit your pattern.
Tchalvak
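
One way to picture the separated-fields suggestion above is as a record with one slot per component plus a catch-all. This is only a sketch in Python; the class and field names are hypothetical and not taken from any library.

```python
from dataclasses import dataclass, asdict


@dataclass
class AddressForm:
    """One field per component, so nothing has to be parsed afterwards."""
    street: str
    state: str
    zip_code: str
    country: str
    other_info: str = ""  # catch-all for addresses that do not fit the pattern


form = AddressForm("123 Main St", "CA", "90210", "USA", other_info="Rear entrance")
print(asdict(form))
```

Because each component arrives in its own field, the stored record is already broken down and the free-form part is confined to `other_info`.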
A: 

How about this one:

http://www.analysisandsolutions.com/software/addr/

ZZ Coder
+1  A: 

Edit: Use this just as an example, if your data is all formatted very similarly. As Strager pointed out, in most cases there will be too much variation in data to use a regex effectively.

Assuming your input is of the format:

[Street Name], [State], [ZIP], [Country]

This regex will do the trick:

m/^(.+?),\s*(.+?),\s*([0-9]+),\s*(.+)$/

Regular expressions are fairly complex, though. If you are going to use this for anything significant, I would suggest taking the time to learn them. I have always found this cheat sheet very useful:

http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet/
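
As a rough illustration of the pattern in this answer, here is a sketch in Python (not PHP; the same pattern should also work with PHP's `preg_match`). It uses a variant that tolerates spaces after the commas, since the assumed input format has them. The function name and dictionary keys are hypothetical.

```python
import re

# Street, state, numeric ZIP, country, separated by commas with optional spaces.
ADDRESS_RE = re.compile(r"^(.+?),\s*(.+?),\s*([0-9]+),\s*(.+)$")


def split_address(line):
    """Split a single-line address into components, or return None on no match."""
    match = ADDRESS_RE.match(line.strip())
    if match is None:
        return None
    street, state, zip_code, country = (part.strip() for part in match.groups())
    return {"street": street, "state": state, "zip": zip_code, "country": country}


print(split_address("123 Main St, CA, 90210, USA"))
# An extra comma-separated part ends up in the wrong field: state == "Apt 4, CA"
print(split_address("123 Main St, Apt 4, CA, 90210, USA"))
# Input without a numeric ZIP does not match at all.
print(split_address("221B Baker Street, London"))  # None
```

The second and third calls show why the comments caution against relying on a regex: anything that departs from the assumed format is either misassigned or rejected.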

Goose Bumper
Due to the many possible forms for addresses, I don't think a regular expression is feasible.
strager
@Strager: Great point, thank you. Edited.
Goose Bumper
+1 for the regex cheat sheet, useful, though I don't think that regex is going to be a great solution to an open address field.
Tchalvak
+1  A: 

Here is a Python version using pyparsing for parsing street addresses. It's not PHP, but might give you some insights into the complexity of the problem.

Paul McGuire