This code seems to do what you want:
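text = """
Boulder, Co
80303
Boulder, Colorado
Boulder, Co 80303
"""
# strip() drops the blank first line left by the triple-quoted string
lines = text.strip().splitlines()

# abbreviation -> full state name (extend as needed)
ABBREV = dict(co="Colorado", ca="California")
STATES = set(ABBREV.values())

def parse_addr(line):
    addr = {}
    # normalize commas to whitespace, then split into tokens
    parts = line.replace(",", " ").split()
    for part in parts:
        if part.capitalize() in STATES:
            # full state name
            addr["state"] = part.capitalize()
        elif part.lower() in ABBREV:
            # abbreviation: expand to the full name
            addr["state"] = ABBREV[part.lower()]
        else:
            try:
                int(part)  # raises ValueError if the token is not numeric
                addr["zip"] = part
            except ValueError:
                # anything else is assumed to be the city
                addr["city"] = part
    return addr

for line in lines:
    print(line, parse_addr(line))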
Output:
Boulder, Co {'city': 'Boulder', 'state': 'Colorado'}
80303 {'zip': '80303'}
Boulder, Colorado {'city': 'Boulder', 'state': 'Colorado'}
Boulder, Co 80303 {'city': 'Boulder', 'state': 'Colorado', 'zip': '80303'}
Handling "South Dakota" and other two-word states and cities is left as an exercise for the reader :)
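For anyone who wants a head start on that exercise, here is one possible sketch, not the only way to do it. It tries the longest run of words first, so "South Dakota" is matched before "South" alone, and it collects leftover words into a multi-word city. The `sd` entry, `MAX_STATE_WORDS`, and the sample address are additions for illustration only.

```python
# Abbreviation -> full name; the "sd" entry is added here for illustration.
ABBREV = {"co": "Colorado", "ca": "California", "sd": "South Dakota"}
# Lowercased full name -> canonical spelling.
STATES = {name.lower(): name for name in ABBREV.values()}
# Longest state name in words (2 for this table).
MAX_STATE_WORDS = max(len(name.split()) for name in STATES.values())


def parse_addr(line):
    addr = {}
    city_words = []
    parts = line.replace(",", " ").split()
    i = 0
    while i < len(parts):
        # Try the longest phrase first so "South Dakota" wins over "South".
        for n in range(MAX_STATE_WORDS, 0, -1):
            chunk = parts[i:i + n]
            phrase = " ".join(chunk).lower()
            state = STATES.get(phrase) or ABBREV.get(phrase)
            if state:
                addr["state"] = state
                i += len(chunk)
                break
        else:
            # No state matched at this position: zip if numeric, else city.
            if parts[i].isdigit():
                addr["zip"] = parts[i]
            else:
                city_words.append(parts[i])
            i += 1
    if city_words:
        addr["city"] = " ".join(city_words)
    return addr


print(parse_addr("Rapid City, South Dakota 57701"))
```

This still gets confused by cities that contain a state name (for example "Colorado Springs"), so treat it as a starting point rather than a full parser.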
As the other posters suggested, you can get smart and use the zip code to narrow in on the city/state as well.
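A rough sketch of that idea, assuming you have some zip-to-place table available. `ZIP_TABLE` and `fill_from_zip` are hypothetical names, and the single entry is just the example from above; real data would have to come from a zip code dataset of your choosing.

```python
# Hypothetical lookup table; a real one would be loaded from a zip code dataset.
ZIP_TABLE = {
    "80303": ("Boulder", "Colorado"),
}


def fill_from_zip(addr, zip_table=ZIP_TABLE):
    """Return a copy of addr with a missing city/state filled in from its zip."""
    filled = dict(addr)
    match = zip_table.get(addr.get("zip"))
    if match:
        city, state = match
        # setdefault keeps whatever was already parsed from the line itself.
        filled.setdefault("city", city)
        filled.setdefault("state", state)
    return filled


print(fill_from_zip({"zip": "80303"}))
```

You would call this on the dict returned by `parse_addr`. Values parsed from the line are kept, and the table only fills the gaps.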