I have a block of text that includes a name, possibly a company name, an address, and possibly an email address. I want to extract the street address from it, and preferably the name as well.

This data is siphoned from multiple sources, so I have no idea what the actual formatting will be. It could be something like this:

Company name, [email protected]
ATTN John Doe
care of Company Name
123 Street St
New York, NY 12345
US
123-456-7890

But any of those lines could be rearranged or missing (the phone number could come first, there might be no ATTN or c/o line, etc.). Also, the address could be from any country.

The goal is to a) plug the address into the Google Maps API, and b) create a contact with as much information as possible.

Here is a random idea I had:

  1. Take any line with an email address (can be found with a regex easily), store the email address and remove the line from further consideration.
  2. Take any line that is a phone number (made up only of digits and the characters [-+()]), store that number, and remove the line from further consideration.
  3. Take the last three lines and consider those the street address - plug them into Google Maps and hope for the best.
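
To make the idea concrete, here is a rough Python sketch of the three steps above. It is only an illustration of the heuristic, not a robust parser: the function name `split_contact_block`, the loose email regex, the `MIN_PHONE_DIGITS` threshold, and the `info@example.com` sample address are all assumptions of mine, and the Google Maps call itself is left out.

```python
import re

# Loose email pattern; good enough to spot an address inside a line (assumption).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# A "phone line" consists solely of digits, whitespace and the characters -+().
PHONE_LINE_RE = re.compile(r"[\d\s()+-]+")
# Assumed threshold so that a bare postal code is not mistaken for a phone number.
MIN_PHONE_DIGITS = 7


def split_contact_block(text):
    """Apply the three heuristic steps to one block of contact text."""
    emails, phones, remaining = [], [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Step 1: store any email addresses and drop the whole line.
        found = EMAIL_RE.findall(line)
        if found:
            emails.extend(found)
            continue
        # Step 2: store phone-number lines and drop them.
        digit_count = sum(c.isdigit() for c in line)
        if PHONE_LINE_RE.fullmatch(line) and digit_count >= MIN_PHONE_DIGITS:
            phones.append(line)
            continue
        remaining.append(line)
    # Step 3: treat the last three remaining lines as the street address.
    return {
        "emails": emails,
        "phones": phones,
        "address_guess": ", ".join(remaining[-3:]),
        "other_lines": remaining[:-3],
    }


sample = """Company name, info@example.com
ATTN John Doe
care of Company Name
123 Street St
New York, NY 12345
US
123-456-7890"""

result = split_contact_block(sample)
print(result["address_guess"])
```

Note that step 1 as written throws away the whole line, so the company name sitting next to the email address is lost too. The `address_guess` string is what would be sent to the geocoder, and `other_lines` is whatever is left over for the contact record.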

Obviously, that relies on a lot of juju magic. Is there a smarter approach? Are there any libraries that have good regexes for finding street addresses from different countries?