Hey, folks. I'm looking for some regular expressions to help grab street addresses and phone numbers from free-form text (a la Gmail).

Given some text: "John, I went to the store today, and it was awesome! Did you hear that they moved to 500 Green St.? ... Give me a call at +14252425424 when you get a chance."

I'd like to be able to pull out:

500 Green St. (recognized as a street address)

+14252425424 (recognized as a phone number)

What makes this problem easier is that I don't care about parsing the text that gets pulled out. That is, I don't care that Green is the name of the road or that 425 is the area code. I just want to grab strings that "look like" addresses or telephone numbers.

Unfortunately, this needs to work internationally, as well as possible.

Anyone have any leads? Thanks!

A: 

Take a look at Chapter 7 of Dive Into Python. It touches on both phone numbers and street addresses, and I believe you can use it as a starting point. The international part seems tough. I suggest you build a first draft, try it on several locales, then iterate and improve.
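For illustration, a first draft along those lines might look like the sketch below. It is not taken from Dive Into Python; the names `PHONE_RE`, `ADDRESS_RE`, and `extract` are made up for this example, and both patterns rest on loud assumptions: a phone number is an optional `+` followed by 7 to 15 digits, and a street address is a house number, one to three capitalized words, and a suffix from a short English-only list. Expect false positives and misses on other locales.

```python
import re

# Assumption: a phone number is an optional "+" followed by 7 to 15 digits,
# optionally separated by single spaces, dots, or hyphens.
PHONE_RE = re.compile(r"(?<![\w+])\+?\d(?:[ .-]?\d){6,14}(?!\w)")

# Assumption: a street address is a house number, one to three capitalized
# words, and a street-type suffix from a short, English-only list.
ADDRESS_RE = re.compile(
    r"\b\d{1,5}\s+(?:[A-Z][a-z]+\s+){1,3}"
    r"(?:St|Street|Ave|Avenue|Rd|Road|Blvd|Boulevard|Dr|Drive|Ln|Lane)\b\.?"
)


def extract(text):
    """Return substrings that merely look like phone numbers or addresses."""
    return {
        "phones": PHONE_RE.findall(text),
        "addresses": ADDRESS_RE.findall(text),
    }


sample = (
    "Did you hear that they moved to 500 Green St.? ... "
    "Give me a call at +14252425424 when you get a chance."
)
print(extract(sample))
# {'phones': ['+14252425424'], 'addresses': ['500 Green St.']}
```

The suffix list is where the locale dependence lives, so that is the part to iterate on when trying other countries.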

Yuval F
Ah, but I imagine this problem is already solved. Do you know of any already-existing regular expressions that I may employ? Thanks.
spitzanator
Well, you can check http://regexlib.com/. It's the #1 source of regex solutions for problems that shouldn't be solved with regexes. ;)
Alan Moore
Alan, this looks like a great resource, thanks. A cursory search gave me several international phone number regexes; no international street address ones, though. I still believe this is hard.
Yuval F
A: 

Phone numbers are easy as long as you have a list of all country codes and number formats. For street addresses I have no idea; the only advice I can give you is to validate each set of words at addressdoctor.com.
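A minimal sketch of the country-code idea, assuming numbers are written with a leading `+` and no separators. The table `NATIONAL_LENGTHS` and the function `find_phone_numbers` are hypothetical names, and the table holds only two illustrative entries; a real one would need every country code and its number formats.

```python
import re

# Hypothetical, incomplete table: country calling code -> plausible lengths
# of the national number. Illustrative values only, not an authoritative list.
NATIONAL_LENGTHS = {
    "1": {10},
    "44": {9, 10},
}

# Candidate: "+" followed by 7 to 15 digits, not followed by another digit.
CANDIDATE_RE = re.compile(r"\+(\d{7,15})(?!\d)")


def find_phone_numbers(text):
    """Return (country_code, national_number) pairs for known country codes."""
    results = []
    for match in CANDIDATE_RE.finditer(text):
        digits = match.group(1)
        # Country calling codes are one to three digits long; try each split.
        for code_len in (1, 2, 3):
            code, national = digits[:code_len], digits[code_len:]
            if len(national) in NATIONAL_LENGTHS.get(code, ()):
                results.append((code, national))
                break
    return results


print(find_phone_numbers("Give me a call at +14252425424 when you get a chance."))
# [('1', '4252425424')]
```

Candidates whose prefix is not in the table are silently dropped, which trades recall for fewer false positives.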

Alix Axel
A: 

You can give RecogniContact (-> address-parser.com) a try; it recognizes both postal addresses and phone numbers.

Mike Warner