My approach is a bit different (and, I think, better :-): I needed to catch every phone number, even when there are 2 on a line. I also didn't want to match lines that have 3 sets of digits that are far apart (see the cookies example), and I didn't want to mistake an IP address for a phone number.
Code to allow multiple numbers per line, but also require sets of digits to be 'close' to each other:
def extract_phone_number(input)
  result = input.scan(/(\d{3})\D{0,3}(\d{3})\D{0,3}(\d{4})/).map { |e| e.join('-') }
  # <result> is an Array of whatever phone numbers were extracted, and the map
  # normalizes each number in the Array to the format 800-432-1234
  result = result.join(' :: ')
  # <result> is now a String, with the numbers separated by ' :: '
  # ... or there is another way to do it (see the text below the code) that gets
  # only the first phone number.
  # Details of the regular expression and what each part does:
  # 1. (\d{3}) -- match 3 digits (and capture them)
  # 2. \D{0,3} -- skip up to 3 non-digits. This handles hyphens, parentheses, periods, etc.
  # 3. (\d{3}) -- match 3 more digits (and capture them)
  # 4. \D{0,3} -- skip 0 to 3 non-digits
  # 5. (\d{4}) -- capture the final 4 digits
  # An empty String means no number was found, so return nil in that case
  result.empty? ? nil : result
end
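To make the intermediate values concrete, here is a small sketch of what "scan" returns when the pattern has capture groups, and how the join normalizes each match. The constant name PHONE_PATTERN is only an illustrative assumption; the pattern itself is the one used in extract_phone_number.

```ruby
# PHONE_PATTERN is a hypothetical name; the regex is the one from extract_phone_number.
PHONE_PATTERN = /(\d{3})\D{0,3}(\d{3})\D{0,3}(\d{4})/

line = "(513) 287-7000,Toll Free (800) 733-2077"

# With capture groups, String#scan returns one inner Array per match.
groups = line.scan(PHONE_PATTERN)
p groups   # => [["513", "287", "7000"], ["800", "733", "2077"]]

# Joining each inner Array gives the normalized 800-432-1234 style.
numbers = groups.map { |parts| parts.join('-') }
p numbers  # => ["513-287-7000", "800-733-2077"]

# Digit groups separated by more than 3 non-digits are not treated as one number.
far_apart = "this 123 is a 456 bad number 7890".scan(PHONE_PATTERN)
p far_apart  # => []
```

The `\D{0,3}` limit is what enforces "closeness": it is wide enough for separators such as ") " or "- ", but too narrow to bridge whole words.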
And here are the tests (with a few additions):
test_data = {
  "DB=Sequel('postgres://user:[email protected]/test_test')" => nil, # DON'T MISTAKE IP ADDRESSES AS PHONE NUMBERS
  "100 cookies + 950 cookes = 1050 cookies" => nil, # THIS IS NEW
  "this 123 is a 456 bad number 7890" => nil, # THIS IS NEW
  "212-363-3200,Media Relations: 212-668-2251." => "212-363-3200 :: 212-668-2251", # THIS IS CHANGED
  "this is +1 480-874-4666" => "480-874-4666",
  "something 404-581-4000" => "404-581-4000",
  "other (805) 682-4726" => "805-682-4726",
  "978-851-7321, Ext 2606" => "978-851-7321",
  "413- 658-1100" => "413-658-1100",
  "(513) 287-7000,Toll Free (800) 733-2077" => "513-287-7000 :: 800-733-2077", # THIS IS CHANGED
  "1 (813) 274-8130" => "813-274-8130",
  "323/221-2164" => "323-221-2164",
  "" => nil,
  "foobar" => nil,
  "1234567" => nil,
}
def test_it(test_data)
  test_data.each do |input, expected_output|
    extracted = extract_phone_number(input)
    puts "#{extracted == expected_output ? 'good' : 'BAD!'} ::#{input} => #{extracted.inspect}"
  end
end
test_it(test_data)
An alternate implementation: "scan" automatically applies the regular expression repeatedly, which is good if you want to extract more than 1 phone number per line. If you just want the first phone number on a line, you could also use:
m = /(\d{3})\D{0,3}(\d{3})\D{0,3}(\d{4})/.match(input)
# match returns nil when no number is found, so check before joining the captures
first_phone_number = m ? [m[1], m[2], m[3]].join('-') : nil
(just a different way of doing things, using the "match" method of Regexp)
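As a rough sketch, the first-match variant can be wrapped in a small method so it can be called the same way as extract_phone_number. The method name extract_first_phone_number is an assumption made for illustration; it is not part of the code above.

```ruby
# Hypothetical helper name; uses the same pattern as extract_phone_number.
def extract_first_phone_number(input)
  m = /(\d{3})\D{0,3}(\d{3})\D{0,3}(\d{4})/.match(input)
  # Regexp#match returns nil when nothing matches, so guard before reading
  # the captures; MatchData#captures is the Array of the three digit groups.
  m && m.captures.join('-')
end

# Only the first of the two numbers on the line is returned.
p extract_first_phone_number("212-363-3200,Media Relations: 212-668-2251.")  # => "212-363-3200"

# No phone number on the line gives nil.
p extract_first_phone_number("foobar")  # => nil
```

The explicit nil check keeps the no-match case from raising, which matters because "match", unlike "scan", does not return an empty Array when nothing is found.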