views:

390

answers:

3

Phone number data in various formats (I've chosen these because the data coming in is unreliable and not in expected formats):

+1 480-874-4666
404-581-4000
(805) 682-4726
978-851-7321, Ext 2606
413- 658-1100
(513) 287-7000,Toll Free (800) 733-2077
1 (813) 274-8130
212-363-3200,Media Relations: 212-668-2251.
323/221-2164

My Ruby code to extract all the digits, remove any leading 1's for the USA country code, then use the first 10 digits to create the "new" phone number in a format desired:

  nums = phone_number_string.scan(/[0-9]+/)
  if nums.size > 0
    all_nums = nums.join
    all_nums = all_nums[0..0] == "1" ? all_nums[1..-1] : all_nums
    if all_nums.size >= 10
      ten_nums = all_nums[0..9]
      final_phone = "#{ten_nums[0..2]}-#{ten_nums[3..5]}-#{ten_nums[6..9]}"
    else
      final_phone = ""
    end
    puts "#{final_phone}"
  else
    puts "No number to fix."
  end

The results are very good!

480-874-4666
404-581-4000
805-682-4726
978-851-7321
413-658-1100
513-287-7000
813-274-8130
212-363-3200
323-221-2164

But, I think there's a better way. Can you refactor this to be more efficient, more legible, or more useful?

A: 

For numbers in the North American plan one could extract the first number using phone_number_string.gsub(/\D/, '').match(/^1?(\d{10})/)[1]

For example:

test_phone_numbers = ["+1 480-874-4666",
                      "404-581-4000",
                      "(805) 682-4726",
                      "978-851-7321, Ext 2606",
                      "413- 658-1100",
                      "(513) 287-7000,Toll Free (800) 733-2077",
                      "1 (813) 274-8130",
                      "212-363-3200,Media Relations: 212-668-2251.",
                      "323/221-2164",
                      "foobar"]

test_phone_numbers.each do | phone_number_string | 
  match = phone_number_string.gsub(/\D/, '').match(/^1?(\d{10})/)
  puts(
    if (match)
      "#{match[1][0..2]}-#{match[1][3..5]}-#{match[1][6..9]}"
    else
      "No number to fix."
    end
  )
end

As with the starting code, this doesn't capture multiple numbers, e.g., "(513) 287-7000,Toll Free (800) 733-2077"

FWIW, I've found it easier in the long run to capture and store complete numbers, i.e., including the country code and no separators; making guesses during the capture on which numbering plan ones missing a prefix are in and selecting formats, e.g., NANP v. DE, when rendering.

ylg
+7  A: 

Here's a much simpler approach using only regexes and substitution:

def extract_phone_number(input)
  if input.gsub(/\D/, "").match(/^1?(\d{3})(\d{3})(\d{4})/)
    [$1, $2, $3].join("-")
  end
end

This strips all non-digits (\D), skips an optional leading one (^1?), then extracts the first remaining 10 digits in chunks ((\d{3})(\d{3})(\d{4})) and formats.

Here's the test:

test_data = {
  "+1 480-874-4666"                             => "480-874-4666",
  "404-581-4000"                                => "404-581-4000",
  "(805) 682-4726"                              => "805-682-4726",
  "978-851-7321, Ext 2606"                      => "978-851-7321",
  "413- 658-1100"                               => "413-658-1100",
  "(513) 287-7000,Toll Free (800) 733-2077"     => "513-287-7000",
  "1 (813) 274-8130"                            => "813-274-8130",
  "212-363-3200,Media Relations: 212-668-2251." => "212-363-3200",
  "323/221-2164"                                => "323-221-2164",
  ""                                            => nil,
  "foobar"                                      => nil,
  "1234567"                                     => nil,
}

test_data.each do |input, expected_output|
  extracted = extract_phone_number(input)
  print "FAIL (expected #{expected_output}): " unless extracted == expected_output
  puts extracted
end
Ryan McGeary
This approach is probably faster.
Alex L
+1  A: 

My approach is a bit different (and I think better IMHO :-): I needed to not miss any phone numbers, even if there were 2 on a line. I also didn't want to get lines that had 3 sets of numbers that were far apart (see the cookies example), and I didn't want to mistake an IP address as a phone number.

Code to allow multiple numbers per line, but also require sets of digits to be 'close' to each other:

def extract_phone_number(input)
  result = input.scan(/(\d{3})\D{0,3}(\d{3})\D{0,3}(\d{4})/).map{|e| e.join('-')}
  # <result> is an Array of whatever phone numbers were extracted, and the remapping
  # takes care of cleaning up each number in the Array into a format of 800-432-1234
  result = result.join(' :: ')
  # <result> is now a String, with the numbers separated by ' :: '
  # ... or there is another way to do it (see text below the code) that only gets the
  # first phone number only.

  # Details of the Regular Expressions and what they're doing
  # 1. (\d{3}) -- get 3 digits (and keep them)
  # 2. \D{0,3} -- allow skipping of up to 3 non-digits. This handles hyphens, parentheses, periods, etc.
  # 3. (\d{3}) -- get 3 more digits (and keep them)
  # 4. \D{0,3} -- skip up to 0-3 non-digits
  # 5. (\d{4}) -- keep the final 4 digits

  result.empty? ? nil : result
end

And here are the tests (with a few additional tests)

test_data = {
  "DB=Sequel('postgres://user:[email protected]/test_test')" => nil, # DON'T MISTAKE IP ADDRESSES AS PHONE NUMBERS
  "100 cookies + 950 cookes = 1050 cookies"     => nil,  # THIS IS NEW
  "this 123 is a 456 bad number 7890"           => nil,  # THIS IS NEW
  "212-363-3200,Media Relations: 212-668-2251." => "212-363-3200 :: 212-668-2251", # THIS IS CHANGED
  "this is +1 480-874-4666"                     => "480-874-4666",
  "something 404-581-4000"                      => "404-581-4000",
  "other (805) 682-4726"                        => "805-682-4726",
  "978-851-7321, Ext 2606"                      => "978-851-7321",
  "413- 658-1100"                               => "413-658-1100",
  "(513) 287-7000,Toll Free (800) 733-2077"     => "513-287-7000 :: 800-733-2077", # THIS IS CHANGED
  "1 (813) 274-8130"                            => "813-274-8130",
  "323/221-2164"                                => "323-221-2164",
  ""                                            => nil,
  "foobar"                                      => nil,
  "1234567"                                     => nil,
}

def test_it(test_data)
  test_data.each do |input, expected_output|
    extracted = extract_phone_number(input)
    puts "#{extracted == expected_output ? 'good': 'BAD!'}   ::#{input} => #{extracted.inspect}"
  end
end

test_it(test_data)

An Alternate implementation: by using "scan" it will automatically apply the regular expression multiple times, which is good if you want to extra more than 1 phone number per line. If you just want to get the first phone number on a line then you could also use:

first_phone_number = begin 
  m = /(\d{3})\D{0,3}(\d{3})\D{0,3}(\d{4})/.match(input)
  [m[1],m[2],m[3]].join('-')
rescue nil; end

(just a different way of doing things, using the "match" function of RegExp)

Greg Edwards