Ok, so I've already got the bulk of this worked out... I've got a function right now that will run a pattern check on the phone number it's given and then determine whether it's a "valid" phone number based on the NANPA guidelines.

However, the problem I'm running into is that it allows people to enter "extension" numbers, but I can't figure out how to allow them in multiple formats.

For example:

It would take the following phone number: (123) 456-7890 x345 and mark it as valid... However, if I were to try using: (123) 456-7890 ext345 then it marks it as invalid.

The regex pattern I'm using for the initial check is one I found on the web, and I've barely modified it thus far... I've made it so that it allows "." as a separator between numbers, but that's it.

Here's the function: (long winded, I know)

/*
* Function to validate a US phone number and split it into its 3 components
* 3-digit area code, 3-digit exchange code, 4-digit subscriber number
*/
function validPhone($phone) {

  //Set the regex pattern
  $pattern = '/^[\(]?(\d{0,3})[\)]?[\.]?[\/]?[\s]?[\-]?(\d{3})[\s]?[\.]?[\/]?[\-]?(\d{4})[\s]?[x]?(\d*)$/';

  //Set variable to false
  $valid = array(
    'ac'=>false,
    'ec'=>false,
    'sn'=>false,
    'en'=>false,
    'all'=>false,
  );

  //Look for match, then dump patterns to $matches
  if (preg_match($pattern, $phone, $matches)) {

    // Original number
    $phone_number = $matches[0];

    // 3-digit area code
    $area_code = $matches[1];

    // Validate area code based on NANPA standards
    if(preg_match('/^[2-9][0-8][0-9]$/', $area_code)) {
      if($area_code != '555') {
        $valid['ac'] = true;
      }
    }

    // 3-digit exchange code
    $exchange_code = $matches[2];

    // Validate exchange code based on NANPA standards
    if(preg_match('/^[2-9][0-9]{2}$/', $exchange_code)) {
      $valid['ec'] = true;
    }

    // 4-digit subscriber number
    $sub_number = $matches[3];

    // Double check that subscriber number is 0-9 only
    if(preg_match('/^[0-9]{4}$/', $sub_number)) {
      $valid['sn'] = true;
    }

    // Extension number (if entered)
    $ext_number = $matches[4];

    // Double check that extension is 0-9 only
    if(!empty($ext_number)) {
      if(preg_match('/^[0-9]*$/', $ext_number)) {
        $valid['en'] = true;
      }
    }


    echo '<h1>Parsing phone number: '.$phone_number.'</h1>';
    echo '<h4>Area Code: '.$area_code.'</h4>';
    echo '<h4>Exchange Code: '.$exchange_code.'</h4>';
    echo '<h4>Subscriber Number: '.$sub_number.'</h4>';
    if(!empty($ext_number)) {
      echo '<h4>Extension: '.$ext_number.'</h4>';
    }
    else {
      echo '<h4>No Extension Found</h4>';
    }
    echo '<hr />';

    // Check that all are valid 
    // before setting final variable
    if($valid['ac']) {
      echo '<h5>Step 1: Area code is valid</h5>';
      if($valid['ec']) {
        echo '<h5>Step 2: Exchange code is valid</h5>';
        if($valid['sn']) {
          echo '<h5>Step 3: Subscriber number is valid</h5>';
          if(!empty($ext_number) && $valid['en']) {
            echo '<h5>Step 4: Extension number is valid</h5>';
            $valid['all'] = true;
          }
          elseif(empty($ext_number)) {
            echo '<h5>Step 4: Extension number not set, continuing</h5>';
            $valid['all'] = true;
          }
        }
      }
    }
  }
  return $valid['all'];
}

I know it's not set up for efficiency right now; I have echoes everywhere for debugging...

Here are a few examples of formats it will and won't validate:

123-456-7890x123 = validates
123.456.7890x123 = validates
123 456 7890 x123 = validates
(123) 456-7890 x123 = validates

123-456-7890ex123 = doesn't validate
123.456.7890 ex123 = doesn't validate
123 456 7890 ext123 = doesn't validate

Basically, the only way it validates when extensions are entered is if the extension is in the format: x123 (with or without a leading space)

Any other format such as ex, ext, ext:, ex: etc... None of them work.

I know it has to be the original regex pattern, but I can't figure out how to solve it.

Any ideas?


EDIT: I've mixed and modified two of the answers given below to form the full function which now does what I had wanted and then some... So I figured I'd post it here in case anyone else comes looking for this same thing.

/*
 * Function to analyze string against many popular formatting styles of phone numbers
 * Also breaks phone number into its respective components
 * 3-digit area code, 3-digit exchange code, 4-digit subscriber number
 * After which it validates the 10 digit US number against NANPA guidelines
*/
function validPhone($phone) {

  $format_pattern = '/^(?:(?:\((?=\d{3}\)))?(\d{3})(?:(?<=\(\d{3})\))?[\s.\/-]?)?(\d{3})[\s\.\/-]?(\d{4})\s?(?:(?:(?:(?:e|x|ex|ext)\.?\:?|extension\:?)\s?)(?=\d+)(\d+))?$/';
  $nanpa_pattern = '/^(?:1)?(?(?!(37|96))[2-9][0-8][0-9](?<!(11)))?[2-9][0-9]{2}(?<!(11))[0-9]{4}(?<!(555(01([0-9][0-9])|1212)))$/';

  //Set array of variables to false initially
  $valid = array(
    'format' => false,
    'nanpa' => false,
    'ext' => false,
    'all' => false
  );

  //Check data against the format analyzer
  if(preg_match($format_pattern, $phone, $matchset)) {
    $valid['format'] = true;    
  }

  //If formatted properly, continue
  if($valid['format']) {

    //Set array of new components
    $components = array(
      'ac' => $matchset[1], //area code
      'xc' => $matchset[2], //exchange code
      'sn' => $matchset[3], //subscriber number
      'xn' => isset($matchset[4]) ? $matchset[4] : '', //extension number (group 4 is absent when no extension was entered)
    );

    //Set array of number variants
    $numbers = array(
      'original' => $matchset[0],
      'stripped' => substr(preg_replace('/\D/', '', $matchset[0]), 0, 10)
    );

    //Now let's check the first ten digits against NANPA standards
    if(preg_match($nanpa_pattern, $numbers['stripped'])) {
      $valid['nanpa'] = true;
    }

    //If the NANPA guidelines have been met, continue
    if($valid['nanpa']) {
      if(!empty($components['xn'])) {
        if(preg_match('/^[\d]{1,6}$/', $components['xn'])) {
          $valid['ext'] = true;
        }
      }
      else {
        $valid['ext'] = true;
      }
    }

    //If the extension number is valid or non-existent, continue
    if($valid['ext']) {
      $valid['all'] = true;
    }
  }
  return $valid['all'];
}
+2  A: 

Why not convert any series of letters to be "x". Then that way you would have all possibilities converted to be "x".

OR

Check for 3 digits, 3 digits, 4 digits, and 1 or more digits, and disregard any other characters in between

Regex: ([0-9]{3}).*?([0-9]{3}).*?([0-9]{4}).+?([0-9]{1,})
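
The first suggestion (collapsing whatever letters precede the extension into a plain "x") might look something like the sketch below. The function name normalizeExtensionMarker is made up for illustration, and the pattern assumes the extension is the last run of digits in the string, optionally preceded by letters and a "." or ":".

```php
<?php
// Hypothetical helper (not from the original post): rewrite a trailing
// "ext 123", "ex. 123", "extension: 123", etc. as "x123" so that a pattern
// which only understands "x" can still accept the number.
function normalizeExtensionMarker($phone) {
    // \s*        optional whitespace before the marker
    // [a-z]+     the marker itself (any run of letters, case-insensitive)
    // [.:]?      optional terminator such as "." or ":"
    // \s*(\d+)$  optional whitespace, then the extension digits at the end
    return preg_replace('/\s*[a-z]+[.:]?\s*(\d+)$/i', 'x$1', trim($phone));
}
```

The normalized string can then be handed to the existing validPhone() check. Numbers without any letters pass through unchanged.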

webdestroya
Not sure I follow what you mean...
Josh
Rather than looking for "[x]" or "x" then digits you just look for "[\w\s]*\d+" . So you need not worry if they do phone number X extension or phone number EXTENSION extension or phone number EXT extension.
Freiheit
@Josh - something like `([0-9]{3}).*?([0-9]{3}).*?([0-9]{4}).+?([0-9]{1,})` This would disregard the separators and just validate that they in fact entered a valid number
webdestroya
That expression matches a whole lot of things he doesn't want to match, like: "My address is 123 Main Street, Cityville, ST. I really like PI. It's my favorite number. It is 3.1415926535." Also, I'm not sure .+? is a valid sequence (adjacent repetition quantifiers), but if you're looking to make the . optional, a simple .* will suffice.
ebynum
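
The objection in the comment above can be checked directly. This is only a sketch: it runs the loose pattern, unanchored as it was posted, against the sample sentence from the comment.

```php
<?php
// The loose pattern from the answer, with "/" delimiters added for PHP.
$loosePattern = '/([0-9]{3}).*?([0-9]{3}).*?([0-9]{4}).+?([0-9]{1,})/';

// Text that is clearly not a phone number but contains enough digits.
$notAPhoneNumber = "My address is 123 Main Street, Cityville, ST. "
                 . "I really like PI. It's my favorite number. It is 3.1415926535.";

// preg_match() returns 1 on a match, 0 on no match.
$matched = preg_match($loosePattern, $notAPhoneNumber);
echo $matched ? "matched\n" : "no match\n";
```

The pattern finds "123" in the street address and then carves the remaining groups out of the digits of pi, which is why some anchoring or stricter separators are needed.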
@ebynum - `.+?` means one or more.. but the lowest amount that match (it is a lazy quantifier)
webdestroya
Touché. I always forget those lazy quantifiers.
ebynum
A: 

Well, you could modify the regex, but it won't be very nice -- should you allow "extn"? How about "extentn"? How about "and then you have to dial"?

I think the "right" way to do this is to add a separate, numerical, extension form box.
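
As a rough sketch of that approach, assuming the form posts the number and the extension as two separate values (the function name and the 1-to-6-digit extension limit are assumptions for illustration, not requirements from the question):

```php
<?php
// Hypothetical two-field validation: the phone field never contains an
// extension, so its pattern stays simple, and the extension field is
// digits only.
function validPhoneParts($phone, $extension = '') {
    // Optional parentheses around the area code, one optional delimiter
    // between each part. This checks format only, not NANPA rules.
    $phoneOk = preg_match('/^\(?\d{3}\)?[\s.\/-]?\d{3}[\s.\/-]?\d{4}$/', $phone) === 1;

    // An empty extension is fine; otherwise require 1 to 6 digits.
    $extOk = ($extension === '') || preg_match('/^\d{1,6}$/', $extension) === 1;

    return $phoneOk && $extOk;
}
```

With this split, nobody has to decide whether "ext", "extn" or "x" is acceptable, because the user never types a marker at all.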

But if you really want the regex, I think I've fixed it up. Hint: you don't need [x] for a single character, x will do.

/^\(?(\d{0,3})\)?(\.|\/|\s|\-)?(\d{3})(\.|\/|\s|\-)?(\d{4})\s?(x|ext)?(\d*)$/

You allowed a dot, a slash, a dash, and a whitespace character. You should allow only one of these options. You'll need to update the references to $matches; the useful groups are now 1, 3, 5, and 7 (the full match is still in 0).

P.S. This is untested, since I don't have a reference implementation of PHP running. Apologies for mistakes, please let me know if you find any and I'll try to fix them.

Edit

This is summed up much better than I can here.

katrielalex
haha, I understand where you're coming from... It could turn into a neverending battle... However, I think that for simplicity's sake I'd rather have the phone number done all in one shot. I guess I could break off the extension as a last resort though.
Josh
The regex at the moment allows `x` or `ext`. If you want to add other options, you can put them in by changing `(x|ext)` to `(x|ext|spam|ham|eggs)`. However, I think if a user actually wants to put in an extension, this will cause no end of confusion.
katrielalex
+2  A: 

The current REGEX

/^[\(]?(\d{0,3})[\)]?[\.]?[\/]?[\s]?[\-]?(\d{3})[\s]?[\.]?[\/]?[\-]?(\d{4})[\s]?[x]?(\d*)$/

has a lot of issues, resulting in it matching all of the following, among others:
(0./ -000 ./-0000 x00000000000000000000000)
()./1234567890123456789012345678901234567890
\)\-555/1212 x

I think this REGEX is closer to what you're looking for:

/^(?:(?:(?:1[.\/\s-]?)(?!\())?(?:\((?=\d{3}\)))?((?!37|96)[2-9][0-8][0-9](?<!11))(?:(?<=\(\d{3})\))?)?[.\/\s-]?([2-9][0-9]{2}(?<!11))[.\/\s-]?([0-9]{4}(?<!555(?:01[0-9][0-9]|1212)))(?:[\s]*(?:(?:x|ext|extn|ex)[.:]*|extension[:]?)?[\s]*(\d+))?$/

or, exploded:

<?php
    $pattern = 
    '/^                                                     #  Matches from beginning of string

        (?:                                                 #  Country and Area Code Wrapper [not captured]
            (?:                                             #  Country Code Wrapper [not captured]
                (?:                                         #  Country Code Inner Wrapper [not captured]
                    1                                       #  1 - CC for United States and Canada
                    [.\/\s-]?                               #  Character Class (dot, slash, hyphen or whitespace) for allowed (optional, single) delimiter between Country Code and Area Code
                )                                           #  End of Country Code
                (?!\()                                      #  Lookahead, only allowed if not followed by an open parenthesis
            )?                                              #  Country Code Optional
            (?:                                             #  Opening Parenthesis Wrapper [not captured]
                \(                                          #  Opening parenthesis
                (?=\d{3}\))                                 #  Lookahead, only allowed if followed by 3 digits and closing parenthesis [lookahead never captured]
            )?                                              #  Parentheses Optional
            ((?!37|96)[2-9][0-8][0-9](?<!11))               #  3-digit NANPA-valid Area Code [captured]
            (?:                                             #  Closing Parenthesis Wrapper [not captured]
                (?<=\(\d{3})                                #  Lookbehind, only allowed if preceded by opening parenthesis and 3 digits [lookbehind never captured]
                \)                                          #  Closing parenthesis
            )?                                              #  Parentheses Optional
        )?                                                  #  Country and Area Code Optional

        [.\/\s-]?                                           #  Character Class (dot, slash, hyphen or whitespace) for allowed (optional, single) delimiter between Area Code and Central-office Code

        ([2-9][0-9]{2}(?<!11))                              #  3-digit NANPA-valid Central-office Code [captured]

        [.\/\s-]?                                           #  Character Class (dot, slash, hyphen or whitespace) for allowed (optional, single) delimiter between Central-office Code and Subscriber number

        ([0-9]{4}(?<!555(?:01[0-9][0-9]|1212)))             #  4-digit NANPA-valid Subscriber Number [captured]

        (?:                                                 #  Extension Wrapper [not captured]
            [\s]*                                           #  Character Class for allowed delimiters (optional, multiple) between phone number and extension
            (?:                                             #  Wrapper for extension description text [not captured]
                (?:x|ext|extn|ex)[.:]*                      #  Abbreviated extensions with character class for terminator (optional, multiple) [not captured]
              |                                             #  OR
                extension[:]?                               #  The entire word extension with character class for optional terminator
            )?                                              #  Marker for Extension optional
            [\s]*                                           #  Character Class for allowed delimiters (optional, multiple) between extension description text and actual extension
            (\d+)                                           #  Extension [captured if present], required for extension wrapper to match
        )?                                                  #  Entire extension optional

    $                                                       #  Matches to end of string
    /x';                                                    // /x modifier allows the expanded and commented regex

?>

This modification provides several improvements.

  1. It creates a configurable group of items that can match as the extension. You can add additional delimiters for the extension. This was the original request. The extension also allows for a colon after the extension delimiter.
  2. It converts the sequence of 4 optional delimiters (dot, whitespace, slash or hyphen) into a character class that matches only a single one.
  3. It groups items appropriately. In the given example, you can have the opening parentheses without an area code between them, and you can have the extension mark (space-x) without an extension. This alternate regular expression requires either a complete area code or none and either a complete extension or none.
  4. The 4 components of the number (area code, central office code, phone number and extension) are the back-referenced elements that feed into $matches in preg_match().
  5. Uses lookahead/lookbehind to require matched parentheses in the area code.
  6. Allows for a 1- to be used before the number. (This assumes that all numbers are US or Canada numbers, which seems reasonable since the match is ultimately made against NANPA restrictions.) It also disallows mixing a country code prefix with an area code wrapped in parentheses.
  7. It merges in the NANPA rules to eliminate non-assignable telephone numbers.
    1. It eliminates area codes in the form 0xx, 1xx, 37x, 96x, x9x and x11, which are invalid NANPA area codes.
    2. It eliminates central office codes in the form 0xx and 1xx (invalid NANPA central office codes).
    3. It eliminates numbers with the form 555-01xx (non-assignable from NANPA).

It has a few minor limitations. They're probably unimportant, but are being noted here.

  1. There is nothing in place to require that the same delimiter is used repeatedly, allowing for numbers like 800-555.1212, 800/555 1212, 800 555.1212 etc.
  2. There is nothing in place to restrict the delimiter after an area code with parentheses, allowing for numbers like (800)-555-1212 or (800)/5551212.
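
If the first limitation matters, one possible way to tighten it is a backreference: capture whichever delimiter appears first and require the same one the second time. The sketch below is deliberately stripped down (no parentheses, country code, extension or NANPA checks), and the function name is hypothetical.

```php
<?php
// Sketch only: require the same delimiter (or no delimiter at all) in both
// positions of a plain 10-digit number.
function usesConsistentDelimiter($number) {
    // ([.\/\s-]?)  captures the first delimiter, possibly empty
    // \1           must repeat exactly what group 1 captured
    return preg_match('/^\d{3}([.\/\s-]?)\d{3}\1\d{4}$/', $number) === 1;
}
```

Under this pattern 800-555-1212 and 8005551212 pass, while 800-555.1212 and 800/555 1212 do not.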

The NANPA rules are adapted from the following REGEX, found here: http://blogchuck.com/2010/01/php-regex-for-validating-phone-numbers/

/^(?:1)?(?(?!(37|96))[2-9][0-8][0-9](?<!(11)))?[2-9][0-9]{2}(?<!(11))[0-9]{4}(?<!(555(01([0-9][0-9])|1212)))$/
ebynum
Wow... I hadn't even thought about the flaws you pointed out. Thank you very much for letting me know! I also hadn't thought about trying to run it against two separate patterns in order to first validate the format and THEN validate the actual number... I'll give this a shot and see what happens. Thanks!
Josh
Hmm, when I try to run a test number through the first pattern, I get this: `Warning: preg_match() [function.preg-match]: Unknown modifier '\'`
Josh
Fixed a bug that probably resolved that, escaping / in the character classes.
ebynum
I've used a combination of yours and enobrev's answers to form the complete function... I've used enobrev's to check the format itself, then yours to validate the number string (after sanitized) against NANPA guidelines. Thanks!
Josh
+3  A: 

Alternatively, you could use some pretty simple and straightforward JavaScript to force the user to enter in a much more specified format. The Masked Input Plugin ( http://digitalbush.com/projects/masked-input-plugin/ ) for jQuery allows you to mask an HTML input as a telephone number, only allowing the person to enter a number in the format xxx-xxx-xxxx. It doesn't solve your extension issues, but it does provide for a much cleaner user experience.

ebynum
+1 The divide and conquer strategy is more maintainable.
banzaimonkey
Note: You do still need to confirm that the input is in the form you have required (in my example, xxx-xxx-xxxx). Otherwise, people without JavaScript (or who are circumventing it) could still insert invalid content.
ebynum
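
A server-side check for the masked format mentioned above could be as small as the following sketch (the function name is made up, and xxx-xxx-xxxx is assumed to be the only accepted shape):

```php
<?php
// Hypothetical server-side counterpart to the client-side mask: accept
// exactly three digits, a hyphen, three digits, a hyphen, four digits.
function isMaskedFormat($input) {
    // The D modifier makes $ match only at the very end of the string,
    // so a trailing newline is rejected as well.
    return preg_match('/^\d{3}-\d{3}-\d{4}$/D', $input) === 1;
}
```

This only confirms the shape of the input. The NANPA checks discussed elsewhere in this thread would still have to run afterwards.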
I like this solution. Similarly, you could go a more lo-tech route and just have 3 or 4 textboxes: area code, number (1 or 2 boxes), extension.
Lèse majesté
+4  A: 

You can resolve this using a lookahead assertion. Basically, what we're saying is: I want one of a specific set of markers (e, ex, ext, x, extension) followed by one or more digits. But we also want to cover the case where there's no extension at all.

Side Note, you don't need brackets around single characters like [\s] or that [x] that follows. Also, you can group characters that are meant to be in the same spot, so instead of \s?\.?\/?, you can use [\s\.\/]? which means "one of any of those characters"
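
To illustrate the difference, here is a small sketch comparing a chain of optional delimiters with a single character class. Both patterns are cut down to a 7-digit number for brevity, and the function names are invented for the example.

```php
<?php
// Chain of optionals: each delimiter may appear, so " ./" all in a row
// is accepted between the two digit groups.
function matchesWithChain($number) {
    return preg_match('/^\d{3}\s?\.?\/?\d{4}$/', $number) === 1;
}

// Character class: at most ONE of whitespace, dot or slash is accepted.
function matchesWithClass($number) {
    return preg_match('/^\d{3}[\s.\/]?\d{4}$/', $number) === 1;
}
```

For the input "456 ./7890" the chained version matches and the character-class version does not, while both accept "456.7890".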

Here's an update with regex that resolves your comment here as well. I've added the explanation in the actual code.

<?php
    $sPattern = "/^
        (?:                                 # Area Code
            (?:                            
                \(                          # Open Parentheses
                (?=\d{3}\))                 # Lookahead.  Only if we have 3 digits and a closing parentheses
            )?
            (\d{3})                         # 3 Digit area code
            (?:
                (?<=\(\d{3})                # Closing Parentheses.  Lookbehind.
                \)                          # Only if we have an open parentheses and 3 digits
            )?
            [\s.\/-]?                       # Optional Space Delimiter
        )?
        (\d{3})                             # 3 Digits
        [\s\.\/-]?                          # Optional Space Delimiter
        (\d{4})\s?                          # 4 Digits and an Optional following Space
        (?:                                 # Extension
            (?:                             # Let's look for some variation of 'extension'
                (?:
                    (?:e|x|ex|ext)\.?       # First, abbreviations, with an optional following period
                |
                    extension               # Now just the whole word
                )
                \s?                         # Optional Following Space
            )
            (?=\d+)                         # This is the Lookahead.  Only accept that previous section IF it's followed by some digits.
            (\d+)                           # Now grab the actual digits (the lookahead doesn't grab them)
        )?                                  # The Extension is Optional
        $/x";                               // /x modifier allows the expanded and commented regex

    $aNumbers = array(
        '123-456-7890x123',
        '123.456.7890x123',
        '123 456 7890 x123',
        '(123) 456-7890 x123',
        '123.456.7890x.123',
        '123.456.7890 ext. 123',
        '123.456.7890 extension 123456',
        '123 456 7890', 
        '123-456-7890ex123',
        '123.456.7890 ex123',
        '123 456 7890 ext123',
        '456-7890',
        '456 7890',
        '456 7890 x123',
        '1234567890',
        '() 456 7890'
    );

    foreach($aNumbers as $sNumber) {
        if (preg_match($sPattern, $sNumber, $aMatches)) {
            echo 'Matched ' . $sNumber . "\n";
            print_r($aMatches);
        } else {
            echo 'Failed ' . $sNumber . "\n";
        }
    }
?>

And The Output:

Matched 123-456-7890x123
Array
(
    [0] => 123-456-7890x123
    [1] => 123
    [2] => 456
    [3] => 7890
    [4] => 123
)
Matched 123.456.7890x123
Array
(
    [0] => 123.456.7890x123
    [1] => 123
    [2] => 456
    [3] => 7890
    [4] => 123
)
Matched 123 456 7890 x123
Array
(
    [0] => 123 456 7890 x123
    [1] => 123
    [2] => 456
    [3] => 7890
    [4] => 123
)
Matched (123) 456-7890 x123
Array
(
    [0] => (123) 456-7890 x123
    [1] => 123
    [2] => 456
    [3] => 7890
    [4] => 123
)
Matched 123.456.7890x.123
Array
(
    [0] => 123.456.7890x.123
    [1] => 123
    [2] => 456
    [3] => 7890
    [4] => 123
)
Matched 123.456.7890 ext. 123
Array
(
    [0] => 123.456.7890 ext. 123
    [1] => 123
    [2] => 456
    [3] => 7890
    [4] => 123
)
Matched 123.456.7890 extension 123456
Array
(
    [0] => 123.456.7890 extension 123456
    [1] => 123
    [2] => 456
    [3] => 7890
    [4] => 123456
)
Matched 123 456 7890
Array
(
    [0] => 123 456 7890
    [1] => 123
    [2] => 456
    [3] => 7890
)
Matched 123-456-7890ex123
Array
(
    [0] => 123-456-7890ex123
    [1] => 123
    [2] => 456
    [3] => 7890
    [4] => 123
)
Matched 123.456.7890 ex123
Array
(
    [0] => 123.456.7890 ex123
    [1] => 123
    [2] => 456
    [3] => 7890
    [4] => 123
)
Matched 123 456 7890 ext123
Array
(
    [0] => 123 456 7890 ext123
    [1] => 123
    [2] => 456
    [3] => 7890
    [4] => 123
)
Matched 456-7890
Array
(
    [0] => 456-7890
    [1] => 
    [2] => 456
    [3] => 7890
)
Matched 456 7890
Array
(
    [0] => 456 7890
    [1] => 
    [2] => 456
    [3] => 7890
)
Matched 456 7890 x123
Array
(
    [0] => 456 7890 x123
    [1] => 
    [2] => 456
    [3] => 7890
    [4] => 123
)
Matched 1234567890
Array
(
    [0] => 1234567890
    [1] => 123
    [2] => 456
    [3] => 7890
)
Failed () 456 7890
enobrev
I like this one as it's a little easier for me to understand... However, as ebynum pointed out below about my original regex pattern, this one also allows for matches when the parentheses around the area code don't contain any value... Like so: () 456-7890 ext1234 = match. Whereas it should fail because the area code identifier was put in place, but not filled. However, I guess as far as the "original request" of this question goes, your regex pattern pretty much has that part covered.
Josh
Ok Josh, fixed in the update.
enobrev
I've used a combination of yours and ebynum's answers to form the complete function... I've used yours to check the format itself, then ebynum's to validate the number string (after sanitized) against NANPA guidelines. Thanks!
Josh
This regex currently allows a backslash as the delimiter like this (xxx)xxx\xxxx. It uses redundant lookahead (performs a lookahead search to see if there's an extension and then requires the extension).
ebynum
@ebynum You're right. I was basing my pattern on the OP's. Not sure what you mean about the redundant lookahead. It's essentially looking for both the word/abbreviation AND the numbers, which is intentional as it seems incorrect to have a phone number like xxx-xxx-xxxx-xxx.
enobrev
I disagree with your preference (but it's just personal preference). I am totally fine with a number like 1-800-555-1212 1234. However, your regex still says this: ... allow a word/abbr for the extension, but only if it's followed by one or more numbers, and it has to be followed by one or more numbers. If you take out the lookahead, leaving: ((?:(?:(?:e|x|ex|ext)\.?|extension)\s?)(\d+))?, it still has to have both the word and the numbers (or nothing at all).
ebynum
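
The point about the redundant lookahead can be tried out with a sketch like the one below. It keeps only the tail of the pattern (the last four digits plus the optional extension) and runs both variants on the same input. The helper name is hypothetical.

```php
<?php
// Returns the preg_match() results for the extension part of the pattern,
// first with the (?=\d+) lookahead and then without it.
function compareExtensionPatterns($number) {
    // Marker alternatives as in the answer: abbreviations with an optional
    // period, or the whole word, followed by an optional space.
    $marker = '(?:(?:e|x|ex|ext)\.?|extension)\s?';

    $withLookahead    = '/^\d{4}\s?(?:' . $marker . '(?=\d+)(\d+))?$/';
    $withoutLookahead = '/^\d{4}\s?(?:' . $marker . '(\d+))?$/';

    return array(
        preg_match($withLookahead, $number),
        preg_match($withoutLookahead, $number),
    );
}
```

For inputs such as "7890 x123", "7890 ext.", "7890" and "7890 123" the two results agree, because the (\d+) that follows already demands at least one digit.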
Ah, OK, I see what you're saying. Thanks for the clarification.
enobrev