I'm looking for a routine that will format a string of numbers as a UK phone number. The routine should account for UK area codes that require different formatting (e.g. London compared to Edinburgh compared to Worcester) as well as mobile numbers.

My phone numbers are stored in the database as strings, containing only numeric characters.

So far I have come up with this, but the performance seems poor.

/// <summary>
/// Formats a string as a UK phone number
/// </summary>
/// <remarks>
/// 02012345678 becomes 020 1234 5678
/// 01311234567 becomes 0131 123 4567
/// 01905123456 becomes 01905 123456
/// 07816123456 becomes 07816 123456
/// </remarks>
public static string FormatPhoneNumber(string phoneNumber)
{
    string formattedPhoneNumber = null;

    if (!string.IsNullOrEmpty(phoneNumber))
    {
        System.Text.RegularExpressions.Regex area1 = new System.Text.RegularExpressions.Regex(@"^0[1-9]0");
        System.Text.RegularExpressions.Regex area2 = new System.Text.RegularExpressions.Regex(@"^01[1-9]1");

        string formatString;

        if (area1.Match(phoneNumber).Success)
        {
            formatString = "0{0:00 0000 0000}";
        }
        else if (area2.Match(phoneNumber).Success)
        {
            formatString = "0{0:000 000 0000}";
        }
        else
        {
            formatString = "0{0:0000 000000}";
        }

        formattedPhoneNumber = string.Format(formatString, Int64.Parse(phoneNumber));
    }

    return formattedPhoneNumber;
}

Thoughts welcomed on how to improve this...
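One likely contributor to the slowness is that both Regex objects are constructed on every call. A minimal sketch that keeps the same three rules, but builds each regex once and slices the string instead of calling Int64.Parse, might look like this. The class name UkPhoneFormatter is made up for the example, and it assumes every stored number is exactly 11 digits; anything else (including null or empty input) is returned unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UkPhoneFormatter
{
    // Built once and reused, instead of being constructed on every call.
    private static readonly Regex TwoDigitArea = new Regex(@"^0[1-9]0", RegexOptions.Compiled);
    private static readonly Regex ThreeDigitArea = new Regex(@"^01[1-9]1", RegexOptions.Compiled);

    // Same rules as FormatPhoneNumber in the question, but the digits are
    // sliced straight from the string, so no Int64.Parse is needed.
    public static string FormatPhoneNumber(string phoneNumber)
    {
        // Assumption: valid numbers are 11 digits; leave anything else untouched.
        if (string.IsNullOrEmpty(phoneNumber) || phoneNumber.Length != 11)
        {
            return phoneNumber;
        }

        if (TwoDigitArea.IsMatch(phoneNumber))
        {
            // 020 1234 5678
            return $"{phoneNumber.Substring(0, 3)} {phoneNumber.Substring(3, 4)} {phoneNumber.Substring(7)}";
        }

        if (ThreeDigitArea.IsMatch(phoneNumber))
        {
            // 0131 123 4567
            return $"{phoneNumber.Substring(0, 4)} {phoneNumber.Substring(4, 3)} {phoneNumber.Substring(7)}";
        }

        // 01905 123456 and 07816 123456
        return $"{phoneNumber.Substring(0, 5)} {phoneNumber.Substring(5)}";
    }
}
```

Calling UkPhoneFormatter.FormatPhoneNumber("01311234567") gives "0131 123 4567", matching the examples in the remarks above.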

Edit

My initial thoughts are that I should store phone numbers as numeric fields in the database; then I can do without the Int64.Parse and know that they are truly numeric.
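If the main goal is to be sure the stored value is purely numeric, that can be checked on the string itself without Int64.Parse. Bear in mind that a numeric column keeps no leading zero: 02012345678 parsed as a long is 2012345678, which is why the format strings above have to add the "0" back. The helper below is a hypothetical sketch, not part of any existing API:

```csharp
using System;
using System.Linq;

public static class PhoneNumberChecks
{
    // True when the string is non-empty and every character is an ASCII digit 0-9.
    // char.IsDigit would also accept other Unicode decimal digits, so compare explicitly.
    public static bool IsAllDigits(string value) =>
        !string.IsNullOrEmpty(value) && value.All(c => c >= '0' && c <= '9');
}
```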

Edit 2

The phone numbers will all be UK geographic or UK mobile numbers, so special cases like 0800 do not need to be considered.

A:

UK telephone numbers vary in length from 8 digits to 12 digits, including the leading zero. "Area" codes can vary between 2 and 4 digits.

All of the tables that show the area code and total length for each number prefix are available from OFCOM's website. NB: These tables are very long.

Also, there's no standard for exactly where spaces are put. Some people might put them in different places depending on how "readable" it makes the resulting text.

Alnitak
+1 for the link to an *awesome* resource =)
David Thomas
A: 

I'd be tempted to use a tighter set of rules that only check the bare minimum, so on the assumption the leading zero is in the database, pseudo code would be:

if (phoneNumber[1] == '2')
{
    // 02x: 000 0000 0000
}
else if (phoneNumber[1] == '1' && (phoneNumber[2] == '1' || phoneNumber[3] == '1'))
{
    // 011x and 01x1: 0000 000 0000
}
else
{
    // everything else: 00000 000000
}
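A runnable version of those rules, assuming the number is an 11-digit string that includes the leading zero (the class and method names are illustrative only), could be:

```csharp
using System;

public static class UkGeographicFormatter
{
    // Assumes an 11-digit string with the leading zero; anything else is returned unchanged.
    public static string Format(string phoneNumber)
    {
        if (phoneNumber == null || phoneNumber.Length != 11)
        {
            return phoneNumber;
        }

        if (phoneNumber[1] == '2')
        {
            // 02x: 000 0000 0000
            return Split(phoneNumber, 3, 4);
        }

        if (phoneNumber[1] == '1' && (phoneNumber[2] == '1' || phoneNumber[3] == '1'))
        {
            // 011x and 01x1: 0000 000 0000
            return Split(phoneNumber, 4, 3);
        }

        // Everything else, including mobiles: 00000 000000
        return phoneNumber.Substring(0, 5) + " " + phoneNumber.Substring(5);
    }

    // Inserts a space after the first `first` digits and another after the next `second` digits.
    private static string Split(string s, int first, int second) =>
        s.Substring(0, first) + " " + s.Substring(first, second) + " " + s.Substring(first + second);
}
```

Because it only looks at single characters, this avoids regular expressions entirely, and it also puts 023 and 0113 numbers into the right groups.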

NB: your patterns are slightly wrong. 023 is a three-digit code, and 0800 is not.

Rowland Shaw
except some people use '08000 abc def' instead of '0800 0abc def'
Alnitak
Indeed, could be added as a special case. Should probably note that shorter numbers are possible in the general case as well.
Rowland Shaw
A: 

** I'm looking for a routine that will format a string of numbers as a UK phone number. **

You could download the Ofcom database that lists the formats for each number range, including national dialling only numbers, and do a lookup for each number you need to format. The database lists the SABCDE digits and the format: 0+10, 2+8, 3+7, 4+6, 4+5, 5+5, or 5+4 for each range.
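As a rough sketch of the lookup side, the code below does a longest-prefix match of the digits after the trunk zero against a table of formats. The three entries are illustrative values based on the examples on this page, not rows copied from the Ofcom files; a real table would be loaded from those files, and the names OfcomLookup and Find are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class OfcomLookup
{
    // Illustrative excerpt only: keys are leading digits after the '0' trunk code,
    // values are (area code length, local number length), as in "2+8".
    private static readonly Dictionary<string, (int Area, int Local)> Formats =
        new Dictionary<string, (int Area, int Local)>
        {
            ["2075"] = (2, 8),
            ["131"] = (3, 7),
            ["1905"] = (4, 6),
        };

    // Returns the format for the longest key (up to six digits, SABCDE)
    // that prefixes the number, or null if nothing matches.
    public static (int Area, int Local)? Find(string phoneNumber)
    {
        string nsn = phoneNumber.Substring(1); // drop the leading trunk zero
        for (int len = Math.Min(6, nsn.Length); len > 0; len--)
        {
            if (Formats.TryGetValue(nsn.Substring(0, len), out var format))
            {
                return format;
            }
        }
        return null;
    }
}
```

Trying the longest prefix first matters because the real data has overlapping ranges, where a more specific entry should win over a shorter one.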

There are a small number of errors in the database (especially for 01697 and 0169 77 codes), but there are fewer than ten errors in more than a quarter of a million entries.

There are four files covering 01 and 02 numbers, and separate files for various non-geographic number ranges.

0+10 numbers are 'National Dialling Only' and are written without parentheses around the area code part. The area code will be 02x for all 02 numbers, 01xx for all 011x and 01x1 numbers, and 01xxx for most other 01 numbers (a very small number - about a dozen - will be 01xx xx though).

Parentheses surround the area code on all other 01 and 02 numbers (that is, use parentheses on 01 and 02 numbers where the local number part does not begin with a 0 or a 1). Parentheses show that local dialling is possible within the same area by omitting the digits enclosed by the parentheses.

The 2+8 nomenclature shows the area code and local number length, with the entry 2075 : 2+8 meaning the number is formatted as (020) 75xx xxxx. Remember the leading zero is not 'counted' in the 2+8 determination.
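Putting the "A+B" notation and the parentheses rule together, a sketch might take the digit counts from such a lookup and build the written form. It assumes A is at least 2 (0+10 entries need the area code length worked out from the prefix as described above), it does not add the internal space for 01xx xx codes, and the names are made up for illustration.

```csharp
using System;

public static class GeographicWriter
{
    // Writes an 01/02 number from its "A+B" format, e.g. areaDigits = 2, localDigits = 8 for "2+8".
    public static string Write(string phoneNumber, int areaDigits, int localDigits)
    {
        // The trunk zero is written but not counted in "A+B".
        if (areaDigits < 2 || localDigits < 4 || phoneNumber.Length != 1 + areaDigits + localDigits)
        {
            throw new ArgumentException("Number does not match a supported A+B format.", nameof(phoneNumber));
        }

        string areaCode = phoneNumber.Substring(0, 1 + areaDigits);
        string local = phoneNumber.Substring(1 + areaDigits);

        // Local numbers of 7 or 8 digits are split before the fourth digit from the end.
        if (local.Length == 7 || local.Length == 8)
        {
            local = local.Substring(0, local.Length - 4) + " " + local.Substring(local.Length - 4);
        }

        // No parentheses when the local part begins with 0 or 1 (no local dialling possible).
        bool nationalOnly = local[0] == '0' || local[0] == '1';
        return nationalOnly ? $"{areaCode} {local}" : $"({areaCode}) {local}";
    }
}
```

With the 2075 entry, Write("02075123456", 2, 8) produces "(020) 7512 3456", the (020) 75xx xxxx layout described above.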

** UK telephone numbers vary in length from 8 digits to 12 digits **

No. Since 2000, most have 10 digits after the '0' trunk code. A few still have 9 digits after the '0' trunk code.

There are also a few special numbers such as 0800 1111 and 0845 4647 to consider.

** "area" codes can vary between 2 and 4 digits. **

Area codes can vary between 2 and 5 digits (the leading zero is not counted). To be clear, '020' is classed as a 2-digit area code because the leading 0 is actually the trunk code. There are also 011x and 01x1 area codes, and most others have 01xxx area codes. The latter may have local numbers that are only 5 digits long instead of the more widely found 6-digit local numbers. A very small number have an 01xx xx area code, and these have 5- or 4-digit local numbers.

** Also, there's no standard for exactly where spaces are put. **

There is always a space between the area code part and the local number part for all 01 and 02 numbers.

It is also traditional for (01xx xx) area codes to have a space within the area code as shown. This represents the old local exchange groupings where this system is still in use. Other (shorter) area codes are not split.

Local numbers with 7 or 8 digits have a split before the fourth digit from the end. Local numbers with 4, 5, or 6 digits are not split. This applies to geographic and non-geographic numbers alike.
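The two spacing rules are simple enough to express directly. This sketch assumes the area code is passed with its leading zero, so only a six-character code (an 01xx xx code) gets the internal space; the class and method names are hypothetical.

```csharp
using System;

public static class UkSpacing
{
    // Area code including the trunk zero, e.g. "020", "0131", "01905", "016977".
    // Only the six-character 01xx xx codes are split: "016977" becomes "0169 77".
    public static string SpaceAreaCode(string areaCode) =>
        areaCode.Length == 6 ? areaCode.Substring(0, 4) + " " + areaCode.Substring(4) : areaCode;

    // 7- and 8-digit local numbers are split before the fourth digit from the end;
    // 4-, 5- and 6-digit local numbers are left whole.
    public static string SpaceLocalNumber(string local) =>
        local.Length == 7 || local.Length == 8
            ? local.Substring(0, local.Length - 4) + " " + local.Substring(local.Length - 4)
            : local;
}
```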

For most 03, 08, and 09 numbers, the number is written as 0xxx xxx xxxx.

Some 0800 and all 0500 numbers are written 0xxx xxxxxx.

For 055, 056, and 070 numbers the number is written 0xx xxxx xxxx.

For mobile and pager numbers, use 07xxx xxxxxx.
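For the non-geographic ranges listed above, a prefix check is enough when the number has 10 digits after the trunk zero. This is only a sketch with a made-up class name: it deliberately leaves 0500 numbers and the shorter 0800 numbers alone, and checks 070 before the general 07 rule so personal numbers are not written like mobiles.

```csharp
using System;

public static class NonGeographicWriter
{
    // Sketch for 11-digit numbers (0 + 10 digits); anything else is returned unchanged.
    public static string Write(string n)
    {
        if (n == null || n.Length != 11) return n;

        bool Has(string prefix) => n.StartsWith(prefix, StringComparison.Ordinal);

        // 055, 056 and 070: 0xx xxxx xxxx (checked before the general 07 rule).
        if (Has("055") || Has("056") || Has("070"))
            return $"{n.Substring(0, 3)} {n.Substring(3, 4)} {n.Substring(7)}";

        // Mobile and pager numbers: 07xxx xxxxxx.
        if (Has("07"))
            return $"{n.Substring(0, 5)} {n.Substring(5)}";

        // Most 03, 08 and 09 numbers: 0xxx xxx xxxx.
        if (Has("03") || Has("08") || Has("09"))
            return $"{n.Substring(0, 4)} {n.Substring(4, 3)} {n.Substring(7)}";

        return n; // geographic or unrecognised
    }
}
```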

** except some people use '08000 abc def' instead of '0800 0abc def' **

That usage is incorrect. Do be aware that some 0800 numbers have 9 digits after the 0 trunk code, whilst others have 10 digits after the 0 trunk code.

So, both 0800 xxxxxx and 0800 xxx xxxx are correct.

0500 numbers use only 0500 xxxxxx.
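Because both 0800 lengths are valid, the length of the stored string is what decides the layout. A minimal sketch with a made-up class name, covering just 0800 and 0500; short special numbers such as 0800 1111 are left unchanged:

```csharp
using System;

public static class FreephoneWriter
{
    // 0800 with 9 digits after the trunk zero (10 in total): 0800 xxxxxx.
    // 0800 with 10 digits after the trunk zero (11 in total): 0800 xxx xxxx.
    // 0500 (10 in total): always 0500 xxxxxx.
    public static string Write(string n)
    {
        if (n == null) return null;

        bool is0800 = n.StartsWith("0800", StringComparison.Ordinal);
        bool is0500 = n.StartsWith("0500", StringComparison.Ordinal);

        if ((is0800 || is0500) && n.Length == 10)
            return $"{n.Substring(0, 4)} {n.Substring(4)}";

        if (is0800 && n.Length == 11)
            return $"{n.Substring(0, 4)} {n.Substring(4, 3)} {n.Substring(7)}";

        return n; // anything else is left as it was
    }
}
```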

Most 03, 08, and 09 numbers are written as 0xxx xxx xxxx.

See also: http://en.wikipedia.org/wiki/Local%5Fconventions%5Ffor%5Fwriting%5Ftelephone%5Fnumbers#United%5FKingdom

nrs