Given that the number of digits in the extension can be different for each company and the number of digits in the number could be different for each country and area code, this is a tricky problem to do efficiently.
Even if you get the data table split into base number and extension, you still have to split the incoming number into base number and extension, which I actually think complicates things.
What I would be inclined to try is:
Original format
- Try to match the incoming number with the database.
  - If it matches one record, you have your answer - a specific person.
  - If it matches more than one record, something has gone wrong, so fail.
  - Otherwise, you have to find the company:
    - Strip off the trailing digit from the incoming number and try to match this with the database again.
      - If the number of digits drops below a threshold (probably 6 digits) then your search should probably fail. This is just to limit the number of database searches performed when the number isn't going to be found.
      - If it matches no records, then you need to try this step again.
      - If it matches more than one record, something has gone wrong, so fail.
      - If it matches exactly one record, you have your next best answer - the company.
For example, searching for "+43123456777":
- +43123456777 matches 0 entries.
- +4312345677 matches 0 entries.
- +431234567 matches 1 entry: "Company A"
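As a rough sketch of the steps above (not a definitive implementation), here is the trailing-digit search in Python. The `DIRECTORY` list, the `find_matches` helper and the sample numbers are assumptions made up for illustration; `find_matches` stands in for whatever exact-match query your database provides.

```python
from typing import List, Optional, Tuple

# Hypothetical sample data: (stored number, label). Stands in for the database.
DIRECTORY: List[Tuple[str, str]] = [
    ("+431234567", "Company A"),
    ("+431234567890", "Employee in Company A"),
]

MIN_DIGITS = 6  # give up once the candidate has fewer digits than this


def find_matches(number: str) -> List[str]:
    """Stand-in for the database query: exact match on the stored number."""
    return [label for stored, label in DIRECTORY if stored == number]


def count_digits(number: str) -> int:
    return sum(ch.isdigit() for ch in number)


def lookup(incoming: str) -> Optional[str]:
    """Strip trailing digits until exactly one record matches."""
    candidate = incoming
    while count_digits(candidate) >= MIN_DIGITS:
        matches = find_matches(candidate)
        if len(matches) == 1:
            return matches[0]  # a person on the first pass, otherwise the company
        if len(matches) > 1:
            raise LookupError(f"ambiguous match for {candidate!r}")
        candidate = candidate[:-1]  # no match: drop the trailing digit and retry
    return None  # fell below the threshold without finding anything
```

Calling `lookup("+43123456777")` walks through the same three candidates as the example above before returning the company entry.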
The main failure mode of this approach is a company with variable length extension numbers. For instance, consider what happens if both 431234567890 and 43123456789 are valid numbers but only the second one is in the database. If the incoming number is 431234567890, then 43123456789 will be matched in error.
Split format
This is a little more complex, but more robust.
- Try to match the incoming number with the database.
  - If it matches one record, you have your answer - the company.
  - If it matches more than one record, match the entry without an extension and you have found the company.
  - Otherwise, you have to find the base company number and extension:
    - Strip off the trailing digit from the incoming number and try to match this with the database again.
      - If the number of digits drops below a threshold (probably 6 digits) then your search should probably fail. This is just to limit the number of database searches performed when the number isn't going to be found.
      - If it matches no records, then you need to try this step again.
      - If it matches one record, then you have found your answer - the company.
      - If it matches more than one record, then you have found the base number of the company and thus now know the extension, so can try to look up the specific person:
        - Strip the base number from the start of the original incoming number and use this to search the extensions of the records with that base number.
          - If it matches exactly one record, you have found a specific person.
          - If it doesn't match a specific person, match the entry without an extension and you have found the company.
For example, searching for "+43123456777":
- +43123456777 matches 0 entries.
- +4312345677 matches 0 entries.
- +431234567 matches 2 entries: "empty:Company A" & "890:employee in company A"
- Within these two matches "77" matches nothing, so return the empty extension: "Company A".
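The split-format steps could be sketched along these lines. The `Record` type, the `RECORDS` sample data and the `by_base` helper are hypothetical names for illustration; `by_base` stands in for a database query on the base-number column, and an empty extension marks the company's own entry.

```python
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    base: str       # base number of the company
    extension: str  # "" for the company's own entry
    label: str


# Hypothetical sample data standing in for the split database table.
RECORDS = [
    Record("+431234567", "", "Company A"),
    Record("+431234567", "890", "Employee in Company A"),
]

MIN_DIGITS = 6  # give up once the candidate has fewer digits than this


def by_base(base: str) -> List[Record]:
    """Stand-in for the database query: all records with this base number."""
    return [r for r in RECORDS if r.base == base]


def lookup_split(incoming: str) -> Optional[str]:
    candidate = incoming
    while sum(ch.isdigit() for ch in candidate) >= MIN_DIGITS:
        matches = by_base(candidate)
        if len(matches) == 1:
            return matches[0].label  # only one record: the company
        if matches:
            # Base number found, so the rest of the incoming number is the extension.
            extension = incoming[len(candidate):]
            people = [r for r in matches if r.extension == extension]
            if len(people) == 1:
                return people[0].label
            # Unknown extension: fall back to the entry without an extension.
            company = [r for r in matches if r.extension == ""]
            return company[0].label if company else None
        candidate = candidate[:-1]  # no match: drop the trailing digit and retry
    return None
```

With this data, `lookup_split("+43123456777")` finds the base number on the third attempt, fails to match extension "77", and falls back to the company entry, as in the example above.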
Implementation notes
This algorithm, as noted above, does have some efficiency problems. If the database lookup is expensive, the cost grows linearly with the length of the telephone number, especially in the case where no similar numbers exist in the database (for example, if the incoming number is from Kazakhstan, but there are no Kazakhstan numbers in the database).
You could add some optimisations relatively easily though. If most of the companies you deal with use 3 or 4 digit extensions, you could start by stripping, say, 4 digits off the end and then do a binary chop until you reach an answer. For a 15 digit number, this would reduce the search to 4 or 5 lookups in many cases, and at most 6.
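One possible reading of the binary chop, sketched under assumptions of my own: it relies on a prefix query ("does any stored number start with this prefix?", which would be something like a `LIKE 'prefix%'` search), because that test is monotonic in the prefix length and so can be bisected. The chop finds the longest prefix shared with any stored number, and the linear strip then only has to start from there. It bisects the whole range rather than starting 4 digits from the end. The `Directory` class and its lookup counter are hypothetical, for illustration only.

```python
from typing import List, Optional, Tuple


class Directory:
    """Hypothetical in-memory stand-in for the database; counts lookups."""

    def __init__(self, entries: List[Tuple[str, str]]) -> None:
        self.entries = entries
        self.lookups = 0

    def any_starting_with(self, prefix: str) -> bool:
        self.lookups += 1
        return any(number.startswith(prefix) for number, _ in self.entries)

    def exact(self, number: str) -> Optional[str]:
        self.lookups += 1
        for stored, label in self.entries:
            if stored == number:
                return label
        return None


def lookup_binary(db: Directory, incoming: str, min_len: int = 7) -> Optional[str]:
    # min_len of 7 characters is "+" plus the 6 digit threshold.
    if len(incoming) < min_len or not db.any_starting_with(incoming[:min_len]):
        return None  # nothing similar in the database: fail after one lookup
    lo, hi = min_len, len(incoming)
    # Invariant: some stored number starts with incoming[:lo].
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if db.any_starting_with(incoming[:mid]):
            lo = mid
        else:
            hi = mid - 1
    # No exact match can be longer than lo, so strip linearly from there.
    for length in range(lo, min_len - 1, -1):
        label = db.exact(incoming[:length])
        if label is not None:
            return label
    return None


db = Directory([
    ("+431234567", "Company A"),
    ("+431234567890", "Employee in Company A"),
])
```

The early exit is what helps most in the "no similar numbers" case: a number from a country that isn't in the database at all is rejected after a single lookup.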
Also, every time you narrow the selection, you could select only within the previous selection rather than having to select within the whole database.
Additional implementation notes
Having finally worked out how Unreason's answer works, I can see that it is a much simpler, more elegant solution. I wish I'd thought of simply looking for the database number in the incoming number rather than the other way around.
My only concern is that performing this on every telephone number in the database might impose excessive demands on the server. I would suggest benchmarking that solution under maximum stress and seeing if it causes problems. If not, fine - use that. If it does, consider implementing the simple form of my algorithm and doing the stress tests again. If the performance is still too low, try my binary search suggestion.
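To make the "database number inside the incoming number" idea concrete, here is a minimal sketch using SQLite from Python. This is my own illustration of the idea as described here, not Unreason's actual code; the `phonebook` table, its columns and the sample rows are assumptions.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phonebook (number TEXT PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO phonebook VALUES (?, ?)",
    [
        ("+431234567", "Company A"),
        ("+431234567890", "Employee in Company A"),
    ],
)


def lookup_reverse(conn: sqlite3.Connection, incoming: str) -> Optional[str]:
    """Find the longest stored number that the incoming number starts with."""
    row = conn.execute(
        "SELECT label FROM phonebook "
        "WHERE ? LIKE number || '%' "           # stored number is a prefix of incoming
        "ORDER BY LENGTH(number) DESC LIMIT 1",  # prefer the most specific match
        (incoming,),
    ).fetchone()
    return row[0] if row else None
```

Because the stored column is on the pattern side of `LIKE`, the comparison has to be evaluated against every row, which is exactly the load worth measuring in the benchmark.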