ansaurus

Question

Fastest way to match telephony prefixes using asterisk PHP script

Answer 1

A:

I do it by using a hashtable of string, destination where the keys are strings that represent the prefix of the destination. The critical factor is that the hashtable must be sorted so that the longest strings are checked first. As soon as a matching prefix is found the call destination is known.

I also have a round of regular expressions that handle more convoluted destinations, and I check the regular expressions before the destination prefixes.
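A minimal PHP sketch of this approach (the answer itself uses C#). The table contents, the regex rule, and the function name findDestination are hypothetical and only illustrate "regular expressions first, then longest prefix first":

```php
<?php
// Hypothetical routing data: prefix => destination label.
$destinations = array(
    '44'   => 'UK',
    '447'  => 'UK mobile',
    '1'    => 'NANP',
    '1305' => 'Miami',
);

// Hypothetical regex rules for destinations a plain prefix cannot describe.
$regexRules = array(
    '/^1900\d{7}$/' => 'premium',
);

// Sort once so the longest prefixes are tried first.
// PHP stores numeric-looking keys as integers, hence the string casts.
uksort($destinations, function ($a, $b) {
    return strlen((string) $b) - strlen((string) $a);
});

function findDestination($number, array $regexRules, array $sortedPrefixes)
{
    $number = (string) $number;

    // Round 1: regular expressions.
    foreach ($regexRules as $pattern => $destination) {
        if (preg_match($pattern, $number) === 1) {
            return $destination;
        }
    }

    // Round 2: prefixes, longest first; the first hit is the best match.
    foreach ($sortedPrefixes as $prefix => $destination) {
        $prefix = (string) $prefix;
        if (strncmp($number, $prefix, strlen($prefix)) === 0) {
            return $destination;
        }
    }

    return null; // no known destination
}
```

The sort happens once when the table is loaded, so each call only pays for the scan itself.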

I haven't measured how long it takes to get a match but I suspect 15ms max. The whole process of checking the destination and then the user's balance and finally setting a call time limit takes around 150ms. In my case I am using FastAGI and a C# Windows service. As long as you take less than 500ms it will be imperceptible to your users.

sipwiz 2009-09-28 21:44:49

Answer 2

A:

I also run a telephony application. What I have done is provide an internal REST API to call; it caches known phone numbers and does all of the prefix checking.

Also I assume that you are looking for country codes as well. There are only a few overlapping country codes with the NANP. So I look first for a NANP number, and do a quick match on the number of following digits (7) to make sure; otherwise I fall back on a country code. I then have a rough idea of how many digits a telephone number in each country is supposed to have, through a regular expression.
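A rough sketch of that "NANP first, country code second" check. It assumes the number is already reduced to digits in international format, that a NANP number is 1 followed by a 3-digit area code and a 7-digit subscriber number, and it uses a deliberately tiny, hypothetical country-code table; classifyNumber is a made-up name:

```php
<?php
// Hypothetical, incomplete country-code table: code => label.
$countryCodes = array('351' => 'PT', '44' => 'GB', '33' => 'FR', '7' => 'RU/KZ');

function classifyNumber($digits, array $countryCodes)
{
    $digits = (string) $digits;

    // NANP: country code 1, 3-digit area code, 7-digit subscriber number.
    if (preg_match('/^1\d{3}\d{7}$/', $digits) === 1) {
        return 'NANP';
    }

    // Country codes are 1 to 3 digits long; try the longest candidate first.
    for ($len = 3; $len >= 1; $len--) {
        $candidate = substr($digits, 0, $len);
        if (isset($countryCodes[$candidate])) {
            return $countryCodes[$candidate];
        }
    }

    return null; // not recognised
}
```

The per-country length check mentioned above could be added as a second table of regular expressions keyed by country code.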

I'm using an XML document and matching on XPath, then caching the XPath result when possible.

The cool thing about using a REST API as well, is that it can be used to clean up numbers before I store them in the DB for billing.

It's not an exact science but it seems to work.

null 2009-09-28 21:55:42

REST will use the TCP/IP stack, and will be at least an order of magnitude slower than accessing memory. It is a convenient method but I don't see it working here.

Alex 2009-09-29 18:15:58

Answer 3

+1 A:

The way I see it, using a simple array structure should work OK.

Sample code: (note that for performance the prefixes are the keys in the array, not values)

// $prefixes = array(3=>1, 30=>1, 304=>1,305=>1,3056=>1,306=>1,31=>1, 40=>1);

function matchNumber($number)
{
  $prefixes = getPrefixesFromCache();

  $number = "$number";
  // try to find the longest prefix matching $number
  while ($number != '') {
    if (isset($prefixes[$number]))
      break;
    // not found yet, subtract last digit
    $number = substr($number, 0, -1);
  }
  return $number;
}

Another way would be to query the cache directly for the number - in this case, it could be further optimized:

1. Split the number string in half and keep the first half.
2. Query the cache for that string.
3. If it doesn't exist, go to step 1.
4. While it exists, store that value as the result and add another digit.

Snippet: (note that query_cache_for() should be replaced by whatever function your caching mechanism uses)

function matchNumber($number)
{
  $temp = "$number";
  $found = false;
  while (1) {
    $temp = substr($temp, 0, ceil(strlen($temp)/2) );
    $found = query_cache_for($temp);
    if ($found)
      break;
    if (strlen($temp) <= 1)
      return FALSE; // should not happen!
  }
  while ($found) {
    $result = $temp;
    // stop once every digit of $number has been used
    if (strlen($temp) >= strlen($number))
      break;
    // add another digit
    $temp .= substr($number, strlen($temp), 1);
    $found = query_cache_for($temp);
  }
  return $result;
}

This approach has the obvious advantage that each prefix is a single element in the cache - the key could be 'asterix_prefix_<number>' for example, the value is unimportant (1 works).
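To make the key scheme concrete, here is a sketch of what query_cache_for() could look like. A plain PHP array stands in for the real cache (APC, memcached, or whatever you use), and prefix_cache_key() and store_prefixes() are hypothetical helper names:

```php
<?php
// Stand-in for a real cache: a plain array keyed by cache key.
$GLOBALS['fake_cache'] = array();

// Build the cache key for one prefix, e.g. 'asterix_prefix_3056'.
function prefix_cache_key($prefix)
{
    return 'asterix_prefix_' . $prefix;
}

// Store every prefix as its own cache element; the value is unimportant.
function store_prefixes(array $prefixes)
{
    foreach ($prefixes as $prefix) {
        $GLOBALS['fake_cache'][prefix_cache_key($prefix)] = 1;
    }
}

// Returns true when the given string is a known prefix.
function query_cache_for($prefix)
{
    return isset($GLOBALS['fake_cache'][prefix_cache_key($prefix)]);
}

store_prefixes(array('3', '30', '304', '305', '3056', '306', '31', '40'));
```

With a real cache, only the bodies of store_prefixes() and query_cache_for() would change; the key scheme stays the same.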

jcinacio 2009-09-28 22:32:52

Thanks, I hadn't thought about placing all the prefixes directly in a hash array, it could work. I don't know how well PHP performs with very large hash tables (this one will contain at least 6000 elements) but I'll benchmark it and let you guys know ;)
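A quick way to run such a benchmark, as a sketch only: the 6000 prefixes below are synthetic (every 4-digit string from 1000 to 6999), longestPrefix() is a hypothetical stand-in for the array-key lookup above, and the timing will vary by machine, so no result is claimed here.

```php
<?php
// Build a synthetic table of 6000 prefixes stored as array keys.
$prefixes = array();
for ($i = 1000; $i < 7000; $i++) {
    $prefixes[(string) $i] = 1;
}

// Strip digits from the right until the remainder is a known prefix.
function longestPrefix($number, array $prefixes)
{
    $number = (string) $number;
    while ($number !== '' && !isset($prefixes[$number])) {
        $number = (string) substr($number, 0, -1);
    }
    return $number; // '' when nothing matches
}

// Time a batch of lookups with microtime().
$iterations = 100000;
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    longestPrefix('30561234567', $prefixes);
}
$elapsed = microtime(true) - $start;

printf("%d lookups in %.4f s\n", $iterations, $elapsed);
```

Dividing the elapsed time by the iteration count gives the per-call cost to compare against the rest of the call setup.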

Alex 2009-09-29 09:20:18

Answer 4

A:

Finding the longest common subsequence is a classical application of dynamic programming. The standard solution runs in O(nm) time for two strings of lengths n and m. http://en.wikipedia.org/wiki/Longest%5Fcommon%5Fsubsequence%5Fproblem

Alex 2009-09-28 22:52:49

Answer 5

+1 A:

Here is some sample code for an N-ary tree structure:

class PrefixCache {
 const EOS = 'eos';
 protected $data;

 function __construct() {
  $this->data = array();
  $this->data[self::EOS] = false;
 }

 function addPrefix($str) {
  $str = (string) $str;
  $len = strlen($str);

  for ($i=0, $t =& $this->data; $i<$len; ++$i) {
   $ch = $str[$i];

   if (!isset($t[$ch])) {
    $t[$ch] = array();
    $t[$ch][self::EOS] = false;
   }

   $t =& $t[$ch];
  }

  $t[self::EOS] = true;
 }

 function matchPrefix($str) {
  $str = (string) $str;
  $len = strlen($str);

  $so_far = '';
  $best = '';

  for ($i=0, $t =& $this->data; $i<$len; ++$i) {
   $ch = $str[$i];

   if (!isset($t[$ch]))
    return $best;
   else {
    $so_far .= $ch;
    if ($t[$ch][self::EOS])
     $best = $so_far;

    $t =& $t[$ch];     
   }
  }

  return false; // string not long enough - potential longer matches remain
 }

 function dump() {
  print_r($this->data);
 }
}

This can then be called as:

$pre = new PrefixCache();

$pre->addPrefix('304');
$pre->addPrefix('305');
// $pre->addPrefix('3056');
$pre->addPrefix('3057');

echo $pre->matchPrefix('30561234567');

which performs as required (returns 305; if 3056 is uncommented, returns 3056 instead).

Note that I add a terminal flag to each node; this avoids false partial matches, i.e. if you add prefix 3056124 it will properly match 3056 instead of returning 305612.

The best way to avoid reloading each time is to turn it into a service; however, before doing so I would measure run-times with APC. It may well be fast enough as is.

Alex: your answer is absolutely correct - but not applicable to this question :)

Hugh Bothwell 2009-09-28 23:35:33

Thank you Hugh, I will benchmark your code with APC, it will probably be fast enough. I am very interested in turning it into a service (even if only to learn from the experience). I had thought about the daemon approach but I am unfamiliar with programming them. As far as I know, the basis of a daemon is an infinite while loop with a sleep call inside. What happens if the daemon receives a request while sleeping?

Alex 2009-09-29 09:18:05

Answer 6

A:

Since you're only working with numbers, maybe working directly with strings is inefficient.

You could perform a binary search. If you store all your prefixes (numerically), padded to 15 places and sorted in order, you can search 6000 codes in approximately log2(6000)~=13 steps.
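The step count is just the base-2 logarithm rounded up, which is easy to check (the variable names are only for illustration):

```php
<?php
$codes = 6000;

// Each comparison halves the remaining range, so the worst case is ceil(log2(n)).
$steps = (int) ceil(log($codes, 2));

echo $steps, "\n"; // 13, since 2^12 = 4096 < 6000 <= 2^13 = 8192
```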

For example if you have the following codes:

01, 0127, 01273, 0809, 08

You would store the following in an array:

010000000000000
012700000000000
012730000000000
080000000000000
080900000000000

The steps would be:

Strip incoming number down to 15 places.
Perform binary search to find the nearest lowest code (and its index in the array above).
Look up the length of the code in a separate array (using the index)

Some sample code to see it in action:

// Example for prefixes 0100,01,012,0127,0200
$prefixes = array('0100','0101','0120','0127','0200');
$prefix_lengths = array(4,2,3,4,4);
$longest_length_prefix = 4;

echo GetPrefix('01003508163');

function GetPrefix($number_to_check) {
 global $prefixes;
 global $prefix_lengths;
 global $longest_length_prefix;

 $stripped_number = substr($number_to_check, 0, $longest_length_prefix);

 // Binary search
 $window_floor = 0;
 $window_ceiling = count($prefixes)-1;
 $prefix_index = -1;

 do {
  $mid_point = ($window_floor+$window_ceiling)>>1;

  if ($window_floor==($window_ceiling-1)) {
   if ($stripped_number>=$prefixes[$window_ceiling]) {
    $prefix_index=$window_ceiling;
    break;
   } elseif ($stripped_number>=$prefixes[$window_floor]) {
    $prefix_index=$window_floor;
    break;
   } else {
    break;
   }
  } else {
   if ($stripped_number==$prefixes[$mid_point]) {
    $prefix_index=$mid_point;
    break;
   } elseif ($stripped_number<$prefixes[$mid_point]) {
    $window_ceiling=$mid_point;
   } else {
    $window_floor=$mid_point;
   }
  }
 } while (true);

 if ($prefix_index==-1 || substr($number_to_check, 0, $prefix_lengths[$prefix_index])!=substr($prefixes[$prefix_index],0, $prefix_lengths[$prefix_index])) {
  return 'invalid prefix';
 } else {
  return substr($prefixes[$prefix_index], 0, $prefix_lengths[$prefix_index]);
 }
}

Jonathan Swift 2009-09-29 11:59:11

How would you distinguish between, say, 135 and 13500 as prefixes?

Hugh Bothwell 2009-09-29 16:26:59

No problem. Store 13500... and 13501... but set the prefix length array to 5 and 3 respectively.

Jonathan Swift 2009-09-29 22:45:00
