I have an array of strings, and I want to check whether any of them matches the end of an ordinary string. I'm not sure of the best way to do this in PHP.

This is sorta what I am trying to do:

Example:

Input: abcde

Search array: er, wr, de

Match: de

My first thought was to write a loop that goes through the array and crafts a regular expression by adding "\b" on the end of each string and then check if it is found in the input string. While this would work it seems sorta inefficient to loop through the entire array. I've been told regular expressions are slow in PHP and don't want to implement something that will take me down the wrong path.

Is there a better way to see if one of the strings in my array occurs at the end of the input string?

The preg_filter() function looks like it might do the job but is for PHP 5.3+ and I am still sticking with 5.2.11 stable.

+5  A: 

For something this simple, you don't need a regex. You can loop over the array and, for each entry, use strrpos to check that its last occurrence sits at index strlen(input) - strlen(test); plain strpos returns the first occurrence, so it would miss a repeated ending such as "de" in "dede". If every entry in the search array has the same length, you can also speed things up by chopping the end off the input once, then comparing that to each item in the array.

You can't avoid going through the whole array, as in the worst general case, the item that matches will be at the end of the array. However, unless the array is huge, I wouldn't worry too much about performance - it will be much faster than you think.
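The loop described above can be sketched roughly as follows. This is only a minimal sketch: the function name findSuffix is made up for illustration, and it compares with substr rather than strpos/strrpos, which sidesteps the first-occurrence problem. It sticks to syntax that PHP 5.2 accepts.

```php
<?php
// Hypothetical helper (not from the answer): returns the first entry of
// $suffixes that $input ends with, or false if none does.
function findSuffix($input, $suffixes) {
    $inputLen = strlen($input);
    foreach ($suffixes as $suffix) {
        $suffixLen = strlen($suffix);
        // Skip empty suffixes and ones longer than the input itself.
        if ($suffixLen === 0 || $suffixLen > $inputLen) {
            continue;
        }
        // Take the last $suffixLen bytes of the input and compare them exactly.
        if (substr($input, -$suffixLen) === $suffix) {
            return $suffix;
        }
    }
    return false;
}

var_dump(findSuffix('abcde', array('er', 'wr', 'de'))); // string(2) "de"
```

The function returns on the first hit, so the whole array is only walked when nothing matches or the match is the last entry.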

Adam Wright
Thanks! I knew there must have been an easier way.
+1  A: 

Though compiling the regular expression takes some time, I wouldn't dismiss PCRE so easily. Unless you find a compare function that takes several needles, you need a loop over the needles, and executing that loop plus calling the compare function for each single needle takes time, too.
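As a rough illustration of the single-pattern idea, here is a minimal sketch. The function name findSuffixPcre is hypothetical, and two details are my own assumptions rather than part of this answer: the needles are escaped with preg_quote so that characters like "." are taken literally, and the pattern is anchored with \z (the very end of the subject) instead of $.

```php
<?php
// Hypothetical helper: matches all suffixes with one alternation pattern.
// Returns the matched suffix, or false if the input ends with none of them.
function findSuffixPcre($input, $suffixes) {
    if (count($suffixes) === 0) {
        return false; // an empty alternation would match every input
    }
    $quoted = array();
    foreach ($suffixes as $suffix) {
        // Escape regex metacharacters and the "/" delimiter.
        $quoted[] = preg_quote($suffix, '/');
    }
    // \z anchors the alternation at the very end of the subject string.
    $pattern = '/(?:' . implode('|', $quoted) . ')\z/';
    if (preg_match($pattern, $input, $m)) {
        return $m[0];
    }
    return false;
}

var_dump(findSuffixPcre('abcde', array('er', 'wr', 'de'))); // string(2) "de"
```

When the same needle list is checked against many inputs, building the pattern once outside the loop avoids repeating the string work.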

Let's take a test script that fetches all the function names from php.net and looks for certain endings. This was only an ad hoc script, but I suppose that no matter which strcmp-ish function + loop you use, it will be slower than the simple PCRE pattern (in this case).

count($hs)=5549
pcre: 4.377925157547 s
substr_compare: 7.951938867569 s
identical results: bool(true)

This was the result when searching for nine different patterns. With only two patterns ('yadda', 'ge'), both methods took the same time.

Feel free to criticize the test script (aren't there always errors in synthetic tests that are obvious for everyone but oneself? ;-) )

<?php
/* get the test data
All the function names from php.net
*/
$doc = new DOMDocument;
$doc->loadhtmlfile('http://docs.php.net/quickref.php');
$xpath = new DOMXPath($doc);
$hs = array();
foreach( $xpath->query('//a') as $a ) {
  $hs[] = $a->textContent;
}
echo 'count($hs)=', count($hs), "\n";
// should find:
// ge, e.g. imagick_adaptiveblurimage
// ing, e.g. m_setblocking
// name, e.g. basename 
// ions, e.g. assert_options
$ns = array('yadda', 'ge', 'foo', 'ing', 'bar', 'name', 'abcd', 'ions', 'baz');
sleep(1);

/* test 1: pcre */
$start = microtime(true);
for($run=0; $run<100; $run++) {
  $matchesA = array();
  $pattern = '/(?:' . join('|', $ns) . ')$/';
  foreach($hs as $haystack) {
    if ( preg_match($pattern, $haystack, $m) ) {
      @$matchesA[$m[0]]+= 1;
    }
  }
}
echo "pcre: ", microtime(true)-$start, " s\n";
flush();
sleep(1);

/* test 2: loop + substr_compare */
$start = microtime(true);
for($run=0; $run<100; $run++) {
  $matchesB = array();
  foreach( $hs as $haystack ) {
    $hlen = strlen($haystack);
    foreach( $ns as $needle ) {
      $nlen = strlen($needle);
      if ( $hlen >= $nlen && 0===substr_compare($haystack, $needle, -$nlen) ) {
        @$matchesB[$needle]+= 1;
      }
    }
  }
}
echo "substr_compare: ", microtime(true)-$start, " s\n";
echo 'identical results: '; var_dump($matchesA===$matchesB);
VolkerK
A: 

I might approach this backwards: if your list of string endings is fixed or rarely changes, I would start by preprocessing it to make it easy to match against, then grab the end of your string and see if it matches.

Sample code:

<?php

// Test whether string ends in predetermined list of suffixes
// Input: string to test
// Output: if matching suffix found, returns suffix as string, else boolean false
function findMatch($str) {
    $matchTo = array(
        2 => array( 'ge' => true, 'de' => true ),
        3 => array( 'foo' => true, 'bar' => true, 'baz' => true ),
        4 => array( 'abcd' => true, 'efgh' => true )
    );

    foreach($matchTo as $length => $list) {
        $end = substr($str, -$length);

        if (isset($list[$end]))
            return $end;
    }

    return false;
}

?>
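The lookup table in the sample above is written out by hand. If the suffix list comes from somewhere else, the preprocessing step could look something like this sketch; buildSuffixTable and findMatchIn are hypothetical names, and sorting the table by length is my own addition so the search can stop early.

```php
<?php
// Hypothetical helper: groups suffixes by length, so that each distinct
// length costs one substr() call and one isset() lookup per input string.
function buildSuffixTable($suffixes) {
    $table = array();
    foreach ($suffixes as $suffix) {
        $len = strlen($suffix);
        if ($len > 0) {
            $table[$len][$suffix] = true;
        }
    }
    ksort($table); // shortest lengths first
    return $table;
}

// Hypothetical helper: same idea as findMatch(), but takes the table as an argument.
function findMatchIn($str, $table) {
    $strLen = strlen($str);
    foreach ($table as $length => $list) {
        if ($length > $strLen) {
            break; // table is sorted, so all remaining suffixes are too long
        }
        $end = substr($str, -$length);
        if (isset($list[$end])) {
            return $end;
        }
    }
    return false;
}

$table = buildSuffixTable(array('ge', 'de', 'foo', 'bar', 'baz', 'abcd', 'efgh'));
var_dump(findMatchIn('abcde', $table)); // string(2) "de"
```

Build the table once and reuse it for every string you test; rebuilding it per call would throw away the benefit of preprocessing.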
Hugh Bothwell
A: 

This might be overkill, but you can try the following. Create a hash for each entry of your search array and store the hashes as keys in an array (that will be your lookup array).

Then work backwards from the end of your input string one character at a time (e, de, cde, etc.) and compute a hash of the substring at each iteration. If that hash is in your lookup array, you have a match.
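A minimal sketch of this idea, with one substitution worth stating plainly: PHP arrays are already hash tables, so instead of computing a hash by hand the sketch uses the suffixes themselves as array keys and lets isset() do the hashed lookup. The function name findSuffixByLookup is hypothetical, and capping the loop at the longest suffix length is my own addition.

```php
<?php
// Hypothetical helper: grows a tail of the input one character at a time
// and looks each tail up in a key-based (hashed) lookup array.
function findSuffixByLookup($input, $suffixes) {
    $lookup = array();
    $maxLen = 0;
    foreach ($suffixes as $suffix) {
        $lookup[$suffix] = true;
        $maxLen = max($maxLen, strlen($suffix));
    }
    // No tail longer than the longest suffix (or the input) can match.
    $limit = min($maxLen, strlen($input));
    for ($len = 1; $len <= $limit; $len++) {
        $tail = substr($input, -$len); // "e", "de", "cde", ...
        if (isset($lookup[$tail])) {
            return $tail;
        }
    }
    return false;
}

var_dump(findSuffixByLookup('abcde', array('er', 'wr', 'de'))); // string(2) "de"
```

The number of lookups depends on the longest suffix, not on how many suffixes there are, which is where this approach could pay off for a large search array.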

discovlad