I have an array full of patterns that I need matched. Is there any way to do that other than a for() loop? I'm trying to do it in the least CPU-intensive way, since I will be doing dozens of these every minute.

A real-world example: I'm building a link status checker that checks links to various online video sites to ensure that the videos are still live. Each domain has several "dead keywords"; if any of these is found in the HTML of a page, the file was deleted. These keywords are stored in the array. I need to match the contents of the array against the HTML output of the page.

A: 

If you have a bunch of patterns, what you can do is concatenate them in a single regular expression and match that. No need for a loop.
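A minimal sketch of that idea, assuming the "dead keywords" are plain strings rather than regular expressions. The keyword list and the isDead() helper are made up for illustration; preg_quote() escapes each keyword so that characters such as ( or . are matched literally.

```php
<?php
// Hypothetical keywords; the real list would come from the per-domain array.
$deadKeywords = array('This video has been removed', 'File not found (404)');

// Escape each keyword so regex metacharacters are treated as literal text.
$escaped = array();
foreach ($deadKeywords as $keyword) {
    $escaped[] = preg_quote($keyword, '/');
}

// One alternation, matched case-insensitively in a single preg_match() call.
$pattern = '/' . implode('|', $escaped) . '/i';

// Hypothetical helper: true when any dead keyword appears in the HTML.
function isDead($pattern, $html)
{
    return preg_match($pattern, $html) === 1;
}
```

If some entries really are regular expressions, they should not go through preg_quote(), since that would turn them into literal text.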

Seb
A: 

If you're merely searching for the presence of a string in another string, use strpos as it is faster.

Otherwise, you could just iterate over the array of patterns, calling preg_match each time.

David Caunt
A: 

// assuming you have something like this

$patterns = array('a','b','\w');

// then I would do the following

$patterns_flattened = implode('|', $patterns); // separator goes first

if ( preg_match('/'. $patterns_flattened .'/', $string, $matches) ) { }

// PS: that's off the top of my head, I didn't check it in a code editor

TravisO
+2  A: 

First of all, if you literally are only doing dozens every minute, then I wouldn't worry terribly about the performance in this case. These matches are pretty quick, and I don't think you're going to have a performance problem by iterating through your patterns array and calling preg_match separately like this:

$matches = false;
foreach ($pattern_array as $pattern)
{
  if (preg_match($pattern, $page))
  {
    $matches = true;
    break; // one hit is enough, skip the remaining patterns
  }
}

You can indeed combine all the patterns into one using the or operator like some people are suggesting, but don't just slap them together with a |. This will break badly if any of your patterns contain the or operator.

I would recommend at least grouping your patterns using parentheses like:

$grouped_patterns = array();
foreach ($patterns as $pattern)
{
  $grouped_patterns[] = "(" . $pattern . ")";
}
$master_pattern = implode("|", $grouped_patterns);

But... I'm not really sure if this ends up being faster. Something has to loop through them, whether it's the preg_match or PHP. If I had to guess I'd guess that individual matches would be close to as fast and easier to read and maintain.
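Rather than guessing, the two approaches can be timed against each other. The following is only a rough sketch: the sample patterns, the sample page, and the function names are all made up, the patterns are assumed to be stored without delimiters, and the numbers will vary with the real data.

```php
<?php
// Hypothetical sample data; swap in real patterns and a real page.
$patterns = array('video (was|has been) removed', 'no longer available', 'error\s+404');
$page = str_repeat('<div>some markup</div>', 2000) . '<p>Error  404</p>';

// One preg_match() call per pattern, stopping at the first hit.
function matchesAnyLoop(array $patterns, $page)
{
    foreach ($patterns as $pattern) {
        if (preg_match('/' . $pattern . '/i', $page) === 1) {
            return true;
        }
    }
    return false;
}

// All patterns joined into one alternation; non-capturing groups
// keep each pattern self-contained without adding capture slots.
function matchesAnyCombined(array $patterns, $page)
{
    $master = '/(?:' . implode(')|(?:', $patterns) . ')/i';
    return preg_match($master, $page) === 1;
}

// Runs the given function repeatedly and returns the elapsed seconds.
function timeIt($fn, array $patterns, $page, $runs = 200)
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $fn($patterns, $page);
    }
    return microtime(true) - $start;
}

$loopSeconds = timeIt('matchesAnyLoop', $patterns, $page);
$combinedSeconds = timeIt('matchesAnyCombined', $patterns, $page);
printf("loop: %.4fs, combined: %.4fs\n", $loopSeconds, $combinedSeconds);
```

Both functions should return the same answer for any input, so whichever turns out faster on real pages can be swapped in without changing behavior.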

Lastly, if performance is what you're looking for here, I think the most important thing to do is pull out the non regex matches into a simple "string contains" check. I would imagine that some of your checks must be simple string checks like looking to see if "This Site is Closed" is on the page.

So doing this:

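foreach ($strings_to_match as $string_to_match)
{
  // plain substring check, no regex engine involved
  if (strpos($page, $string_to_match) !== false)
  {
    // etc.
  }
}
foreach ($pattern_array as $pattern)
{
  // only the entries that really need a regex end up here
  if (preg_match($pattern, $page))
  {
    // etc.
  }
}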

and avoiding as many preg_match() as possible is probably going to be your best gain. strpos() is a lot faster than preg_match().

danieltalsky
A: 

What about doing a str_replace() on the HTML you get, using your array, and then checking whether the result is equal to the original? This would be very fast:

 $sites = array(
      'you_tube' => array('dead', 'moved'),
      ...
 );
 foreach ($sites as $site => $deadArray) {
     // get $html
     if ($html == str_replace($deadArray, '', $html)) { 
         // video is live
     }
 }
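
A possible variation on the same idea: str_replace() accepts an optional fourth argument that receives the number of replacements made, which avoids comparing two copies of the HTML. The function name below is made up for illustration, and the check is case-sensitive (str_ireplace() takes the same arguments for a case-insensitive version).

```php
<?php
// Hypothetical helper: counts how many dead keywords occur in the HTML.
function countDeadKeywords(array $deadKeywords, $html)
{
    $count = 0;
    // The replaced string is discarded; only the replacement count is used.
    str_replace($deadKeywords, '', $html, $count);
    return $count;
}

// A count of 0 would mean no dead keyword was found.
$isLive = countDeadKeywords(array('dead', 'moved'), '<p>Now playing</p>') === 0;
```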
Darryl Hein
A: 

Nice idea guys but how do you match html tags and get the content?

Techie Talks