I am using preg_match_all to search for hashtags in a Twitter Search response.

It works as expected except when the search results contain no hashtags. In that case my $tags array still has elements, and I'm not sure why.

Is it because my RegEx is not correct, or is it a problem with preg_match_all?

Thanks

$tweet = "Microsoft Pivot got Runner-Up for Network Tech from The Wall Street Journal in 2010 Technology Innovation Awards  http://bit.ly/9pCbTh";

// Method of a class (the class body is omitted here).
private function getHashTags($tweet){
    $tags = array();
    preg_match_all("/(#\w+)/", $tweet, $tags);

    return $tags;
}

results in:

Array ( [0] => Array ( ) [1] => Array ( ) )

Expected results:

Array();
A: 

You get two empty arrays because you are matching an expression and a subexpression. Your expected results are actually the error here. Check the manual, specifically the description of the default behavior when no flags are passed in the fourth argument:

Orders results so that $matches[0] is an array of full pattern matches, $matches[1] is an array of strings matched by the first parenthesized subpattern, and so on.

You always get a multi-dimensional array from preg_match_all unless you pass PREG_OFFSET_CAPTURE as the flag argument. In this case, you should actually get an empty array for an expression that doesn't match anything.

Matt Kane
This is not true. You need to pass PREG_SET_ORDER.
Galen
+2  A: 

In default mode, preg_match_all returns an array of matches and submatches:

PREG_PATTERN_ORDER
Orders results so that $matches[0] is an array of full pattern matches, $matches[1] is an array of strings matched by the first parenthesized subpattern, and so on.

So in this case the first array is the array of matches of the whole pattern and the second array is the array of matches of the first subpattern. And since there was no match found, both arrays are empty.
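To make the default layout concrete, here is a minimal sketch. The sample strings and the standalone getHashTags() helper are illustrative assumptions, not the asker's actual class method. It shows that the outer array always has one entry per group, and that returning only index 0 yields a flat list that is empty when nothing matches:

```php
<?php
// Hypothetical sample inputs: one without hashtags, one with two.
$noTags  = "Microsoft Pivot got Runner-Up for Network Tech";
$hasTags = "Reading about #php and #regex today";

// No flags passed, so PREG_PATTERN_ORDER applies:
// one sub-array for the full pattern plus one per capture group.
$count = preg_match_all('/(#\w+)/', $noTags, $m);
// $count is 0 and $m holds two empty sub-arrays.

preg_match_all('/(#\w+)/', $hasTags, $m2);
// $m2[0] holds the full matches, $m2[1] the captures of group 1.

// Illustrative helper: return only the full matches as a flat list.
// The capture group is dropped because index 0 already has the tags.
function getHashTags(string $tweet): array
{
    preg_match_all('/#\w+/', $tweet, $matches);
    return $matches[0];
}
```

With this shape, callers can test the result with empty() or count() directly instead of inspecting nested arrays.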

If you want the other order, having each match in an array with its submatches, use PREG_SET_ORDER in the flags parameter:

preg_match_all("/(#\w+)/", $tweet, $tags, PREG_SET_ORDER);
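
As a rough illustration of how the result changes with that flag (the sample strings are made up for the example), the outer array now has one entry per match, so it is empty when there is no match:

```php
<?php
// Hypothetical sample inputs: one without hashtags, one with two.
$noTags  = "Microsoft Pivot got Runner-Up for Network Tech";
$hasTags = "Reading about #php and #regex today";

// With PREG_SET_ORDER, each element of the result is one match,
// holding the full match at index 0 and the group capture at index 1.
preg_match_all('/(#\w+)/', $noTags, $empty, PREG_SET_ORDER);
// $empty is an empty array, since there are no matches to list.

preg_match_all('/(#\w+)/', $hasTags, $sets, PREG_SET_ORDER);
// $sets has two elements, one per hashtag found.

// Pull the first capture group out of every match to get a flat list.
$tags = array_column($sets, 1);
```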
Gumbo
Thanks for the explanation of the result array. That makes sense, and I was able to work out my solution now that I know what to look for.
discorax