I have the following PHP code to extract an email address from a string in one of the following two forms:

Random Stranger <[email protected]>
[email protected]

Here is the PHP code:

// The first example
$sender = "Random Stranger <[email protected]>";

$pattern = '/([\w_-]*@[\w-\.]*)|.*<([\w_-]*@[\w-\.]*)>/';

preg_match($pattern,$sender,$matches,PREG_OFFSET_CAPTURE);

echo "<pre>";
print_r($matches);
echo "</pre><hr>";

// The second example
$sender = "[email protected]";

preg_match($pattern,$sender,$matches,PREG_OFFSET_CAPTURE);

echo "<pre>";
print_r($matches);
echo "</pre>";

My question is... what is in $matches? It seems to be a strange collection of arrays. Which index holds the match from the parentheses? How can I be sure I'm getting the email address and only the email address?

Update:

Here is the output:

Array
(
    [0] => Array
        (
            [0] => Random Stranger 
            [1] => 0
        )

    [1] => Array
        (
            [0] => 
            [1] => -1
        )

    [2] => Array
        (
            [0] => [email protected]
            [1] => 5
        )

)
Array
(
    [0] => Array
        (
            [0] => [email protected]
            [1] => 0
        )

    [1] => Array
        (
            [0] => [email protected]
            [1] => 0
        )

)
A: 

The preg_match() manual page explains how $matches works. It's an optional parameter that gets filled with the results of each parenthesized sub-expression in your regexp, numbered by the position of its opening parenthesis. $matches[0] is always the entire expression match, followed by the sub-expressions.

So for example, that pattern contains two sub-expressions, ([\w_-]*@[\w-\.]*) and ([\w_-]*@[\w-\.]*), one on each side of the |. The parts matching those two expressions will be put into $matches[1] and $matches[2], respectively. I would guess after a quick glance that for the email address of Random Stranger <[email protected]>, you would have something like this in $matches (the first alternative does not match here, so its entry is empty):

Array( 
    0 => "Random Stranger <[email protected]>",
    1 => "",
    2 => "[email protected]"
)

Think of it as passing an array named $matches by reference, that gets filled with all the sub-parts that are matched.

Edit - note that you are using the PREG_OFFSET_CAPTURE flag, which changes how $matches gets filled, so your result won't match my example. The manual explains this flag as well. In this case, instead of a plain string for each sub-expression, each element of $matches is itself an array holding the matched string at index 0 and its offset in the subject string at index 1.
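
To make the difference concrete, here is a minimal sketch that runs the same match with and without PREG_OFFSET_CAPTURE. The address user@example.com is a placeholder I am assuming for illustration, and the character class is written as [\w.-] (hyphen last) because newer PCRE2-based PHP builds can reject [\w-\.] as an invalid range. Otherwise the pattern is the one from the question.

```php
<?php
// Placeholder input; the real address in the question is not shown.
$sender  = "Random Stranger <user@example.com>";
// Same two alternatives as the question's pattern, hyphen moved to the end of the class.
$pattern = '/([\w_-]*@[\w.-]*)|.*<([\w_-]*@[\w.-]*)>/';

// Without flags: each element of $plain is a plain string.
// $plain[0] is the whole match, $plain[1] is "" (first alternative unused),
// $plain[2] is the address captured by the second alternative.
preg_match($pattern, $sender, $plain);

// With PREG_OFFSET_CAPTURE: each element is [matched string, byte offset].
preg_match($pattern, $sender, $withOffsets, PREG_OFFSET_CAPTURE);

echo $plain[2], "\n";          // the address as a string
echo $withOffsets[2][0], "\n"; // the same address
echo $withOffsets[2][1], "\n"; // where it starts in $sender
```

So with the flag set, the address is at $matches[2][0] rather than $matches[2] for this input.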

zombat
I'm afraid that's not the output I get. See my update to the question.
George Edison
Ah. That was it! The `PREG_OFFSET_CAPTURE` flag was messing me up. Still, how come the arrays are different sizes? The first one has 3 items, and the second has 2...
George Edison
Yeah, I edited my answer to make room for the fact that you're using the `PREG_OFFSET_CAPTURE` flag, which changes what you get in `$matches`.
zombat
I'm going to guess it has something to do with the `or` in the middle of the expression. If the `[email protected]` style of address matches, the second half of the regexp likely doesn't get evaluated. For an address with a name in front of the address, the second half is the part that matches, but the first half of the regexp gets evaluated as well (it just matches nothing, so you get an empty array element).
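
A small sketch of that behaviour, as far as I understand it: preg_match() leaves out trailing groups that took no part in the match, while an unused group in the middle is kept as an empty string so the later indexes stay in place. The address is a placeholder and the character class is written as [\w.-], both assumptions on my part.

```php
<?php
// Same alternation as in the question, with the hyphen moved to the end of the class.
$pattern = '/([\w_-]*@[\w.-]*)|.*<([\w_-]*@[\w.-]*)>/';

// Name-plus-address form: group 2 matches, group 1 is kept as an empty slot.
preg_match($pattern, "Random Stranger <user@example.com>", $withName);

// Bare form: group 1 matches, group 2 never takes part and is left out.
preg_match($pattern, "user@example.com", $bare);

echo count($withName), "\n"; // 3 entries: whole match, empty group 1, group 2
echo count($bare), "\n";     // 2 entries: whole match, group 1

// One way to read the address regardless of which alternative matched:
// take the last entry of the array.
$lastWithName = $withName[count($withName) - 1];
$lastBare     = $bare[count($bare) - 1];
echo $lastWithName, "\n";
echo $lastBare, "\n";
```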
zombat
@zombat: What would be the proper way of doing this then?
George Edison
I don't think there's anything wrong with that regexp, it looks like it matches fine. If you needed the two types to have similar `$matches` output though, you could break it into two regexps instead of having the `|` in there.
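
A rough sketch of that split, assuming a hypothetical helper named extractEmail() (not something from PHP itself): try the bracketed form first, then fall back to the bare form, so the address is always in group 1 of whichever pattern matched. The patterns use + instead of * so that an empty local part is not accepted, which is a small tightening on my part, and the address is a placeholder.

```php
<?php
// Hypothetical helper: returns the address, or null if neither form matches.
// The nullable return type needs PHP 7.1 or later.
function extractEmail(string $sender): ?string
{
    // Form 1: "Some Name <address>"; the address is inside the angle brackets.
    if (preg_match('/<([\w-]+@[\w.-]+)>/', $sender, $m)) {
        return $m[1];
    }
    // Form 2: the whole string is just the address.
    if (preg_match('/^([\w-]+@[\w.-]+)$/', $sender, $m)) {
        return $m[1];
    }
    return null;
}

echo extractEmail("Random Stranger <user@example.com>"), "\n";
echo extractEmail("user@example.com"), "\n";
```

Because each pattern has exactly one group, $m has the same shape for both input forms.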
zombat
A: 

The following is copied directly from the help doc at http://us.php.net/preg_match

If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.

DKinzer
I added the output above. How come the first array has 3 items and the second has 2? They should both have the same number of parentheses, shouldn't they?
George Edison
+2  A: 

This doesn't help you with your preg question, but it will simplify your code. Since those are the only two options, don't use regular expressions:

$parts = explode( '<', rtrim( $sender, '>' ) ); // end() expects a variable, not a function result
echo end( $parts );
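
Wrapped in a function, the same idea might look like the sketch below. The name emailFromSender() and the placeholder address are my own for illustration. end() takes its argument by reference, so the exploded array is stored in a variable first; passing the explode() call directly can raise an "Only variables should be passed by reference" notice.

```php
<?php
// Hypothetical helper built on the snippet above.
function emailFromSender(string $sender): string
{
    // Drop surrounding whitespace and a trailing ">" if present, then split on "<".
    $parts = explode('<', rtrim(trim($sender), '>'));
    // The address is the last piece in both forms.
    return trim(end($parts));
}

echo emailFromSender("Random Stranger <user@example.com>"), "\n";
echo emailFromSender("user@example.com"), "\n";
```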
Galen
That works for `[email protected]` but not `Random Stranger <[email protected]>`.
George Edison
Sure it does. Are you using some ancient version of PHP or something?
Galen
@Galen: Never mind, it works. Thanks! This is *much* cleaner than RegEx.
George Edison
I haven't read the standard, but I'm assuming that < isn't a valid character in email addresses, so this should work for everything.
Galen