I am trying to match email addresses from one of my own sites using a regular expression. Using preg_match_all($pattern, $site, $array), the results I get contain duplicates. For example, using:

$pattern = '/[\w-]+@([\w-]+\.)+[\w-]+/i';

I get:

Array
(
    [0] => [email protected]
    [1] => [email protected]
    [2] => [email protected]
    [3] => [email protected]
    [4] => [email protected]
    [5] => [email protected]
    [6] => [email protected]
    [7] => [email protected]
    [8] => [email protected]
    [9] => [email protected]
)

So, why am I getting duplicates? Is this a problem with my regex?

The string I am searching is the content of a URL, fetched with file_get_contents(). I've checked the string to make sure it wasn't pulling the page twice.

+6  A: 

If you are matching HTML, you are probably matching both the href attribute of the a tag and the text content of the a tag.

<a href="mailto:[email protected]">[email protected]</a>
Josh
OMG haha you're probably right!
Graham
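Try using /mailto:([\w-]+@([\w-]+\.)+[\w-]+)/i to get just the mailto value. The address itself ends up in capture group 1. (PCRE in PHP has no g modifier; preg_match_all() already finds every match.)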
Josh
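
To make the explanation above concrete, here is a minimal sketch of both patterns run against a single link. The $site string and the address user@example.com are made-up placeholders, not the asker's actual page, and the variable names $all, $hrefOnly and $emails are chosen only for illustration.

```php
<?php
// Hypothetical page fragment; user@example.com is a placeholder address.
$site = '<a href="mailto:user@example.com">user@example.com</a>';

// Original pattern: matches the address once in the href and once in the link text.
preg_match_all('/[\w-]+@([\w-]+\.)+[\w-]+/i', $site, $all);
// $all[0] holds the full matches, so the same address appears twice here.

// Anchoring on "mailto:" restricts the match to the href attribute.
// Capture group 1 (index 1 of the result) is the bare address.
preg_match_all('/mailto:([\w-]+@([\w-]+\.)+[\w-]+)/i', $site, $hrefOnly);
$emails = $hrefOnly[1];
```

Note that this approach only finds addresses that appear in a mailto: link, so an address written as plain text on the page would be skipped.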
+2  A: 

If you're dealing with a small enough dataset, you could just pass the array to array_unique(), which will give you back an array with the duplicates removed.

Mark Biek
Of course it's probably better to understand what's going wrong but I figured I'd throw it out there :)
Mark Biek
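
As a rough sketch of the array_unique() suggestion, assuming the matches have already been collected into a flat array: the $matches list below is invented for illustration, and the call to array_values() is an extra step added here, not part of the original answer.

```php
<?php
// Hypothetical list of matches with repeats; the addresses are placeholders.
$matches = ['a@example.com', 'a@example.com', 'b@example.com', 'a@example.com'];

// array_unique() keeps the first occurrence of each value and preserves the
// original keys, so array_values() is used to reindex the result from 0.
$unique = array_values(array_unique($matches));
```

With preg_match_all(), the full matches are in index 0 of the result array, so that is the sub-array to pass in.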