I am trying to match email addresses from one of my own sites using a regular expression. Using preg_match_all($pattern, $site, $array), the results I get contain duplicates. For example, using:

$pattern = '/[\w-]+@([\w-]+\.)+[\w-]+/i';

I get:

Array
(
    [0] => uk@example1.com
    [1] => uk@example2.com
    [2] => sales@woot.com
    [3] => sales@woot.com
    [4] => info@regex.com
    [5] => info@regex.com
    [6] => direct@yadayada.com.au
    [7] => direct@yadayada.au
    [8] => adrian@blahblah.com
    [9] => adrian@blahblah.com
)

So, why am I getting duplicates? Is this a problem with my regex?

The string I am searching is the content of a URL, fetched with file_get_contents(). I've checked the string to make sure it isn't pulling the page twice.

+6  A: 

If you are matching HTML, you are probably matching both the href attribute of the a tag and the text content of the a tag:

<a href="mailto:uk@example1.com">uk@example1.com</a>
Josh
OMG haha you're probably right!
Graham
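Try using /mailto:([\w-]+@([\w-]+\.)+[\w-]+)/i to get just the mailto value; the address ends up in the first capture group. (PHP's preg functions have no g modifier, and preg_match_all() already returns every match.)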
Josh
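
To illustrate the point above, here is a minimal sketch that runs both patterns against the same markup. The sample HTML in $site is made up for the example (in the question it would come from file_get_contents()), and the variable names $loose, $strict, $all and $hrefOnly are just illustrative.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$site = '<a href="mailto:uk@example1.com">uk@example1.com</a> '
      . '<a href="mailto:sales@woot.com">sales@woot.com</a>';

// The pattern from the question: matches the href value AND the link text.
$loose = '/[\w-]+@([\w-]+\.)+[\w-]+/i';
preg_match_all($loose, $site, $all);

// Requiring the "mailto:" prefix matches each link once.
// No g modifier: preg_match_all() is already global.
$strict = '/mailto:([\w-]+@([\w-]+\.)+[\w-]+)/i';
preg_match_all($strict, $site, $hrefOnly);

print_r($all[0]);      // full matches: every address appears twice
print_r($hrefOnly[1]); // capture group 1: every address appears once
```

Note that this only finds addresses that appear in a mailto: link; an address written as plain text without a link would be skipped.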
+2  A: 

If you're dealing with a small enough dataset, you could just throw the array into array_unique(), which will give you back an array with the duplicates removed.
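
A rough sketch of that approach, assuming the full matches have already been collected into a flat array ($matches here is hypothetical sample data shaped like the output in the question):

```php
<?php
// Hypothetical sample data shaped like the question's output.
$matches = ['sales@woot.com', 'sales@woot.com', 'info@regex.com', 'info@regex.com'];

// array_unique() keeps the first occurrence of each value and preserves
// its original key, so array_values() is used to reindex from 0.
$unique = array_values(array_unique($matches));

print_r($unique);
```

With preg_match_all() in its default mode, the full matches live in the first element of the result array, so that is the part you would pass to array_unique().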

Mark Biek
Of course, it's probably better to understand what's going wrong, but I figured I'd throw it out there :)
Mark Biek