I am using regex in my PHP script to check a page for Rapidshare links, and load them into an array.

My code:

if(preg_match_all('/http:\/\/rapidshare\.com\/files\/.*?\/[^\s]+/', $links[0], $links))
{
    print_r($links);
} else {
    die('Cannot find post links :(');
}

It finds the links correctly, and puts them into an array:

Array
(
    [0] => Array
        (
            [0] => http://rapidshare.com/files/320708377/file_name1.rar
            [1] => http://rapidshare.com/files/320708377/file_name1.rar
            [2] => http://rapidshare.com/files/333708133/file_name2.rar
            [3] => http://rapidshare.com/files/333708133/file_name2.rar
            [4] => http://rapidshare.com/files/330738827/file_name3.rar
            [5] => http://rapidshare.com/files/330738827/file_name3.rar
        )

)

As you can see, it enters two links into the array for each one. I have no clue why it's doing this but I suspect it's something to do with the regex.

Any help? Cheers. :)

A: 

Sigh. It happens because each one is a hyperlink, so the regex grabs both the URL the link points to and the link text.

Matt
+1  A: 

Just for the record:

$array = array_unique($values); 

It won't work for multi-dimensional arrays though, so you would have to foreach through the outer array and apply it to each inner array.
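
A minimal sketch of that loop, assuming the match array has the same shape as the print_r() output in the question (the sample data here is made up for illustration):

```php
<?php
// Hypothetical match array shaped like the print_r() output in the question.
$links = [
    [
        'http://rapidshare.com/files/320708377/file_name1.rar',
        'http://rapidshare.com/files/320708377/file_name1.rar',
        'http://rapidshare.com/files/333708133/file_name2.rar',
        'http://rapidshare.com/files/333708133/file_name2.rar',
    ],
];

// array_unique() only compares the values of a flat array,
// so apply it to each inner array in turn.
// array_values() re-indexes the result so the keys start at 0 again.
foreach ($links as $i => $group) {
    $links[$i] = array_values(array_unique($group));
}

print_r($links);
```

Without array_values() the surviving entries keep their original keys (0 and 2 here), which may or may not matter for the rest of the script.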

Chacha102
Not necessarily, see http://stackoverflow.com/questions/1247950/how-to-remove-duplicated-2-dimension-array-in-php/1248189#1248189
Alix Axel
Which is why I said it wouldn't work for multi-dimensional arrays...
Chacha102
A: 

In preg_match_all(), can the subject and the matches not use the same variable name?

It is too confusing.

Also, give us the content of $links.
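
For illustration, here is roughly what the call might look like with separate variables. $html and $matches are hypothetical names, and the sample string is invented; the pattern is the one from the question, written with ~ as the delimiter so the slashes need no escaping:

```php
<?php
// $html is a hypothetical stand-in for the page content being searched.
$html = 'Get it here: http://rapidshare.com/files/320708377/file_name1.rar';

// Same pattern as in the question, with ~ as the delimiter.
$pattern = '~http://rapidshare\.com/files/.*?/[^\s]+~';

// The subject ($html) and the result array ($matches) are separate variables,
// so the input is still available after the call.
if (preg_match_all($pattern, $html, $matches)) {
    print_r($matches[0]); // $matches[0] holds the full-pattern matches
} else {
    die('Cannot find post links :(');
}
```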

Tommy
+1  A: 

preg_match_all() will not magically duplicate URLs; they must be occurring twice each in the input. Are you using the regex on a string of HTML? I suspect that if there's an <a> tag you're capturing both the href and the actual link text:

<a href="http://www.example.com">http://www.example.com</a>
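
If that is the cause, one possible way around it is to match only the URL inside the href attribute. This is a sketch under that assumption: the HTML below is made up, and the tightened pattern is not the one from the question.

```php
<?php
// Hypothetical HTML in which each link appears twice:
// once in the href attribute and once as the visible link text.
$html = '<a href="http://rapidshare.com/files/320708377/file_name1.rar">'
      . 'http://rapidshare.com/files/320708377/file_name1.rar</a>' . "\n"
      . '<a href="http://rapidshare.com/files/333708133/file_name2.rar">'
      . 'http://rapidshare.com/files/333708133/file_name2.rar</a>';

// Anchor the pattern to href="..." so the link text is never matched.
// Group 1 captures only the URL between the quotes.
$pattern = '~href="(http://rapidshare\.com/files/[^/"]+/[^"\s]+)"~';

preg_match_all($pattern, $html, $matches);

// $matches[1] holds the contents of capture group 1 for every match.
$urls = $matches[1];
print_r($urls);
```

This only picks up links that appear in a double-quoted href; plain-text URLs outside an <a> tag would be skipped, so it suits pages where every link is a real hyperlink.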
John Rasch