I am searching a string for URLs, and my preg_match is giving me an incorrect number of matches for my demo string.

String:

Hey there, come check out my site at www.example.com

Function:

preg_match("#(^|[\n ])([\w]+?://[\w]+[^ \"\n\r\t<]*)#ise", $string, $links);
echo count($links);

The result comes out as 3.

Can anybody help me solve this? I'm new to REGEX.

+5  A: 

$links is the array of submatches. From the PHP manual:

If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.

The matches of the two groups plus the match of the full regular expression give three array items, so count($links) is 3 even though only one URL was found.

If you want every match in the string, use preg_match_all instead.
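A minimal sketch of the difference, not a definitive fix. It assumes the sample string carries a full URL with a scheme (the `http://` prefix is an assumption here, since the pattern requires `://`), and it drops the `e` modifier, which only ever applied to preg_replace and is not supported in current PHP versions:

```php
<?php
// Assumed sample input: the scheme is added so the pattern's "://" can match.
$string  = "Hey there, come check out my site at http://www.example.com";
$pattern = '#(^|[\n ])([\w]+?://[\w]+[^ "\n\r\t<]*)#is';

// preg_match stops at the first match; $links holds the full match
// plus one entry per capturing group.
preg_match($pattern, $string, $links);
echo count($links), "\n"; // 3: $links[0], $links[1], $links[2]

// preg_match_all returns the number of full matches and collects all of them.
$count = preg_match_all($pattern, $string, $all);
echo $count, "\n";        // 1
print_r($all[2]);         // the URLs captured by the second group
```

With preg_match_all, count the return value (or count($all[0])) rather than the outer array, which still has one entry per group.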

Gumbo
+1  A: 

If you use preg_match_all (as Gumbo suggested), note that running your regex against this string will match both the value of the anchor's href attribute and the link text, which in this case happens to contain a URL. That makes two matches.

It would be wise to run array_unique on your result set :)
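A rough sketch of that deduplication. The input string and the simplified pattern below are assumptions made for illustration, not the ones from the question:

```php
<?php
// Assumed input: an anchor whose href and link text are the same URL.
$string  = 'Visit <a href="http://www.example.com">http://www.example.com</a> today';
// Simplified illustrative pattern: a scheme, "://", then anything up to
// whitespace, a quote, or an angle bracket.
$pattern = '#[a-z]+://[^\s"<>]+#i';

preg_match_all($pattern, $string, $matches);
echo count($matches[0]), "\n"; // 2: once from the href, once from the link text

// array_unique drops the duplicate; array_values reindexes the result from 0.
$urls = array_values(array_unique($matches[0]));
print_r($urls);
```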

Martin Hohenberg
great idea...thanks!
johnnietheblack
A: 

In addition to the advice on how to use preg_match, I believe there is something seriously wrong with the regular expression you are using. You may want to try something like this instead:

 preg_match("#([a-zA-Z]+://)?([0-9a-zA-Z_$.+!*'(),-]+\.)?([0-9a-zA-Z]+)\.([a-zA-Z]+)#", $string, $links);

This should handle most cases, although it stops at the top-level domain and so will not capture a path or query string that follows it. In the future, when writing regular expressions, I recommend the following websites: http://www.regular-expressions.info/ for reference, and especially http://regexpal.com/ for testing expressions as you write them.
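To illustrate that limitation, here is a rough sketch using a pattern in the same spirit, written with `#` delimiters. The test URL and the optional trailing group in `$withPath` are assumptions added for illustration only:

```php
<?php
$string = "See http://www.example.com/page?id=1 for details";

// Scheme (optional), subdomain (optional), domain, top-level domain.
$pattern = "#([a-zA-Z]+://)?([0-9a-zA-Z_$.+!*'(),-]+\.)?([0-9a-zA-Z]+)\.([a-zA-Z]+)#";
preg_match($pattern, $string, $links);
echo $links[0], "\n"; // http://www.example.com (path and query string are lost)

// Hypothetical extension: an optional fifth group that takes everything from
// the first "/" or "?" up to whitespace, a double quote, or "<".
$withPath = "#([a-zA-Z]+://)?([0-9a-zA-Z_$.+!*'(),-]+\.)?([0-9a-zA-Z]+)\.([a-zA-Z]+)([/?][^\s\"<]*)?#";
preg_match($withPath, $string, $more);
echo $more[0], "\n";  // http://www.example.com/page?id=1
```

Like the original suggestion, this is deliberately loose and will also match things such as file names, so treat it as a starting point for testing rather than a complete URL matcher.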

Steven Oxley