Possible Duplicates:
Identifying if a URL is present in a string
Php parse links/emails

I'm working on some PHP code that takes input from various sources, finds the URLs in it, and saves them somewhere. The input that needs to be handled looks like this:

http://www.youtube.com/watch?v=IY2j_GPIqRA
Try google: http://google.com! (note exclamation mark is not part of the URL)
Is http://somesite.com/ down for anyone else?

Output:

http://www.youtube.com/watch?v=IY2j_GPIqRA
http://google.com
http://somesite.com/

I've already borrowed a regular expression from the internet that works, but unfortunately it strips the query string from the URL, which is not good!

Any help putting together a regular expression, or perhaps another solution to this problem, would be appreciated.

A: 

Why not try this one? It is the first result when Googling "URL regular expression".

((https?|ftp|gopher|telnet|file|notes|ms-help):((\/\/)|(\\))+[\w:#@%\/;$()~?+=.,&-]*)

It is not PHP-specific, but it should work; I only modified it slightly by escaping the forward slashes.

source

Josef Sábl
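For illustration, here is a minimal sketch of how a pattern like the one in the answer above might be used from PHP. It makes a few assumptions that are not in the answer: the pattern is wrapped in "/" delimiters, the hyphen sits at the end of the character class so it is read literally rather than as a range, the comma is listed explicitly, and the sample text is taken from the question. Note that in a single-quoted PHP string a literal backslash in the regex has to be written as four backslashes.

```php
<?php
// Pattern from the answer above, wrapped in "/" delimiters for PCRE.
// Assumption: "-" is placed last in the character class so it is a literal
// hyphen, and "," is listed explicitly.
$pattern = '/((https?|ftp|gopher|telnet|file|notes|ms-help):((\/\/)|(\\\\))+[\w:#@%\/;$()~?+=.,&-]*)/';

// Sample input adapted from the question.
$text = 'Try google: http://google.com! Is http://somesite.com/ down for anyone else?';

// $matches[0] receives every full match of the pattern.
preg_match_all($pattern, $text, $matches);
$urls = $matches[0];

print_r($urls);
```

Because the character class allows parentheses and dots, a URL written as "(see http://example.com)." would likely keep the trailing ")." in the match, so some post-processing may be needed depending on the input.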
A: 

Jan Goyvaerts, Regex Guru, has addressed this issue in his blog. There are quite a few caveats, for example extracting URLs inside parentheses correctly. What you need exactly depends on the "quality" of your input data.

For the examples you provided, \b(?:(?:https?|ftp|file)://|www\.|ftp\.)[-A-Z0-9+&@#/%=~_|$?!:,.]*[A-Z0-9+&@#/%=~_|$] works when used in case-insensitive mode.

So to find all matches in a multiline string, use

preg_match_all('/\b(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)[-A-Z0-9+&@#\/%=~_|$?!:,.]*[A-Z0-9+&@#\/%=~_|$]/i', $subject, $result, PREG_PATTERN_ORDER);
// Index 0 holds the full-pattern matches; keep only those.
$result = $result[0];
Tim Pietzcker
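As a quick check against the question's sample input, the preg_match_all call from the answer above could be wrapped in a small helper along these lines. The function name extractUrls is hypothetical and only used for illustration; the pattern itself is the one given in the answer.

```php
<?php
// Hypothetical helper: returns every URL-like match found in $text.
function extractUrls(string $text): array
{
    // Pattern from the answer above, used in case-insensitive mode (/i).
    $pattern = '/\b(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)[-A-Z0-9+&@#\/%=~_|$?!:,.]*[A-Z0-9+&@#\/%=~_|$]/i';

    // preg_match_all returns false on error; treat that as "no URLs found".
    if (preg_match_all($pattern, $text, $matches) === false) {
        return [];
    }

    // Index 0 holds the full-pattern matches.
    return $matches[0];
}

// The three sample lines from the question.
$input = <<<'TXT'
http://www.youtube.com/watch?v=IY2j_GPIqRA
Try google: http://google.com! (note exclamation mark is not part of the URL)
Is http://somesite.com/ down for anyone else?
TXT;

print_r(extractUrls($input));
```

For this input the helper returns the three URLs listed in the question, with the query string intact and without the trailing exclamation mark, because the final character class does not accept "!" as the last character of a match.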