ansaurus

Question

PHP: URL detection (regexp) includes line breaks

Answer 1

+3 A:

Well, `<` is not a whitespace character, so it is matched by `[\S]` and ends up inside the URL. You can exclude it from your set of accepted characters:

preg_replace('/https?:\/\/[^\s<]+/i', '<a href="\0">\0</a>', $text);

Heinzi 2010-05-16 15:40:48

Thank you, this works fine for the given problem. But it would be perfect if other characters like " and > were also excluded. Do I have to add them to the first [] ?

2010-05-17 13:39:26

@marco92w: Exactly. Everything inside `[^...]` will be excluded.

Heinzi 2010-05-17 15:28:01
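To illustrate the point about the negated character class, here is a minimal sketch that adds `>` and `"` to the excluded set. The function name `autolink` and the sample input are assumptions for illustration, not part of the original answer.

```php
<?php
// autolink() is a hypothetical helper name, not part of the original answer.
function autolink(string $text)
{
    // The URL stops at whitespace or at any of < > " so that markup
    // directly after the URL is not swallowed into the link.
    return preg_replace('/https?:\/\/[^\s<>"]+/i', '<a href="\0">\0</a>', $text);
}

// Assumed sample input: a URL immediately followed by a <br /> tag.
$text = 'See http://example.com/page<br />for details';
echo autolink($text), "\n";
```

Any further character that should end a URL can be added inside the same `[^...]` in the same way.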

Answer 2

+1 A:

How about using Gruber's URL Regex?

\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))

Alix Axel 2010-05-16 16:39:06

Thanks, interesting page. But for my purpose, the regex from above is enough. What are the advantages of your regex?

2010-05-17 13:40:25

@marco92w: The regex is not mine. It's from the same guy who "invented" Markdown. Your regex, for instance, won't autolink `ftp[s]://` or `www.*` (no protocol). Read this and test it: http://daringfireball.net/2009/11/liberal_regex_for_matching_urls

Alix Axel 2010-05-17 23:19:10

Thank you very much, now I understand all the advantages.

2010-05-19 16:40:01
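For anyone who wants to compare the two patterns discussed in this thread, here is a rough sketch. The `#` delimiters, the `i` flag, the variable names and the sample text are assumptions added for illustration; the pattern bodies are the ones quoted in the answers.

```php
<?php
// Gruber's pattern as quoted above, wrapped in # delimiters for preg_*.
$gruber = '#\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))#i';
// The simpler http(s)-only pattern from the first answer.
$simple = '/https?:\/\/[^\s<]+/i';

// Assumed sample text with an ftp:// URL and a bare www. host.
$text = 'Mirror at ftp://files.example.org/pub and docs at www.example.com.';

preg_match_all($gruber, $text, $gruberMatches);
preg_match_all($simple, $text, $simpleMatches);

print_r($gruberMatches[0]); // full matches found by Gruber's pattern
print_r($simpleMatches[0]); // full matches found by the http(s)-only pattern
```

On this sample, Gruber's pattern should pick up both the `ftp://` URL and the `www.` host without the trailing period, while the http(s)-only pattern finds nothing.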
