A system I am writing uses Markdown to format links, but I also want plain links to become active, so that typing http://www.google.com produces an active link. To do this, I am using a regex replacement to find URLs and rewrite them in Markdown syntax. The problem is that I cannot get the regex to skip links that are already in Markdown syntax.

I'm using the following code:

$value = preg_replace('@((?!\()https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@', '[$1]($1)', $value);

This works well for plain links, such as http://www.google.com, but I need it to ignore links already in the Markdown format. I thought the section (?!\() would prevent it from matching URLs that follow a parenthesis, but it would seem that I am in error.

I realize that even this is not an ideal solution (if it worked), but this is pushing beyond my regex abilities.

+1  A: 

I think (?<!\() is what you meant. If the match position is at the beginning of http://www.google.com, it's not the next character you need to check, but the previous one. In other words you need a negative lookbehind, not a negative lookahead.
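To make the difference concrete, here is a minimal sketch of the question's pattern with the lookahead swapped for a lookbehind. The function name autolink_plain_urls and the sample string are assumptions made up for illustration; the rest of the pattern is unchanged from the question.

```php
<?php
// Hypothetical helper wrapping the question's preg_replace call.
function autolink_plain_urls(string $value): string
{
    // (?<!\() is a negative lookbehind: the match may not be
    // immediately preceded by an opening parenthesis.
    $pattern = '@((?<!\()https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@';

    return preg_replace($pattern, '[$1]($1)', $value);
}

// Sample input (an assumption): one bare URL, one existing Markdown link.
echo autolink_plain_urls('See http://www.google.com and [Google](http://www.google.com).'), "\n";
// prints: See [http://www.google.com](http://www.google.com) and [Google](http://www.google.com).
```

Note that the lookbehind only checks for "(". A URL that appears as the link text, directly after "[", would still be matched, so running this over its own output would probably wrap those links a second time.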

Alan Moore
That did the trick! Thank you, sir!
BigDave
A: 

Regexes are notoriously bad at stuff like this; you might end up with all sorts of clever HTML exploits you never could have thought of. IMO you should modify the Markdown script to flag Markdown URLs as it sees them, so that you can ignore the flagged URLs when you find all URLs with a very simple search that leaves no complexity to exploit.
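As a rough illustration of the flagging idea, the sketch below does not modify the Markdown script itself. Instead it approximates the flagging step from the outside: existing inline Markdown links are swapped for placeholder tokens, bare URLs are found with a deliberately simple search, and the original links are then restored. The function name autolink_outside_markdown, the NUL-delimited tokens, and both patterns are assumptions for illustration, and the sketch assumes the input contains no NUL bytes.

```php
<?php
// Hypothetical helper: hide existing Markdown links, autolink the rest, restore.
function autolink_outside_markdown(string $text): string
{
    $stash = [];

    // Step 1: replace every inline link [text](target) with an opaque token.
    $text = preg_replace_callback(
        '@\[[^\]]*\]\([^)\s]*\)@',
        function (array $m) use (&$stash): string {
            $token = "\x00" . count($stash) . "\x00";
            $stash[$token] = $m[0];
            return $token;
        },
        $text
    );

    // Step 2: a simple search for bare URLs in what is left.
    // The tokens contain no "http", and NUL is excluded from the match.
    $text = preg_replace('@https?://[^\s\x00]+@', '[$0]($0)', $text);

    // Step 3: put the original Markdown links back.
    return strtr($text, $stash);
}

echo autolink_outside_markdown('Visit http://www.google.com or [Google](http://www.google.com)'), "\n";
// prints: Visit [http://www.google.com](http://www.google.com) or [Google](http://www.google.com)
```

The simple URL search in step 2 stops only at whitespace, so trailing punctuation would be swallowed into the link; a real implementation would likely need to handle that, along with reference-style links, which this sketch ignores.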

Dustin Getz