I am not very good at regex, but I need to convert the following example from this

<li>Creations by Carol - www.driedfloralcreations.com</li>

to

<li>Creations by Carol - <a href="http://www.driedfloralcreations.com" rel="external">www.driedfloralcreations.com</a></li>
+2  A: 

How about this in PHP?

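// Sample list item from the question.
$string = '<li>Creations by Carol - www.driedfloralcreations.com</li>';
// Match "www." + host + "." + TLD; the trailing i makes the match case-insensitive.
$pattern = '/(www\.[a-z\d-\.]+\.[a-z]+)/i';
// $1 is the captured host, reused for both the href and the link text.
$replacement = '<a href="http://$1" rel="external">$1</a>';
echo preg_replace($pattern, $replacement, $string);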

Assumes your links are always www.something.extension.
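For anyone who wants to try the same idea outside PHP, here is a rough Python sketch of the replacement. It is not part of the original answer: the function name `linkify` is made up for illustration, and the hyphen in the character class is escaped because Python's `re` module rejects `\d-\.` as a bad range.

```python
import re

# Same idea as the PHP pattern; re.IGNORECASE stands in for the /i flag.
# The hyphen is escaped because Python's re rejects "\d-\." as a range.
PATTERN = re.compile(r"(www\.[a-z\d\-.]+\.[a-z]+)", re.IGNORECASE)

# \1 is the captured host, used for both the href and the link text.
REPLACEMENT = r'<a href="http://\1" rel="external">\1</a>'


def linkify(html: str) -> str:
    """Wrap every www.something.extension occurrence in an anchor tag."""
    return PATTERN.sub(REPLACEMENT, html)


if __name__ == "__main__":
    print(linkify("<li>Creations by Carol - www.driedfloralcreations.com</li>"))
```

Like the PHP version, this still assumes every link looks like www.something.extension and does no validation.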

David Caunt
You forgot uppercase and symbols, and you did not escape the last dot, and the last part of the URL is not _really_ a [a-z]+, but rather a list of choices.
Aurélien Vallée
The i after the closing slash denotes a case-insensitive match. I've added the missing backslash. I made the assumption that Brad doesn't want to enumerate hundreds of TLDs and that his users will enter valid domains. He didn't ask for an exhaustive or highly complex solution, so I wrote a simple regex.
David Caunt
It's for a regex replacement in a text editor, so quick and dirty is desirable. However, some editors have unconventional implementations that may differ from PHP's. Can someone confirm this will work in TM?
Stuart Branham
This will work in TextMate with only one modification: the `-` in the character class needs to be escaped. Also, the case sensitivity flag is a checkbox. So the regex is as follows: `(www\.[a-z\d\-\.]+\.[a-z]+)`
Emily
A: 
www\.[a-zA-Z0-9_-]+\.(fr|com|org|be|biz|info|getthelistsomewhere)
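A minimal Python sketch of this list-of-choices approach, assuming a deliberately short TLD list. The list, the names `TLDS` and `find_hosts`, and the trailing `\b` are my own additions for illustration. The `\b` stops a short TLD such as `com` from matching the start of a longer word.

```python
import re

# Hypothetical, deliberately short TLD list; a real one would be much longer.
TLDS = ["fr", "com", "org", "be", "biz", "info"]

# Build the alternation from the list. The trailing \b (an addition to the
# pattern above) rejects hosts where a TLD is only a prefix of a longer word.
TLD_PATTERN = re.compile(
    r"www\.[a-zA-Z0-9_-]+\.(?:" + "|".join(map(re.escape, TLDS)) + r")\b"
)


def find_hosts(text: str) -> list[str]:
    """Return every www.name.tld occurrence whose TLD is in TLDS."""
    return [m.group(0) for m in TLD_PATTERN.finditer(text)]


if __name__ == "__main__":
    print(find_hosts("see www.example.com and www.example.community"))
```

Note that the character class has no dot in it, so a host with an extra subdomain level (www.a.b.com) is not matched by this pattern.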
Aurélien Vallée
+1  A: 

You have to be really clear about how much information you need to give the regex to avoid false positives.

For example, is the pattern www.something.somethingelse enough? Are there other occurrences of www in the file that would get caught?

Maybe <li> something - somethingelse</li> is the correct match. We cannot guess without knowing your whole file. There might be other <li> elements in there that you don't want to change.
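To make the false-positive concern concrete, here is a small Python sketch that compares a loose www pattern with one tied to the `<li>name - url</li>` shape. The sample text (including the example.com and example.org lines) is invented for illustration, and both patterns are just one possible way to draw the line.

```python
import re

# Loose: any www.something.extension anywhere in the text.
LOOSE = re.compile(r"www\.[a-z\d\-.]+\.[a-z]+", re.IGNORECASE)

# Strict: only a URL that closes a "<li>name - url</li>" item.
STRICT = re.compile(
    r"<li>[^<]+ - (www\.[a-z\d\-.]+\.[a-z]+)</li>", re.IGNORECASE
)

# Invented sample: one plain item, one already-linked item, one paragraph.
SAMPLE = """<li>Creations by Carol - www.driedfloralcreations.com</li>
<li><a href="http://www.example.com">www.example.com</a></li>
<p>Contact: www.example.org</p>"""

loose_hits = LOOSE.findall(SAMPLE)
strict_hits = STRICT.findall(SAMPLE)

if __name__ == "__main__":
    print(len(loose_hits))  # 4: also hits the href, the link text, and the <p>
    print(strict_hits)      # ['www.driedfloralcreations.com']
```

The loose pattern would rewrite the item that is already a link, which is exactly the kind of thing only the person with the whole file can rule out.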

gnibbler
+2  A: 

If you're only looking for URLs in <li> elements formatted like the one in your question, it should be much simpler than a lot of the other suggested solutions. I assume you don't really need to validate your URLs; you just want to take a list of site names and URLs and turn the URLs into links.

Your search pattern could be:

<li>(.+) - (https?:\/\/)?(\S+?)<\/li>

And the replace pattern would be:

<li>$1 - <a href="(?2:$2:http\://)$3" rel="external">$3</a></li>

Just tested the find/replace out in TextMate and it worked nicely. It adds http:// if it isn't already present, and otherwise assumes that whatever is after the - is a URL as long as it doesn't contain a space.
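Outside TextMate, the conditional part of the replace pattern (insert group 2 if it matched, otherwise http://) usually has to be done in code. A rough Python sketch, assuming one <li> item per line; the names `linkify_items` and `_to_link` are hypothetical helpers, not anything TextMate provides.

```python
import re

# Same search pattern as above; slashes need no escaping in Python.
LI_PATTERN = re.compile(r"<li>(.+) - (https?://)?(\S+?)</li>")


def _to_link(match: re.Match) -> str:
    """Rebuild the item, defaulting the scheme to http:// when absent."""
    name, scheme, rest = match.group(1), match.group(2), match.group(3)
    scheme = scheme or "http://"  # group 2 is None when no scheme was typed
    return f'<li>{name} - <a href="{scheme}{rest}" rel="external">{rest}</a></li>'


def linkify_items(html: str) -> str:
    """Apply the rewrite to every matching <li> item in the text."""
    return LI_PATTERN.sub(_to_link, html)


if __name__ == "__main__":
    print(linkify_items("<li>Creations by Carol - www.driedfloralcreations.com</li>"))
```

A replacement function is used here because a plain replacement string in Python's `re` has no equivalent of the conditional insertion.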

For testing out regular expressions, Rubular is a great tool. You can paste in some text, and it'll show you what matches as you type your regex. It's a Ruby tool, but TextMate uses the same regex syntax as Ruby.

Emily
This looks good, but I think the \S+ match should be non-greedy just in case there is another <li>withoutspaces</li> following.
gnibbler
Good suggestion, I didn't think of that. I've changed it. (Sorry to have misattributed the suggestion in the edit comments, though. That's what I get for copy/pasting too fast.)
Emily