I'm wondering if it's possible (recommended might be the better word) to use sed to convert URLs into HTML hyperlinks in a document. That is, it would look for things like:

http://something.com

And replace them with:

<a href="http://something.com">http://something.com</a>

Any thoughts? Could the same also be done for email addresses?

A: 

You can use awk:

awk '
{
 for(i=1;i<=NF;i++){
   if ($i ~ /http/){
      $i="<a href=\042"$i"\042>"$i"</a>"
   }
 }
} 1 ' file

output

$ cat file
blah http://something.com test http://something.org

$ ./shell.sh
blah <a href="http://something.com">http://something.com</a> test <a href="http://something.org">http://something.org</a>
ghostdog74
+1  A: 

This might work.

sed -i -e "s|http[:]//[^ ]*|<a href=\"\0\">\0</a>|g" yourfile.txt

It depends on the URL being followed by a space (which isn't always the case).
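To illustrate the limitation just mentioned, here is a minimal Python sketch that treats a URL as running to the next whitespace character and then moves trailing sentence punctuation back outside the link. The names URL_RE, TRAILING and link_urls are made up for this example, and the punctuation set is an assumption rather than a rule.

```python
import re

# Assumption: a URL runs up to the next whitespace character.
URL_RE = re.compile(r'https?://\S+')
# Assumption: these characters, when trailing, belong to the sentence,
# not to the URL.
TRAILING = '.,;:!?)'

def link_urls(text):
    """Wrap each URL in an anchor tag, leaving trailing punctuation outside."""
    def repl(match):
        url = match.group(0)
        stripped = url.rstrip(TRAILING)
        tail = url[len(stripped):]  # the punctuation that was stripped off
        return '<a href="{0}">{0}</a>{1}'.format(stripped, tail)
    return URL_RE.sub(repl, text)

if __name__ == '__main__':
    print(link_urls('See http://something.com, then http://something.org.'))
```

A URL that legitimately ends in one of those characters would be cut short, so the set may need adjusting for your documents.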

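You could do something similar for e-mails with the following. Note the -E option: without extended regular expressions, sed treats +, ? and the parentheses as literal characters.

sed -i -E -e "s|\w+@\w+\.\w+(\.\w+)?|<a href=\"mailto:&\">&</a>|g" yourfile.txt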

Those might get you started. I suggest leaving off the -i option to test your output before making the changes inline.
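For comparison, the e-mail substitution can be sketched in Python as well. The pattern below is deliberately simple (a local part, an @, and a dotted domain) and is an assumption made for illustration; it does not attempt to cover every address the mail standards allow. EMAIL_RE and link_emails are names invented for this example.

```python
import re

# Assumption: a simple address shape, local-part@domain.tld, with optional
# further dotted labels. Real-world addresses can be more exotic.
EMAIL_RE = re.compile(r'[\w.+-]+@\w[\w-]*(?:\.\w[\w-]*)+')

def link_emails(text):
    """Wrap each e-mail address in a mailto: hyperlink."""
    # \g<0> refers to the whole match, like & in sed.
    return EMAIL_RE.sub(r'<a href="mailto:\g<0>">\g<0></a>', text)

if __name__ == '__main__':
    print(link_emails('Mail bob@example.com today.'))
```

As with the sed version, it is worth printing the result first and only overwriting the file once the output looks right.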

Jason R. Coombs
A: 
sed -i.bakup 's|https\?://[^ \t]*|<a href="&">&</a>|g' htmlfile
A: 

While you could use sed, I typically only use sed if I need something that's write-only (that is, it only needs to work and doesn't need to be maintained).

I find the Python regular expression library to be more accessible (and it gives you the ability to add more powerful constructs).

import re
import sys

def href_repl(matcher):
    "replace the matched URL with a hyperlink"
    # here you could analyze the URL further and make exceptions, etc
    #  to how you did the substitution. For now, do a simple
    #  substitution.
    href = matcher.group(0)
    return '<a href="{href}">{href}</a>'.format(**vars())

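# read the whole file named on the command line
with open(sys.argv[1]) as f:
    text = f.read()
# \S* stops at any whitespace, so a URL at the end of a line
#  does not swallow the newline and the next word
url_pattern = re.compile(re.escape('http://') + r'\S*')
sys.stdout.write(url_pattern.sub(href_repl, text))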

Personally, I find that much easier to read and maintain.
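As a sketch of the "analyze the URL further and make exceptions" idea from the comment in href_repl, the variant below skips URLs whose host is on an exception list and HTML-escapes the rest before building the link. SKIP_HOSTS and linkify are hypothetical names introduced here, and the choice of localhost as an exception is only an example.

```python
import html
import re
from urllib.parse import urlsplit

URL_RE = re.compile(r'https?://\S+')
# Hypothetical exception list, for illustration only.
SKIP_HOSTS = {'localhost'}

def href_repl(match):
    """Replace the matched URL with a hyperlink, unless its host is skipped."""
    url = match.group(0)
    if urlsplit(url).hostname in SKIP_HOSTS:
        return url  # leave excepted URLs untouched
    # Escape &, <, > and quotes so the URL is safe inside the attribute
    # and the link text.
    safe = html.escape(url, quote=True)
    return '<a href="{0}">{0}</a>'.format(safe)

def linkify(text):
    return URL_RE.sub(href_repl, text)

if __name__ == '__main__':
    print(linkify('dev http://localhost/x live http://something.com/?a=1&b=2'))
```

This kind of per-match logic is awkward to express in a sed one-liner, which is the main reason to reach for a replacement function.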

Jason R. Coombs
A: 

The file contains the following content:

http://something.com

The following code will give the correct output:

sed -r 's/(.*)/\<a href="\1">\1\<\/a\>/' file
muruga
This answer is trivial, provides no additional information over other answers previously given, and doesn't even output correct HTML for the example supplied (missing quotes).
Jason R. Coombs
Now it gives the correct answer. It outputs the quotes as well.
muruga
Not really. Remember, the OP has a document that contains other text. If you use (.*), you will be wrapping the whole line, other text included, in the link.
ghostdog74