ansaurus

Question

Simple regular expression parsing in bash

Answer 1

+2 A:

Here's a small awk script that should do what you need.

awk '/URL:/ { sub(/^URL:/,"", $3); printf "<a href=\"%s\">%s</a>\n", $3, $3; }'

ar 2010-10-19 12:18:54

Answer 2

+1 A:

This should work:

sed -n 's%^.* URL:\(.*\) \[[0-9]*\] -> .*$%<a href="\1">\1</a>%p' log.txt

mouviciel 2010-10-19 12:23:42

Do you *really* need the backslash before round brackets?

Salman A 2010-10-19 13:05:51

With `sed`, yes I do.

mouviciel 2010-10-19 13:27:41
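
To illustrate the point about backslashes: by default `sed` uses basic regular expressions, where a capture group is written `\( ... \)`. With `-E` (accepted by GNU and BSD `sed`) it uses extended regular expressions, where plain `( ... )` groups. A minimal sketch, assuming a log line shaped the way the patterns in these answers imply; the sample line itself is made up, since the thread does not show the real file:

```shell
# Hypothetical sample line; the real log format is not shown in the thread.
line='2010-10-19 12:00:01 URL:http://example.com/ [1234] -> "index.html" [1]'

# Basic regular expressions (sed's default): groups are written \( ... \).
bre=$(printf '%s\n' "$line" | sed -n 's%^.* URL:\(.*\) \[[0-9]*\] -> .*$%<a href="\1">\1</a>%p')

# Extended regular expressions (-E): groups are written ( ... ).
ere=$(printf '%s\n' "$line" | sed -n -E 's%^.* URL:(.*) \[[0-9]*\] -> .*$%<a href="\1">\1</a>%p')

# Both commands should print the same link.
printf '%s\n%s\n' "$bre" "$ere"
```

The literal brackets still need `\[` and `\]` in both flavours; only the grouping parentheses change.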

Salman A 2010-10-19 15:45:36

Answer 3

+1 A:

What about sed:

sed -n 's/.*URL:\([^ ]\+\) .*/<a href="\1">\1<\/a>/;/<a href/p' logfile

(Please note: you can address the URL part more properly, e.g. by the length of the date string in front of it, but I was just lazy.)

Zsolt Botykai 2010-10-19 12:27:07

Answer 4

+2 A:

Here's a bash solution

#!/bin/bash
exec 4<"log.txt"
while read -r line<&4
do
  case "$line" in
    *URL:* )
      url="${line#*URL:}"
      url=${url%% [*}
      echo "<a href=\"${url}\">${url}</a>"
      ;;
  esac
done
exec 4<&-

ghostdog74 2010-10-19 12:27:16

Answer 5

+1 A:

Something like this:

while IFS= read -r line
do
        URL=$(echo "$line" | egrep -o 'URL:[^ ]+' | sed 's/^URL://')
        if [ -n "$URL" ]; then
                echo "<a href=\"$URL\">$URL</a>" >> output.txt
        fi
done < input.txt

codaddict 2010-10-19 12:33:29

Using `egrep` to read the file is faster than the outer while loop: `egrep -o 'URL:[^ ]+' input.txt | sed ..... | while read ....`. By the way, `egrep` is `grep -E` now.

ghostdog74 2010-10-19 12:40:32

@ghostdog74: Thanks for the `egrep` tip. But I didn't get the first part.

codaddict 2010-10-19 12:45:34

You have an outer while read loop that iterates over the file, and for each line you are calling two external commands, `egrep` and `sed`, through pipes. That is an expensive operation. Hence the suggestion to let `egrep` iterate over the file instead, since it's optimized to go through files, large or small, more efficiently. And no, your script is not wrong, just not optimized for speed, that's all. :)

ghostdog74 2010-10-19 12:51:05
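
One way to read that suggestion, as a rough sketch rather than the exact command the commenter had in mind: let `grep -E` scan the whole file once and do the wrapping in the same `sed` call, so no external command is started per line. The temporary file and its contents are made-up stand-ins for `input.txt`.

```shell
# Hypothetical stand-in for input.txt; the real log format is not shown in the thread.
input=$(mktemp)
cat > "$input" <<'EOF'
2010-10-19 12:00:01 URL:http://example.com/ [1234] -> "index.html" [1]
a line without a link
2010-10-19 12:00:02 URL:http://example.com/docs/ [56] -> "docs" [1]
EOF

# grep reads the file once and emits only the URL:... matches;
# sed strips the prefix, then wraps what is left in an anchor tag.
# In a sed replacement, an unescaped & stands for the whole matched text.
links=$(grep -Eo 'URL:[^ ]+' "$input" | sed -e 's/^URL://' -e 's%.*%<a href="&">&</a>%')

printf '%s\n' "$links"
rm -f "$input"
```

This starts two processes for the whole file instead of two per line, which is where the speed difference comes from.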

@ghostdog74: Got it, thanks :)

codaddict 2010-10-19 13:38:10

I used part of this script to post-process the file.

Salman A 2010-10-19 15:42:09
