I have an input file in the following format:

<td> Name1 </td>
<td> <span class="test"><a href="url1">Link </a></span></td>
<td> Name2 </td>
<td> <span class="test"><a href="url2">Link </a></span></td>

I want an awk script that reads this file and produces the following output:

url1 Name1
url2 Name2

Can anyone help me with this trivial-looking problem? Thanks.

+2  A: 
msw
Your easy perl script doesn't cut it ;-). With the given input it outputs the lines "url1" and "url2".
Peter G.
@peter: ah, the joys of plesiochronous editing. ;)
msw
+1 for pointing out the need for a real parser (and using the term *plesiochronous*).
Amardeep
+1  A: 

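Here is an awk script that does the job:

awk '
# On a link line, strip everything around the href value and print it with the saved name.
/a href=\".*\"/ { sub( /^.*a href=\"/,"" ); sub(/\".*/,"");  print $0, name }
# On every line, remember the second field (the name on a name line).
                { name = $2 }
'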
Peter G.
A: 

This might work:

awk 'BEGIN {i=1}
     {line[i++]=$0}
     END {
      j=1
      # join each link line with the name line that precedes it
      while (j<i)
      {print line[j+1] line[j]; j+=2}
     }' yourfile|awk '{split($4,a,"\""); print a[2],$6}'
Vijay Sarathi
A: 
gawk '/^<td>/ {n = $2; getline; print gensub(/.*href="([^"]*).*/,"\\1",1), n}' infile

url1 Name1
url2 Name2
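
Note that gensub() is a gawk extension, so the one-liner above will not run on other awks. As a rough sketch of the same idea using only match() and substr(), something like the following might do. It assumes one link per line and double-quoted href values, as in the question; the function name extract_links and the sample file infile are made up for illustration.

```shell
# Hypothetical sample file reproducing the input from the question.
cat > infile <<'EOF'
<td> Name1 </td>
<td> <span class="test"><a href="url1">Link </a></span></td>
<td> Name2 </td>
<td> <span class="test"><a href="url2">Link </a></span></td>
EOF

# extract_links is an illustrative helper name, not part of any tool.
extract_links() {
    awk '
    # Link line: locate href="..." and print the quoted value plus the saved name.
    /href="/ {
        # RSTART + 6 skips the six characters of href= and the opening quote;
        # RLENGTH - 7 also drops the closing quote.
        if (match($0, /href="[^"]*"/))
            print substr($0, RSTART + 6, RLENGTH - 7), name
        next
    }
    # Name line: the second whitespace-separated field is the name.
    /^<td>/ { name = $2 }
    ' "$1"
}

extract_links infile
```

On the sample input this should print the same two lines as shown above. Like the other answers, it is line-oriented pattern matching rather than real HTML parsing, so it breaks as soon as the markup is laid out differently.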
DanielAjoy