ansaurus

Question

Answer 1

A:

No. Just no.

No.

Regular expressions cannot parse HTML in the general case, because regular languages are not expressive enough to describe HTML's arbitrarily nested structure. Use an HTML parser instead; a simple event-driven (SAX-type) one should be sufficient.
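As a rough illustration of the event-driven approach, here is a minimal sketch using Python's standard-library `html.parser`. It assumes the goal is to collect the text inside `<i>` elements, as in the sed and awk one-liners elsewhere in this thread. The names `ItalicTextExtractor` and `extract_italic` are made up for this example and are not part of any library.

```python
from html.parser import HTMLParser


class ItalicTextExtractor(HTMLParser):
    """Collect the text found inside <i>...</i> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # number of <i> elements currently open
        self.results = []   # one string per outermost <i> element
        self._buffer = []   # text pieces seen inside the current <i>

    def handle_starttag(self, tag, attrs):
        # Called for every opening tag; only <i> is of interest here.
        if tag == "i":
            self.depth += 1

    def handle_endtag(self, tag):
        # When the outermost <i> closes, store the text gathered so far.
        if tag == "i" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self._buffer))
                self._buffer = []

    def handle_data(self, data):
        # Text nodes are kept only while inside an <i> element.
        if self.depth > 0:
            self._buffer.append(data)


def extract_italic(html):
    """Return a list with the text content of each outermost <i> element."""
    parser = ItalicTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    sample = "Hello, <i>I</i> am <i>very</i> glad to meet you."
    for text in extract_italic(sample):
        print(text)
```

Unlike a pattern match, this keeps working when the `<i>` element contains other tags or spans several lines, because the parser tracks open and close events instead of guessing at boundaries.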

Williham Totland 2010-08-27 23:40:55

So I recognize this is often frowned upon, but I know this to be a well-formatted HTML document. Working with an HTML parser seems ridiculously complex for such a simple task.

Nic 2010-08-27 23:49:08

Answer 2

A:

$ awk -vFS="<.[^>]*>" '{for(i=2;i<=NF;i+=2)print $i}' file
I
very

ghostdog74 2010-08-28 00:56:55

Answer 3

+1 A:

Give this a try:

sed -n 's|[^<]*<i>\([^<]*\)</i>[^<]*|\1\n|gp'

And your example is missing a "/":

Hello, <i>I</i> am <i>very</i> glad to meet you.

Dennis Williamson 2010-08-28 01:56:13


Extract HTML tag data with sed
