ansaurus

Question

Answer 1

+4 A:

Don't use regular expressions to parse XML or HTML. You'll never be able to get it to work correctly for nested divs.

Laurence Gonsalves 2009-10-09 00:35:04

Actually, you can, but it's a complete PITA, and you're far better off just using a proper X/HTML parser.

Matthew Scharley 2009-10-09 01:13:01

"Real" regular expressions can't deal with nesting. That's what separates regular languages from context-free languages. Most regex implementations are more powerful than strictly regular, but most still aren't powerful enough to deal with nesting.

Laurence Gonsalves 2009-10-09 01:30:48
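To make the nesting problem concrete, here is a minimal sketch in Python. The sample markup is hypothetical, and the `leftTail` class name is borrowed from the pattern posted elsewhere in this thread. Neither the lazy nor the greedy form of the pattern can find the closing tag that actually pairs with the opening one:

```python
import re

# Hypothetical sample: a leftTail div that contains another div,
# followed by an unrelated sibling div.
html = '<div class="leftTail">outer <div>inner</div> tail</div><div>other</div>'

lazy = re.compile(r'<div\s+class="leftTail">.*?</div>', re.DOTALL)
greedy = re.compile(r'<div\s+class="leftTail">.*</div>', re.DOTALL)

lazy_match = lazy.search(html).group(0)
greedy_match = greedy.search(html).group(0)

print(lazy_match)    # stops at the inner </div>, dropping ' tail</div>'
print(greedy_match)  # runs on to the last </div>, swallowing the sibling div
```

The lazy version ends too early and the greedy version ends too late, because matching the correct `</div>` requires counting how many divs are currently open.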

Answer 2

+11 A:

You might want to consider graduating to an actual HTML parser. I suggest you give Beautiful Soup a try. HTML can be formatted in many unexpected ways, and even a correctly written regular expression may fail on some of them.

steveha 2009-10-09 00:36:04
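As a rough illustration of the parser approach, the sketch below uses Python's standard-library `html.parser` rather than Beautiful Soup, so that it runs without installing anything. The class name `DivTextCollector` and the sample markup are made up for this example, and the `leftTail` class comes from the pattern posted elsewhere in this thread. With Beautiful Soup 4 the equivalent lookup would be roughly `soup.find_all("div", class_="leftTail")`.

```python
from html.parser import HTMLParser


class DivTextCollector(HTMLParser):
    """Collect the text inside every <div class="leftTail">, nested divs included."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # div nesting depth inside the target div; 0 means outside
        self.current = []   # text fragments of the div being collected
        self.results = []   # one string per finished target div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            # A div nested inside the target: just go one level deeper.
            self.depth += 1
        elif "leftTail" in (dict(attrs).get("class") or "").split():
            # Entering a new target div.
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # The closing tag that pairs with the target's opening tag.
                self.results.append("".join(self.current))

    def handle_data(self, data):
        if self.depth > 0:
            self.current.append(data)


# Hypothetical sample with a nested div and an unrelated sibling div.
html = '<div class="leftTail">outer <div>inner</div> tail</div><div>other</div>'
collector = DivTextCollector()
collector.feed(html)
collector.close()
print(collector.results)
```

The depth counter is exactly the piece a plain regular expression lacks: it lets the collector tell the inner `</div>` apart from the one that closes the target div.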

Thanks, Beautiful Soup works great!

xlione 2009-10-10 00:30:10

Answer 3

+2 A:

Try this:

import re
p = re.compile(r'<div\s+class="leftTail">.*?</div>', re.DOTALL)  # DOTALL lets .*? span newlines

Rubens Farias 2009-10-09 00:41:39
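A quick check of this pattern on flat, non-nested markup might look like the following. The page fragment is hypothetical, and `re.DOTALL` is an addition assumed here so that `.*?` can cross line breaks:

```python
import re

p = re.compile(r'<div\s+class="leftTail">.*?</div>', re.DOTALL)
p_no_dotall = re.compile(r'<div\s+class="leftTail">.*?</div>')

# Hypothetical page fragment with two flat (non-nested) leftTail divs,
# the first of which spans two lines.
page = '''<div class="leftTail">first
line two</div>
<p>skip me</p>
<div class="leftTail">second</div>'''

matches = p.findall(page)
matches_no_dotall = p_no_dotall.findall(page)

print(len(matches))            # both divs are found
print(len(matches_no_dotall))  # the multi-line div is missed without DOTALL
```

This only holds while the target divs contain no other divs; with nested divs the lazy match stops at the first `</div>` it sees.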


python regular expression
