ansaurus

Question

Repeating regex groups

Answer 1

+3 A:

http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

"Have you tried using an XML parser instead?"

EDIT: This is the way to go: Beautiful Soup

Isaac Hodes 2010-01-01 20:06:25

http://code.google.com/p/html5lib is worth a try too

THC4k 2010-01-01 20:29:19

Answer 2

+1 A:

You just need to put the block in parentheses and then apply the {m,n} repetition quantifier to the group, e.g.:

(foo...){1,10}

This matches 1 to 10 consecutive instances of whatever is inside the parentheses. Given your example above, you can nest groups:

((f..)(b..)){1,10}
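A minimal sketch of how that nested pattern behaves in Python's re module. The input string is made up for illustration, since it is an assumption about what the data looks like. One thing to watch for: when a capturing group is repeated, only its last repetition is kept, so to collect every repetition you match the inner unit on its own.

```python
import re

# Hypothetical input: two "f.." / "b.." pairs back to back.
text = "foobarfizbuz"

# The nested, repeated group from the answer.
pattern = re.compile(r"((f..)(b..)){1,10}")

m = pattern.fullmatch(text)
whole = m.group(0)        # the entire run of repetitions
last_pair = m.group(1)    # a repeated group keeps only its LAST repetition
last_f = m.group(2)
last_b = m.group(3)

# To get every repetition, search for the repeated unit by itself.
pairs = re.findall(r"(f..)(b..)", whole)

print(whole, last_pair, pairs)
```

So the quantified pattern is useful for checking that the whole run has the right shape, and a second pass with findall (or finditer) pulls out the individual pieces.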

scotchi 2010-01-01 20:12:07

Answer 3

+3 A:

This is the wrong way to go unless you're trying to scrape some data out of a tiny fragment.

It would be much better if you used a tolerant HTML parser. BeautifulSoup, mentioned earlier, is a good one, but it's stagnating and I don't believe it's being actively maintained anymore.

A highly recommended parser for Python is lxml.

There was a long thread discussing parsing XHTML on one of our local mailing lists here which you might find useful too.
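As a rough sketch of the parser approach, here is a small example that uses html.parser from the standard library as a stand-in for lxml or BeautifulSoup, only so that it runs without installing anything. The LinkCollector class and the sample markup are hypothetical, and pulling out href values is just an assumed task for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> start tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


# Deliberately sloppy markup: unclosed <p>, bare <br>, unclosed final <a>.
html = '<p>Unclosed paragraph <a href="/one">one</a><br><a href="/two">two'

collector = LinkCollector()
collector.feed(html)
collector.close()
links = collector.links

print(links)
```

The parser does not care that the tags are unbalanced, which is exactly where a regex over the raw markup tends to fall over.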

Noufal Ibrahim 2010-01-01 20:20:31
