ansaurus

Question

Answer 1

Don't use regular expressions to parse HTML. BeautifulSoup will make light work of this.

As for your specific problem, it might be that you are missing a colon at the end of the first line:

import re
# The for statement must end with a colon.
for o in re.finditer(r'left:102[0-9]"><nobr>(.*?)</nobr></div>', words[index]):
    out = o.group(1)

If this isn't the problem, please post the error you are getting, and what you expect the output to be.

Mark Byers 2010-01-28 11:36:08
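
To show how the character class in the loop above covers a whole range of integers in one search, here is a minimal sketch. The sample markup and the names sample, pattern, and values are assumptions made for illustration, since the original page source was not posted.

```python
import re

# Hypothetical sample input; the real page markup was not posted in the thread.
sample = (
    '<div style="left:1020"><nobr>12</nobr></div>'
    '<div style="left:1025"><nobr>7</nobr></div>'
    '<div style="left:1030"><nobr>99</nobr></div>'
)

# [0-9] matches exactly one digit, so "102[0-9]" covers 1020 through 1029
# in a single search instead of ten separate ones.
pattern = re.compile(r'left:102[0-9]"><nobr>(.*?)</nobr></div>')

values = []
for match in pattern.finditer(sample):
    # group(1) is the text captured between <nobr> and </nobr>.
    values.append(int(match.group(1)))

print(values)
```

With this sample, the div at left:1030 is skipped because it falls outside the 1020 to 1029 range.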

Yeah, I've heard about it, but I wasn't sure it would manage to get all those weird divs, hence the low-level approach.

Hal 2010-01-28 11:38:28

@Hal: BeautifulSoup can find tags based on attributes, and it can even accept regex as arguments for the search if you need that.

Mark Byers 2010-01-28 11:41:01
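
The attribute-based search with a regex argument mentioned above might look roughly like this. It is a sketch that assumes the current bs4 package (the thread predates it) and the same hypothetical markup; the names html, divs, and values are illustrative only.

```python
import re
from bs4 import BeautifulSoup  # assumption: the bs4 package is installed

# Hypothetical sample input; the real page markup was not posted in the thread.
html = (
    '<div style="left:1020"><nobr>12</nobr></div>'
    '<div style="left:1025"><nobr>7</nobr></div>'
    '<div style="left:1030"><nobr>99</nobr></div>'
)

soup = BeautifulSoup(html, "html.parser")

# A compiled regex passed as an attribute filter is matched against the
# attribute value; (?![0-9]) keeps longer numbers such as 10200 from matching.
divs = soup.find_all("div", style=re.compile(r"left:102[0-9](?![0-9])"))

# Read the text of the <nobr> child of each matching div.
values = [int(div.nobr.get_text()) for div in divs]

print(values)
```

Here the parser handles the tag structure, and the regex is only used to filter on the style attribute.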

Cool, didn't know it was so powerful. Anyway, I've practically finished the script, all that's missing is getting those integers. I guess I could simply make 10 searches, but that would be plain dumb and I'd like to learn how one could use regex on that string.

Hal 2010-01-28 11:43:29

You did it. I wasn't getting any error at all; for some reason the damn thing would just output a blank space. Thanks for putting up with this noob crap, it's guys like you that make StackOverflow so awesome.

Hal 2010-01-28 11:48:57

Parsing a range of integers in a list
