ansaurus

Question

Answer 1

+5 A:

I don't think it is failing because of the <br/> but rather because the paragraph is spread across multiple lines. Use the DOTALL mode to fix this:

pattern = re.compile('<p class=\"thisClass\">(.*?)<\/p>', re.DOTALL)

Nadia Alramli 2009-05-28 22:22:03

Or rather you should use the re.DOTALL mode to make the dot also match newlines. http://www.regular-expressions.info/python.html

Rene Saarsoo 2009-05-28 22:27:38

@Rene, thanks you are right fixed my answer

Nadia Alramli 2009-05-28 22:30:45

Thanks for your answer! I know it's a newbie mistake ;)

sotangochips 2009-05-28 22:40:18

Answer 2

+2 A:

It turns out the answer was to include re.S as a flag which allows the "." character to match newlines as well.

pattern = re.compile('<p class=\"thisClass\">(.*?)<\/p>', re.S)

This works perfectly.

sotangochips 2009-05-28 22:31:34

That's a shortcut to the DOTALL mode

Nadia Alramli 2009-05-28 22:32:55

Matching text within P tags in HTML