I have been banging my head against the keyboard in search of enlightenment through Google and all the Python docs I could get my hands on, but I could not find an answer to the issue I'm encountering.

I have the following regex that I run against a website, but Python insists on setting re.DOTALL on it, even though my code does not tell it to:

\d+. +(?P<season>\d+) *\- *(?P<episode>\d+).*?(?P<day>\d+)(?:\/|\s)+(?P<month>[A-Za-z]+)(?:\/|\s)+(?P<year>\d+) +(?:<a .+><img .+></a>)? ?<a .*?>(?P<name>.*?)</a>

This creates an array of seasons/episodes for TV show listings, and it works fine except on epguides.com/BurnNotice (when using the TVRage listings), due to some spacing before newlines (I guess).
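For reference, here is a minimal sketch of applying the pattern above with no flags at all, using `re.finditer` to collect one dict per episode. The pattern is split across raw-string pieces only for readability. The sample line and the name `EPISODE_RE` are hypothetical, loosely modeled on an epguides-style row; the real page markup may differ.

```python
import re

# The question's pattern, split into adjacent raw strings (Python joins them).
# No flags are passed to re.compile(), so DOTALL is not enabled here.
EPISODE_RE = re.compile(
    r"\d+. +(?P<season>\d+) *\- *(?P<episode>\d+).*?"
    r"(?P<day>\d+)(?:\/|\s)+(?P<month>[A-Za-z]+)(?:\/|\s)+(?P<year>\d+) +"
    r"(?:<a .+><img .+></a>)? ?<a .*?>(?P<name>.*?)</a>"
)

# Hypothetical listing line (made-up data, not copied from epguides.com).
sample = '1.   1-1     01 Jan 07   <a href="http://example.com/ep1">Pilot</a>'

for match in EPISODE_RE.finditer(sample):
    # One dict per matched episode row.
    print(match.groupdict())
# {'season': '1', 'episode': '1', 'day': '01', 'month': 'Jan', 'year': '07', 'name': 'Pilot'}
```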

Using http://re-try.appspot.com to test, I've narrowed the issue down to re.DOTALL. If I enable it on re-try, it reproduces the results I get when I run my script standalone. If I untick DOTALL, it gives me the results I expect.
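The same on/off comparison that re-try's DOTALL checkbox performs can be reproduced locally. This sketch runs a simplified pattern (an assumption, reduced from the one above) against a hypothetical two-line snippet, once without and once with `re.DOTALL`:

```python
import re

# Hypothetical snippet: the first <a> tag has no closing </a> on its own line.
html = '<a href="x">First\n<a href="y">Second</a>'
pattern = r"<a .*?>(.*?)</a>"

without_dotall = re.findall(pattern, html)
with_dotall = re.findall(pattern, html, re.DOTALL)

print(without_dotall)  # ['Second']
print(with_dotall)     # ['First\n<a href="y">Second']
```

Without the flag, `.` cannot match `\n`, so the first `<a>` never reaches a closing `</a>`; with the flag, the lazy `(.*?)` runs across the line break.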

How can I force Python NOT to use re.DOTALL?

The script runs both on Ubuntu and OS X.

A: 

Then show the code that doesn't set re.DOTALL.

As you describe it, the problem is not with the regex but with the calling code.

Leonardo Santagada
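One way to check what the calling code is actually doing is to inspect the compiled pattern's `flags` attribute. `uses_dotall` is a hypothetical helper; note that an inline `(?s)` at the start of the pattern string turns DOTALL on just like passing `re.DOTALL`:

```python
import re

def uses_dotall(pattern: re.Pattern) -> bool:
    """Hypothetical helper: True if DOTALL is active on a compiled pattern."""
    return bool(pattern.flags & re.DOTALL)

plain = re.compile(r"<a .*?>(?P<name>.*?)</a>")
explicit = re.compile(r"<a .*?>(?P<name>.*?)</a>", re.DOTALL)
inline = re.compile(r"(?s)<a .*?>(?P<name>.*?)</a>")  # inline flag form

print(uses_dotall(plain))     # False
print(uses_dotall(explicit))  # True
print(uses_dotall(inline))    # True
```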
A:

`.+>` should change to `[^>]+>`, and `.*?>` to `[^>]*>`.

You can also try replacing the other dots with `[^\r\n]`, but the two changes above should be enough.

S.Mark
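A sketch of why the first change matters, on a hypothetical line: a greedy `.+` runs to the last `>` on the line, while the negated class `[^>]+` stops at the first one.

```python
import re

# Hypothetical line: an image link followed by the episode link.
line = '<a href="a"><img src="b.png"></a> <a href="c">Pilot</a>'

greedy = re.search(r"<img .+>", line)      # backtracks only to the LAST '>'
negated = re.search(r"<img [^>]+>", line)  # cannot step past the first '>'

print(greedy.group())   # <img src="b.png"></a> <a href="c">Pilot</a>
print(negated.group())  # <img src="b.png">
```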
Making the 2 changes above did not work, but replacing `<img .+>` with `<img [^\r\n]+>` did. Thank you! So using `^` is like a negative in this scenario?
magu
Yes, it's a negation: `[^\r\n]` means any character except `\r` or `\n`.
S.Mark
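A minimal illustration of the difference, assuming the page is served with `\r\n` line endings (the question does not confirm this): without DOTALL, `.` excludes only `\n`, so it still matches `\r`, which would fit the "spacing before newlines" symptom. The negated class `[^\r\n]` excludes both characters.

```python
import re

# Hypothetical text: trailing spaces, then a Windows-style "\r\n" line ending.
text = 'Pilot  \r\n<a href="x">Next</a>'

# Without DOTALL, '.' refuses only '\n', so it also consumes the '\r'.
dot_result = re.findall(r"Pilot.+", text)
# The negated class stops before either line-ending character.
class_result = re.findall(r"Pilot[^\r\n]+", text)

print(dot_result)    # ['Pilot  \r']
print(class_result)  # ['Pilot  ']
```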