
Hi,

date = re.search(r'<td>([\x\d\w-.\s,()&\"]+|)<br><font',page_data)

I am migrating some code from PHP to Python and am using this regular expression with re.search. It doesn't work and gives the Python error:

raise error, v # invalid expression

It works with PHP's preg_match and also on http://www.gskinner.com/RegExr. Any idea why this is happening? Thanks!
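To narrow down which parts of the pattern Python rejects, the sketch below compiles the original pattern and then a few small character classes taken from it. It assumes Python 3, whose error messages differ from the Python 2 traceback quoted above. Exact wording can also vary between versions, so the helper only reports whether `re.error` was raised. The helper name `compiles` is made up for illustration.

```python
import re

# The question's pattern, unchanged.
original = r'<td>([\x\d\w-.\s,()&\"]+|)<br><font'

def compiles(pattern):
    """Hypothetical helper: True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error as exc:
        print(f"{pattern!r}: {exc}")
        return False

compiles(original)   # rejected
compiles(r'[\x]')    # \x with no hex digits is an incomplete escape
compiles(r'[\w-.]')  # \w-. is parsed as a range, and a range cannot start at \w
compiles(r'[\w.-]')  # a - just before ] is a literal hyphen, so this compiles
```

Because `\x` appears first in the class, Python 3 reports that error first. The `\w-.` range would fail too, even after `\x` is removed.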

A:
\x on its own is invalid (both in PHP and Python, but perhaps PHP just ignores it while Python throws an exception). Try removing it. Also move the - to the end of the character class, because otherwise \w-. is read as a character range, which Python rejects:

date = re.search(r'<td>([\d\w.\s,()&\"-]+|)<br><font',page_data)
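One quick check is to run the corrected pattern against a small fragment. The `page_data` value below is an invented sample, not the asker's real page, so real markup may behave differently. The second call shows what the empty alternative `|)` is for: a cell with no text before `<br>` still matches and yields an empty string.

```python
import re

# Corrected pattern from the answer: \x removed, hyphen moved to the end of the class.
pattern = r'<td>([\d\w.\s,()&\"-]+|)<br><font'

# Invented sample markup standing in for the asker's page_data.
page_data = '<table><tr><td>Mon, 12-Jan-2009 (est.)<br><font size="1">note</font></td></tr></table>'

date = re.search(pattern, page_data)
if date:
    print(date.group(1))  # prints: Mon, 12-Jan-2009 (est.)

# An empty cell takes the empty alternative, so group(1) is ''.
empty = re.search(pattern, '<td><br><font>')
```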

But in general, you won't be very happy if you try to parse HTML with regular expressions.
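For comparison, here is one way to get the same text without a regex, using the standard library's `html.parser.HTMLParser`. This is not part of the original answer. It is a sketch that assumes the wanted text is whatever appears inside a `<td>` before that cell's first `<br>`. Unlike the regex, it does not require a `<font>` tag after the `<br>`. The class name `CellTextBeforeBr` is hypothetical.

```python
from html.parser import HTMLParser

class CellTextBeforeBr(HTMLParser):
    """Hypothetical helper: collect text inside each <td> up to its first <br>."""

    def __init__(self):
        super().__init__()
        self.in_cell = False  # True while inside a <td> and before its first <br>
        self.buffer = []      # text chunks seen so far in the current cell
        self.results = []     # one entry per <td> that contained a <br>

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True
            self.buffer = []
        elif tag == "br" and self.in_cell:
            self.results.append("".join(self.buffer).strip())
            self.in_cell = False

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.buffer.append(data)

# Invented sample markup; the second cell has no <br>, so it is skipped.
page_data = ('<table><tr><td>Mon, 12-Jan-2009<br><font size="1">note</font></td>'
             '<td>no break</td></tr></table>')
parser = CellTextBeforeBr()
parser.feed(page_data)
parser.close()
print(parser.results)  # ['Mon, 12-Jan-2009']
```

This approach tolerates changes in attribute order, whitespace, and tag case, which is where hand-written regexes over HTML usually break.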

Tim Pietzcker
RE: Parsing X?HTML with regexes: [DON'T DO IT](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454).
Hank Gay