Hi all, I'm looking for a bit of help with a regex in Python, and Google is failing me. I'm searching some HTML for a certain type of table: specifically, any table that includes a background attribute (i.e. BGCOLOR). Some tables have this attribute and some do not. Could someone help me write a regex that finds the start of a table, then looks for BGCOLOR, but stops and moves on to the next table if it reaches the end of the current table first?
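The procedure described above (find each table, check it for BGCOLOR, never look past its closing tag) can be sketched in two steps instead of one complex regex: first isolate every table, then keep only the ones that mention BGCOLOR. This is a minimal sketch, assuming tables are not nested; `html`, `TABLE_RE`, and `tables_with_bgcolor` are hypothetical names, and the sample input is the simplified example from this post.

```python
import re

# Hypothetical sample input: the simplified example from this post.
html = """<TABLE>
<B>Item 1.</B>
</TABLE>
<TABLE>
BGCOLOR
</TABLE>
<TABLE>
<B>Item 2.</B>
</TABLE>"""

# Step 1: isolate each table. The lazy .*? stops at the FIRST closing tag
# after each opening tag, so every match is exactly one (non-nested) table.
TABLE_RE = re.compile(r"<table\b.*?</table>", re.IGNORECASE | re.DOTALL)


def tables_with_bgcolor(text):
    """Return only the tables whose markup mentions BGCOLOR (case-insensitive)."""
    # Step 2: keep a table only if BGCOLOR appears somewhere inside it.
    return [t for t in TABLE_RE.findall(text) if "bgcolor" in t.lower()]


print(tables_with_bgcolor(html))
# → ['<TABLE>\nBGCOLOR\n</TABLE>']
```

Splitting the job this way keeps each regex simple. Because every candidate is already bounded by its own `</table>`, the BGCOLOR check cannot spill into a neighbouring table.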
Here's a very simplified example that will serve the purpose:
```html
<TABLE>
<B>Item 1.</B>
</TABLE>
<TABLE>
BGCOLOR
</TABLE>
<TABLE>
<B>Item 2.</B>
</TABLE>
```
So we have three tables, but I'm only interested in finding the middle one, which contains 'BGCOLOR'. The problem with my current regex is that it finds the opening table tag, then looks for 'BGCOLOR' without caring whether it has already passed the closing table tag:
```python
import re
tables = re.findall(r'<table.*?BGCOLOR=".*?".*?</table>', text, re.I | re.S)
```
So instead of matching only the second table, it returns a single match that starts at the first `<TABLE>` and runs through the end of the second table. Let me know if anyone knows how to handle this situation.
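One way to keep a single regex from crossing a table boundary is a "tempered" token: replace each `.*?` with `(?:(?!</table>).)*?`, which may consume any character except one that begins a closing `</table>`. This is a minimal sketch against the simplified example, assuming tables are not nested; `html` and `BGCOLOR_TABLE_RE` are hypothetical names. Against real markup, the `BGCOLOR` piece could be extended to `BGCOLOR="[^"]*"`, since `[^"]*` cannot run past the closing quote.

```python
import re

# Hypothetical sample input: the simplified example from this post.
html = """<TABLE>
<B>Item 1.</B>
</TABLE>
<TABLE>
BGCOLOR
</TABLE>
<TABLE>
<B>Item 2.</B>
</TABLE>"""

# (?:(?!</table>).)*? consumes one character at a time, but only where the
# negative lookahead confirms that "</table>" does not start at that position.
# A match can therefore never run past the end of the table it started in.
BGCOLOR_TABLE_RE = re.compile(
    r"<table\b(?:(?!</table>).)*?BGCOLOR(?:(?!</table>).)*?</table>",
    re.IGNORECASE | re.DOTALL,
)

print(BGCOLOR_TABLE_RE.findall(html))
# → ['<TABLE>\nBGCOLOR\n</TABLE>']
```

When the first table reaches its `</TABLE>` without seeing BGCOLOR, the match attempt fails instead of continuing, and the search resumes at the next `<TABLE>`. If the real pages contain nested tables, an HTML parser is a more reliable tool than any regex.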
Thanks, Michael