How can I use a regular expression to extract groups of HTML that are formatted like this:
.
.
.irrelevant html...
<b>Question 6</b><br>
lots of text
<p>
lots of text
<p>
<br>
<b>Answer 6</b><br>
lots of text
<p>
lots of text
<p>
lots of text
<p>
more text
<p>
<HR>
<IMG SRC="/images/image.jpg" alt="alt text" width=480 height=360 hspace=2 vspace=2>
<p>
<i>caption text</i>
There can be a variable number of Question-Answer pairs, and the image markup can appear anywhere (either between the Question and the Answer, or after the Answer).
The only information I want to extract is the question number, the text without the paragraph tags, and the image's src, alt, and caption.
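
One possible approach, sketched here in Python (the question does not name a language, so that choice is an assumption): split the page on the `<b>Question N</b>` / `<b>Answer N</b>` headers, then pull the image and caption out of each section and strip the remaining tags. The function name `extract_qa` and the record layout are hypothetical, and the image pattern assumes `src` comes before `alt` and both are double-quoted, as in the sample above.

```python
import re

# Matches the bold section headers, e.g. <b>Question 6</b> or <b>Answer 6</b>.
HEADER = re.compile(r"<b>\s*(Question|Answer)\s+(\d+)\s*</b>", re.IGNORECASE)

# Matches an <img> tag (src before alt, both double-quoted: an assumption),
# optionally followed by <p> and an <i>caption</i>.
IMAGE = re.compile(
    r'<img\s+[^>]*?src="([^"]*)"[^>]*?alt="([^"]*)"[^>]*>'
    r"(?:\s*<p>)?\s*(?:<i>(.*?)</i>)?",
    re.IGNORECASE | re.DOTALL,
)

# Matches any remaining tag (<p>, <br>, <HR>, ...).
TAG = re.compile(r"<[^>]+>")


def extract_qa(html):
    """Return one record per Question/Answer section found in html."""
    # With two capture groups, split() yields:
    # [preamble, kind, number, body, kind, number, body, ...]
    parts = HEADER.split(html)
    records = []
    for kind, number, body in zip(parts[1::3], parts[2::3], parts[3::3]):
        images = [
            {"src": m.group(1), "alt": m.group(2), "caption": m.group(3)}
            for m in IMAGE.finditer(body)
        ]
        # Drop the image/caption markup, then every other tag,
        # and collapse the leftover whitespace.
        text = TAG.sub(" ", IMAGE.sub(" ", body))
        records.append({
            "kind": kind.capitalize(),
            "number": int(number),
            "text": " ".join(text.split()),
            "images": images,
        })
    return records
```

Each record carries the section kind, the question number, the tag-free text, and any images found in that section, so an image placed between a Question and its Answer ends up attached to the Question. An image without a following `<i>` caption gets `None` as its caption. Regular expressions are brittle against real-world HTML; if the markup varies much beyond the sample, an HTML parser is likely a safer choice.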