Using the following text as a sample, I need to extract the text between LI tags. Notice that the first LI is intentionally malformed (it has no closing tag), as this may be the case in practice. Said another way, I want everything from an LI opening tag up to either its closing LI tag or the next LI opening tag.
<UL>
<LI class="test">This is the first ListItem Text.
<LI>This is the second ListItem Test. </LI></UL>
So far I have come up with:
<[Ll][Ii].*>(.*?)((?:<[Ll][Ii]>)|(?:</[Ll][Ii]>))
But this appears to match from the first LI tag through the closing tag as a single match, with the group containing the text of the 2nd LI tag. I've managed to get it to return the first set, but never both. I'm also using the "Dot matches newline" option, and I need this to work in .NET. Thanks!
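For reference, here is a minimal C# reproduction of the behavior described above. It assumes that "Dot matches newline" corresponds to RegexOptions.Singleline in .NET; the class name Repro and the bracketed console output are just illustrative choices, not part of any real project.

```csharp
using System;
using System.Text.RegularExpressions;

class Repro
{
    static void Main()
    {
        // The sample from the question; the first LI has no closing tag.
        string html =
            "<UL>\n" +
            "<LI class=\"test\">This is the first ListItem Text.\n" +
            "<LI>This is the second ListItem Test. </LI></UL>";

        // The pattern exactly as posted.
        string pattern = @"<[Ll][Ii].*>(.*?)((?:<[Ll][Ii]>)|(?:</[Ll][Ii]>))";

        // Singleline makes '.' match newlines as well (assumed equivalent
        // to the "Dot matches newline" option).
        MatchCollection matches = Regex.Matches(html, pattern, RegexOptions.Singleline);

        Console.WriteLine(matches.Count); // prints 1
        foreach (Match m in matches)
        {
            // prints [This is the second ListItem Test. ]
            Console.WriteLine("[" + m.Groups[1].Value + "]");
        }
    }
}
```

The greedy `.*` after `<[Ll][Ii]` runs to the end of the input and backtracks only as far as the last `>` that still lets the rest of the pattern match, which is the `>` of the second LI tag. That would explain why the single match starts at the first LI and the group holds only the second item's text.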
UPDATE
I had done some research prior to posting this question and did in fact see and understand that using regexes to parse HTML is a bad idea. That being said, I only need to get text from a couple of LI tags here and there to determine what text to bulletize on a PowerPoint slide. I thought there might be a simpler way to do it than dealing with a separate library, especially since using third-party libraries is tricky where I work. Unfortunately, it appears that the HTML can end up malformed in certain situations when using an HTML rich text entry box on a page that allows you to bulletize text. Thanks for all of the recommendations against using regex for parsing HTML. I should have specified up front that I have read a lot of similar advice already but was looking for a quick workaround for a simple set of circumstances.
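To make "a quick workaround for a simple set of circumstances" concrete, here is a rough sketch of the kind of thing I have in mind. It is not a general HTML parser. It assumes flat, non-nested lists, and that every item is followed by either another LI opening tag or a closing LI tag. The pattern and the class name LiSketch are hypothetical illustrations, not a vetted solution.

```csharp
using System;
using System.Text.RegularExpressions;

class LiSketch
{
    static void Main()
    {
        string html =
            "<UL>\n" +
            "<LI class=\"test\">This is the first ListItem Text.\n" +
            "<LI>This is the second ListItem Test. </LI></UL>";

        // [^>]* keeps the opening tag from running past its own '>'.
        // The lazy group then stops at the first point where the lookahead
        // sees either the next LI opening tag or a closing LI tag, without
        // consuming it, so the next match can start there.
        string pattern = @"<li\b[^>]*>(.*?)(?=<li\b[^>]*>|</li>)";

        RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.Singleline;

        foreach (Match m in Regex.Matches(html, pattern, options))
        {
            // Trim removes the trailing newline or space left in the item text.
            Console.WriteLine(m.Groups[1].Value.Trim());
        }
        // prints:
        // This is the first ListItem Text.
        // This is the second ListItem Test.
    }
}
```

One known gap in this sketch: a final LI that is never closed and has no LI after it would not be matched, because the lookahead finds neither alternative. Nested lists would also be split incorrectly.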