Hello all, I have this string:

<p/><ul><li>test1<p/></li><li>test2<p/></li></ul><p/>

What I'm trying to do is extract all the "p" tags inside the "li" tags, but not the "p" tags outside of them.

So far I'm only able to extract all the "li" tags with

\<li\>(.*?)\</li\>

I'm lost on how to extract the "p" tags within them.

Any pointers are greatly appreciated!

+2  A: 
<li>(.*?<p/?>.*?)</li>

This matches the content of every <li> element that also contains a <p> or <p/> tag. If you only want to capture the <p/> itself, use:

(?<=<li>).*?(<p/?>).*?(?=</li>)

Here group 1 matches the <p/> tag.
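As a rough sketch of how the second pattern could be used from C#, assuming .NET's System.Text.RegularExpressions (the class name LiParagraphTags and the method name Extract are made up for illustration): it collects group 1 from every match. Note that the pattern captures at most one tag per match, so an <li> holding several <p/> tags would only yield the first one.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LiParagraphTags
{
    // Pattern from the answer above: group 1 is the <p> or <p/> tag found inside an <li>.
    private static readonly Regex Pattern =
        new Regex(@"(?<=<li>).*?(<p/?>).*?(?=</li>)");

    // Hypothetical helper: returns the text captured by group 1 for every match.
    public static List<string> Extract(string html)
    {
        var tags = new List<string>();
        foreach (Match m in Pattern.Matches(html))
        {
            tags.Add(m.Groups[1].Value);
        }
        return tags;
    }
}
```

Calling LiParagraphTags.Extract on the string from the question should return two entries, one for each <li>, and skip the <p/> tags that sit outside the list.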

Pindatjuh
You and James are regex ninjas! Thanks a lot.
Liming
+4  A: 

It is a lot more reliable to use an HTML parser instead of a regex. Use HTML Agility Pack:

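using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml("<p/><ul><li>test1<p/></li><li>test2<p/></li></ul><p/>");
// Collect every <p> node that is a descendant of some <li> node.
IEnumerable<HtmlNode> result = doc.DocumentNode
                                  .Descendants("li")
                                  .SelectMany(x => x.Descendants("p"));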
Mark Byers
Thanks Mark. Actually, I'm parsing BBCode out of a bunch of text, and after the last iteration of converting the BBCode the text came out like that, so I need to do a bit of cleanup. But thanks for the suggestion.
Liming
+1  A: 

Try this; it uses a lookbehind and a lookahead so that the LI tags themselves are not part of the match.

(?<=<li>)(.*?<p/?>.*?)(?=</li>)

P.S. You also need to fix your HTML, because the way you use the P tags is not right. The regex works on the HTML below.

<ul><li><p>test1<p/></li><li><p>test2<p/></li></ul>
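To see what the lookarounds change, here is a small comparison sketch, again assuming .NET's Regex class (LiContentMatcher, WithTags and WithoutTags are illustrative names, not part of any library). Both patterns find the same list items; the difference is whether the <li> and </li> tags are consumed and therefore show up in Match.Value.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class LiContentMatcher
{
    // No lookarounds: <li> and </li> are consumed, so they are part of Match.Value.
    public static string[] WithTags(string html) =>
        Regex.Matches(html, @"<li>(.*?<p/?>.*?)</li>")
             .Cast<Match>().Select(m => m.Value).ToArray();

    // Lookbehind and lookahead only assert that the tags are present,
    // so Match.Value holds just the content between them.
    public static string[] WithoutTags(string html) =>
        Regex.Matches(html, @"(?<=<li>)(.*?<p/?>.*?)(?=</li>)")
             .Cast<Match>().Select(m => m.Value).ToArray();
}
```

On the HTML above, WithTags should give "<li><p>test1<p/></li>" and "<li><p>test2<p/></li>", while WithoutTags should give "<p>test1<p/>" and "<p>test2<p/>". In the first variant the inner content is still available through group 1, so the choice mostly matters when the whole match is used, for example in a replace.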
James
Thanks James! You and pinda are GOOD! thanks a lot!
Liming