answers:

3

I'm downloading the full HTML of a page using WebClient, but I need to extract a specific div from that HTML using a regular expression.

For example:

<body>
<div id="main">
     <div id="left" style="float:left">this is a <b>left</b> side:<div style='color:red'> 1 </div>
     </div>
     <div id="right" style="float:left"> main side</div>
<div>
</body>

If I need the div with id 'main', the function should return:

<div id="left" style="float:left">this is a <b>left</b> side:<div style='color:red'> 1 </div>
     </div>
     <div id="right" style="float:left"> main side</div>

If I need the div with id 'left', the function should return:

this is a <b>left</b> side:<div style='color:red'> 1 </div>

If I need the div with id 'right', the function should return:

 main side

How can I do this?

+1  A: 
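// requires: using System.Text.RegularExpressions;
string divname = "somename";
// The id value may be quoted with " or '; Singleline lets '.' match newlines.
string pattern = "<div\\b[^>]*\\bid\\s*=\\s*[\"']?" + Regex.Escape(divname) + "[\"']?(?=[\\s>])[^>]*>(.*?)</div>";
Match m = Regex.Match(htmlContent, pattern, RegexOptions.Singleline | RegexOptions.IgnoreCase);
string content = m.Groups[1].Value;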

This won't work if there are nested divs inside the desired div: the lazy (.*?) stops at the first </div> it finds, which belongs to the innermost div.
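
One way around the nested-div limitation, if an HTML parser really is not an option, is to count nesting depth by hand. The sketch below is only an illustration under stated assumptions: DivExtractor and GetInnerHtml are hypothetical names (not part of the original answer or of any library), and the markup is assumed to have balanced div tags with no "<div" text hidden inside comments, scripts, or attribute values.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DivExtractor
{
    // Hypothetical helper: returns the inner HTML of the div with the given id,
    // or null if the div is missing or is never closed.
    public static string GetInnerHtml(string html, string id)
    {
        // Locate the opening tag of the requested div.
        Match open = Regex.Match(
            html,
            "<div\\b[^>]*\\bid\\s*=\\s*[\"']?" + Regex.Escape(id) + "[\"']?(?=[\\s>])[^>]*>",
            RegexOptions.IgnoreCase);
        if (!open.Success) return null;

        int start = open.Index + open.Length; // first character after the opening tag
        int depth = 1;                        // we are inside one open div

        // Walk every later <div ...> or </div> tag and track the nesting depth.
        Regex tag = new Regex("<(/?)div\\b[^>]*>", RegexOptions.IgnoreCase);
        for (Match m = tag.Match(html, start); m.Success; m = m.NextMatch())
        {
            depth += m.Groups[1].Length == 0 ? 1 : -1; // empty group 1 means an opening tag
            if (depth == 0) return html.Substring(start, m.Index - start);
        }
        return null; // ran out of tags before the div was closed
    }
}
```

Calling DivExtractor.GetInnerHtml(htmlContent, "left") on the question's sample returns the content of the 'left' div including its nested red div. For 'main' it returns null on the sample as posted, because the last <div> before </body> is an opening tag and the 'main' div is therefore never closed.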

Am
+2  A: 

Why do people insist on trying to use regex to parse html? You can probably do it if you exclude a whole host of edge-cases... but just use HTML Agility Pack and you're done:

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(...); // or Load
string main = doc.DocumentNode.SelectSingleNode("//div[@id='main']").InnerHtml;

(note I'm assuming it is not xhtml; if it is xhtml, use XmlDocument or XDocument, and very similar code to the above)
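
For the XHTML case, a rough equivalent with XDocument might look like the sketch below. XhtmlDivReader and GetInnerXml are hypothetical names chosen for illustration, and the input is assumed to be well-formed XML. Elements are matched by local name so that a default XHTML namespace does not prevent the lookup, although serialized children may then carry an xmlns attribute.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XhtmlDivReader
{
    // Hypothetical helper: returns the inner markup of the div with the given id,
    // or null if no such div exists. Throws if the input is not well-formed XML.
    public static string GetInnerXml(string xhtml, string id)
    {
        XDocument doc = XDocument.Parse(xhtml, LoadOptions.PreserveWhitespace);

        // Compare local names only, so a default namespace is ignored.
        XElement div = doc.Descendants()
            .FirstOrDefault(e => e.Name.LocalName == "div"
                              && (string)e.Attribute("id") == id);
        if (div == null) return null;

        // Serialize each child node as-is (no re-indentation) and join them.
        return string.Concat(
            div.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```

Unlike HTML Agility Pack, XDocument.Parse rejects markup with unclosed tags, so this only suits input that is known to be valid XHTML.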

Marc Gravell
Thanks, that's very helpful. But HtmlAgilityPack doesn't seem to work correctly. When I download it and test it on the example above, doc.DocumentNode.SelectSingleNode("//div[@id='main']").InnerHtml returns <div id="left" style="float:left">this is a <b>left</b> side:<div style="color:red"> 1 </div> </div> <div id="right" style="float:left"> main side</div><div></div>
ebattulga
Where does the extra "<div></div>" at the end come from?
ebattulga
Explained in comment to the question. In short, HTML Agility Pack is correct; the source html is wrong.
Marc Gravell
A: 

It works for me, thanks.

nwebsolution