ansaurus

Question

How can I get the inner text of an HTML div element by id using a regular expression in C#?

Answer 1

+1 A:

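// requires: using System.Text.RegularExpressions;
string divname = "somename";
// Lazy match up to the first closing </div>; Singleline lets '.' span line breaks.
Match m = Regex.Match(htmlContent, "<div[^>]*id=[\"']?" + Regex.Escape(divname) + "[\"']?[^>]*>(.*?)</div>", RegexOptions.Singleline);
string content = m.Groups[1].Value;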

This won't work if you have nested divs inside the desired div: the lazy match stops at the first closing </div> it finds.
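To illustrate the nested-div limitation, here is a minimal sketch. The sample markup, the class name NestedDivDemo, and the helper Extract are made up for the example; the pattern is one plausible form of the regex approach, not a recommended parser.

```csharp
using System;
using System.Text.RegularExpressions;

static class NestedDivDemo
{
    // Hypothetical helper: returns whatever the lazy regex captures for the div with the given id,
    // or null when nothing matches.
    public static string Extract(string html, string id)
    {
        string pattern = "<div[^>]*id=[\"']?" + Regex.Escape(id) + "[\"']?[^>]*>(.*?)</div>";
        Match m = Regex.Match(html, pattern, RegexOptions.Singleline | RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }

    static void Main()
    {
        // Assumed sample input: the target div contains another div.
        string html = "<div id=\"main\"><div>inner</div> tail</div>";

        // The capture ends at the FIRST </div>, so " tail" is lost.
        Console.WriteLine(Extract(html, "main")); // prints "<div>inner"
    }
}
```

The capture is cut off at the inner closing tag, which is why a real HTML parser is the safer choice once divs can nest.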

Am 2009-09-16 07:03:55

Answer 2

+2 A:

Why do people insist on trying to use regex to parse HTML? You can probably do it if you exclude a whole host of edge cases, but just use HTML Agility Pack and you're done:

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(...); // or Load
string main = doc.DocumentNode.SelectSingleNode("//div[@id='main']").InnerHtml;

(note I'm assuming it is not xhtml; if it is xhtml, use XmlDocument or XDocument, and very similar code to the above)
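For the XHTML case, a rough sketch with XDocument might look like the following. The class name XhtmlDivDemo, the helper GetDivText, and the sample markup are assumptions for illustration, and it only works if the input is well-formed XML with no undeclared entities.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class XhtmlDivDemo
{
    // Hypothetical helper: returns the concatenated text of the first <div> with the given id,
    // or null when no such element exists.
    public static string GetDivText(string xhtml, string id)
    {
        XDocument doc = XDocument.Parse(xhtml);

        // Compare on LocalName so the lookup also works when the XHTML namespace is declared.
        XElement div = doc.Descendants()
            .FirstOrDefault(e => e.Name.LocalName == "div" && (string)e.Attribute("id") == id);

        // XElement.Value concatenates all descendant text nodes, nested divs included.
        return div?.Value;
    }

    static void Main()
    {
        // Assumed sample input with a nested div.
        string xhtml = "<html><body><div id=\"main\">Hello <b>world</b><div>!</div></div></body></html>";
        Console.WriteLine(GetDivText(xhtml, "main")); // prints "Hello world!"
    }
}
```

Because the document is parsed into a tree, nested divs are handled without any special casing.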

Marc Gravell 2009-09-16 07:04:18

Thanks, that's very helpful. But HtmlAgilityPack doesn't seem to work correctly. When I download it and test it on the previous example, doc.DocumentNode.SelectSingleNode("//div[@id='main']").InnerHtml returns <div id="left" style="float:left">this is a <b>left</b> side:<div style="color:red"> 1 </div> </div> <div id="right" style="float:left"> main side</div><div></div>

ebattulga 2009-09-16 07:45:16

What is that "<div></div>"?

ebattulga 2009-09-16 07:46:52

Explained in comment to the question. In short, HTML Agility Pack is correct; the source html is wrong.

Marc Gravell 2009-09-16 08:52:26

Answer 3

A:

It works for me, thanks.

nwebsolution 2010-10-20 07:49:45
