Here's some C# code:

string webPageStr =  @"<html><body><div id=""content"">good content</div><div id=""badcontent"">bad content</div></body></html>";
XmlDocument webPage = new XmlDocument();
webPage.LoadXml(webPageStr);

XmlElement divElement = webPage.GetElementById("content");

After this, divElement is null, and I don't know why.

I have also tried declaring webPageStr like this:

string webPageStr =  @"<html><body><div id=&quot;content&quot;>good content</div><div id=&quot;badcontent&quot;>bad content</div></body></html>";

but then XmlDocument throws an exception: System.Xml.XmlException: "&" bad token

What's wrong with this code?

+3  A: 

You need to include a DOCTYPE declaration if you want to use the GetElementById method, because the method doesn't know which attribute counts as an ID in the given XML. In your case you are using XHTML, so you need to reference its DTD, which declares that the attribute named "id" is of type ID:

string webPageStr = @"<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Transitional//EN"" ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd""><html><body><div id=""content"">good content</div><div id=""badcontent"">bad content</div></body></html>";
XmlDocument webPage = new XmlDocument();
webPage.LoadXml(webPageStr);
XmlElement divElement = webPage.GetElementById("content");

This first approach means that your code needs web access to the DTD referenced in the DOCTYPE declaration (http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd) when it runs.
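
If fetching the DTD over the network is undesirable, one possible variation (not part of the original answer) is to declare the "id" attribute as type ID in an internal DTD subset. The sketch below assumes that XmlDocument.LoadXml parses the internal subset with its default settings, which may differ between .NET versions. The class and method names are hypothetical.

```csharp
using System;
using System.Xml;

public static class InternalDtdExample
{
    // Hypothetical helper: loads the sample page and looks up the div by ID.
    public static XmlElement FindContentDiv()
    {
        // Assumption: the internal subset declares div and marks its "id"
        // attribute as type ID, so no external DTD has to be downloaded.
        string webPageStr =
            @"<!DOCTYPE html [<!ELEMENT div ANY><!ATTLIST div id ID #IMPLIED>]>" +
            @"<html><body><div id=""content"">good content</div>" +
            @"<div id=""badcontent"">bad content</div></body></html>";

        XmlDocument webPage = new XmlDocument();
        webPage.LoadXml(webPageStr);

        // Only attributes declared as ID in the DTD are visible to GetElementById.
        return webPage.GetElementById("content");
    }

    public static void Main()
    {
        XmlElement divElement = FindContentDiv();
        Console.WriteLine(divElement != null ? divElement.InnerText : "not found");
    }
}
```

The document is not validated here; the DTD is only used to tell the parser which attribute is an ID.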

An alternative approach would be to use an XPath expression:

string webPageStr = @"<html><body><div id=""content"">good content</div><div id=""badcontent"">bad content</div></body></html>";
XmlDocument webPage = new XmlDocument();
webPage.LoadXml(webPageStr);
XmlNode divElement = webPage.SelectSingleNode("//div[@id=\"content\"]");
Darin Dimitrov
Thanks it's working :)
ksirg
+1. The attribute name “id” is nothing special to an XML document unless a schema tells it otherwise. (“xml:id” might be, but that's not an [X]HTML attribute...)
bobince
If you use apostrophes instead of quotes to delimit your XPath string literals, you won't have to escape them.
Robert Rossney
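
The apostrophe suggestion in the last comment can be sketched as follows. It reuses the sample markup from the question; the class and method names are hypothetical and only there to make the snippet self-contained.

```csharp
using System;
using System.Xml;

public static class XPathApostropheExample
{
    // Hypothetical helper: selects the div whose id attribute equals "content".
    public static XmlNode FindContentDiv()
    {
        string webPageStr = @"<html><body><div id=""content"">good content</div><div id=""badcontent"">bad content</div></body></html>";

        XmlDocument webPage = new XmlDocument();
        webPage.LoadXml(webPageStr);

        // Apostrophes delimit the XPath string literal,
        // so the C# string needs no \" escapes.
        return webPage.SelectSingleNode("//div[@id='content']");
    }

    public static void Main()
    {
        XmlNode divElement = FindContentDiv();
        Console.WriteLine(divElement.InnerText); // prints "good content"
    }
}
```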