I have a string input, and I do not know whether it is valid XML.

I think the simplest approach is to wrap

new XmlDocument().LoadXml(strINPUT);

in a try/catch.

The problem I'm facing is that strINPUT is sometimes an HTML file. If the header of this file contains

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xml:lang="en-GB" xmlns="http://www.w3.org/1999/xhtml" lang="en-GB">

...like many do, the parser actually tries to make a connection to the w3.org URL, which I really don't want it to do.

Does anyone know if it's possible to just parse the string without trying to be clever and checking external URLs? Failing that, is there an alternative to XmlDocument?

+2  A: 

I am not sure about the reason behind the problem, but have you tried the XDocument and XElement classes in System.Xml.Linq?

XDocument document = XDocument.Load(strINPUT, LoadOptions.None);
XElement element = XElement.Load(strINPUT);

EDIT: for XML held in a string, try the following:

XDocument document = XDocument.Parse(strINPUT, LoadOptions.None);
Asad Butt
Yep :( Using Ethereal I can see it's trying to connect.
maxp
Also, strINPUT isn't a URL; it's the actual markup.
maxp
The reason for the connection attempts is that the LoadXml and Load methods try to retrieve the DTD from the specified location. This is part of the normal parsing procedure.
Obalix
Try the edit; I hope it works now.
Asad Butt
Sure, but he is open to an `alternative`! See the last line in the question.
Asad Butt
+4  A: 

Try the following:

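// Needs: using System.IO; using System.Xml;
XmlDocument doc = new XmlDocument();
doc.XmlResolver = null; // the document itself must not fetch external resources either
using (var reader = XmlReader.Create(new StringReader(xml), new XmlReaderSettings() {
    ProhibitDtd = false,  // accept the DOCTYPE; true would throw an XmlException on it
    XmlResolver = null,   // do not download the external DTD
    ValidationType = ValidationType.None
})) {
    doc.Load(reader);
}

The code creates a reader that accepts the DOCTYPE declaration but does not resolve the external DTD and does not validate. Setting ProhibitDtd = true instead would make the reader throw an XmlException as soon as it meets the DOCTYPE, so the XHTML input would be rejected. On .NET 4.0 and later, ProhibitDtd is obsolete and DtdProcessing takes its place. Checking for well-formedness will still apply.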

Alternatively, you can use XDocument.Parse if you can switch from XmlDocument to XDocument.
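
As a rough sketch of how the try/catch idea from the question could be combined with a reader that never resolves external resources, something like the helper below might work. It assumes .NET 4.0 or later, where DtdProcessing is available; the names XmlCheck and IsWellFormedXml are hypothetical and do not come from this thread.

```csharp
using System;
using System.IO;
using System.Xml;

static class XmlCheck
{
    // Hypothetical helper: returns true when the input is well-formed XML.
    // It reads the input once and builds no document in memory.
    public static bool IsWellFormedXml(string input)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse, // accept a DOCTYPE declaration
            XmlResolver = null,                  // never fetch the external DTD
            ValidationType = ValidationType.None // well-formedness only
        };

        try
        {
            using (var reader = XmlReader.Create(new StringReader(input), settings))
            {
                // Walk the whole document; any well-formedness error throws.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

For example, XmlCheck.IsWellFormedXml("<a><b/></a>") returns true and XmlCheck.IsWellFormedXml("<a><b></a>") returns false. Because the external DTD is never loaded, references to entities that are declared only there (such as &nbsp; in XHTML) may be reported as errors, so real-world HTML pages can still fail this check.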

Obalix
+1  A: 

Use XmlDocument's Load method to load the XML document, use XmlNodeList to get at the elements, then retrieve the data. Try the following:

XmlDocument xmlDoc = new XmlDocument();
// Use the Load method to load the XML document from the specified file.
xmlDoc.Load("myXMLDoc.xml");
// Use GetElementsByTagName() to get the elements that match the specified name.
XmlNodeList item = xmlDoc.GetElementsByTagName("item");
XmlNodeList url = xmlDoc.GetElementsByTagName("url");
Console.WriteLine("The item is: " + item[0].InnerText);

Add a try/catch block around the above code, see what you catch, and modify your code to address that situation.

Iggy
Does not solve the problem in the OP! Your code will also fail on `xmlDoc.Load("xmlfile");`.
Obalix
Thanks for your contribution, but this is not the problem. The OP is trying to stop the object from connecting to the URI inside the XML.
Asad Butt