ansaurus

Question

Best way to read, modify, and write XML

Answer 1

A:

This is a simple article on how to read and write XML files:

http://www.c-sharpcorner.com/uploadfile/mahesh/readwritexmltutmellli2111282005041517am/readwritexmltutmellli21.aspx

It is only a very simple introduction, meant to get you started on the concepts and namespaces.

PaulStack 2010-09-17 15:07:13

Thanks, but I've already read that, and based some of my code from it.

wonea 2010-09-17 15:11:31

Answer 2

A:

Just start by reading the documentation for the System.Xml namespace on MSDN. Then, if you have more specific questions, post them here...

md5sum 2010-09-17 15:08:09

Answer 3

+1 A:

If you have smaller documents that fit in the computer's memory, you can use XmlDocument. Otherwise you can use XmlReader to iterate through the document.

Using XmlReader, you can find out each node's type using:

while (xml.Read()) {
   switch (xml.NodeType) {
     case XmlNodeType.Element:
      //Do something
      break;
     case XmlNodeType.Text:
      //Do something
      break;
     case XmlNodeType.EndElement:
      //Do something
      break;
   }
}

codymanix 2010-09-17 15:09:47

Answer 4

A:

Are the documents you are processing relatively small? If so, you could load them into memory using an XmlDocument object, modify it, and write the changes back out.

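// Requires: using System.Text; using System.Xml;
XmlDocument doc = new XmlDocument();
doc.Load("path_to_input_file");
// Make changes to the document.
// Dispose the writer so the output is flushed and the file is closed.
using (XmlTextWriter xtw = new XmlTextWriter("path_to_output_file", Encoding.UTF8))
{
    doc.WriteContentTo(xtw);
}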

Depending on the structure of the input XML, this could make your parsing code a bit simpler.

Pat Daburu 2010-09-17 15:11:55

Answer 5

A:

One fairly easy approach would be to create a new XmlDocument, then use the Load() method to populate it. Once you've got the document, you can use CreateNavigator() to get an XPathNavigator object that you can use to find and alter elements in the document. Finally, you can use the Save() method on the XmlDocument to write the changed document back out.
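A minimal sketch of that approach, borrowing the img/src scenario discussed elsewhere in this thread. The class name, method name, and suffix parameter are made up for illustration; it works on strings so it is easy to try out, and the comments note where Load() and Save() would go for files.

```csharp
using System.Xml;
using System.Xml.XPath;

public static class NavigatorSketch
{
    // Appends a suffix to the src attribute of every <img> element
    // and returns the modified XML as a string.
    public static string AppendToImgSrc(string xml, string suffix)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);            // use doc.Load(path) to read from a file

        // A navigator created from an XmlDocument is editable.
        XPathNavigator nav = doc.CreateNavigator();
        foreach (XPathNavigator src in nav.Select("//img/@src"))
        {
            src.SetValue(src.Value + suffix);
        }

        return doc.OuterXml;         // use doc.Save(path) to write to a file
    }
}
```

Calling NavigatorSketch.AppendToImgSrc(xml, "-v2") leaves every other node untouched and only rewrites the selected attributes.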

ngroot 2010-09-17 15:14:16

Answer 6

+4 A:

If it's actually valid XML, and will easily fit in memory, I'd choose LINQ to XML (XDocument, XElement etc) every time. It's by far the nicest XML API I've used. It's easy to form queries, and easy to construct new elements too.

You can use XPath where that's appropriate, or the built-in axis methods (Elements(), Descendants(), Attributes() etc). If you could let us know what specific bits you're having a hard time with, I'd be happy to help work out how to express them in LINQ to XML.

If, on the other hand, this is HTML which isn't valid XML, you'll have a much harder time - because XML APIs generally expect to work with valid XML documents. You could use HTMLTidy first of course, but that may have undesirable effects.

For your specific example:

foreach (var img in doc.Descendants("img"))
{
    // src will be null if the attribute is missing
    string src = (string) img.Attribute("src");
    img.SetAttributeValue("src", src + "with-changes");
}

Jon Skeet 2010-09-17 15:20:53

Bump XDocument for great justice.

annakata 2010-09-17 15:23:00

I heartily agree! I had a couple of older apps I had to do the hard way with parsing and the like, and L2X makes it so much easier and more powerful.

Dillie-O 2010-09-17 15:23:05

Jon, you may find HtmlAgilityPack very useful: instead of worrying about valid XML, you can use APIs similar to XDocument on dirty, real-world HTML.

Peter J 2010-09-17 18:16:51

@Peter: Fortunately I've rarely had to work with dirty HTML - I've found myself using real XML more frequently. I'll bear it in mind though.

Jon Skeet 2010-09-17 18:36:43

Answer 7

+1 A:

For the task in hand (read an existing doc, modify it in a formalised way, and write it out), I'd go with an XPathDocument run through an XslCompiledTransform.

Where you can't formalise, don't have pre-existing docs or generally need more adaptive logic, I'd go with LINQ and XDocument like Skeet says.

Basically, if the task is transformation, use XSLT; if the task is manipulation, use LINQ.
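A rough sketch of the XSLT route, assuming the change can be written as a stylesheet. The stylesheet here is a made-up example (an identity transform plus one template that appends "-v2" to each img src attribute), and the class and method names are hypothetical.

```csharp
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

public static class TransformSketch
{
    // Hypothetical stylesheet: copy everything, but rewrite img/@src.
    private const string Stylesheet = @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:template match='@*|node()'>
    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>
  </xsl:template>
  <xsl:template match='img/@src'>
    <xsl:attribute name='src'><xsl:value-of select='.'/>-v2</xsl:attribute>
  </xsl:template>
</xsl:stylesheet>";

    // Runs the stylesheet over the input XML and returns the result as a string.
    public static string Transform(string xml)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(styleReader);
        }

        // XPathDocument is a read-only, in-memory store suited to XSLT input.
        XPathDocument input = new XPathDocument(new StringReader(xml));
        StringWriter output = new StringWriter();
        xslt.Transform(input, null, output);
        return output.ToString();
    }
}
```

For files, the XPathDocument can be constructed from a path and the output written through a StreamWriter instead; the transform itself stays the same.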

annakata 2010-09-17 15:27:30

Answer 8

A:

My favorite tool for this kind of thing is HtmlAgilityPack. I use it to parse complex HTML documents into LINQ-queryable collections. It is an extremely useful tool for querying and parsing HTML (which is often not valid XML).

For your problem, the code would look like:

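// Requires the HtmlAgilityPack library: using HtmlAgilityPack;
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(stringOfHtml);
// XPath attribute tests need '@' and a quoted value.
var images = htmlDoc.DocumentNode.SelectNodes("//img[@id='lookforthis']");

// SelectNodes returns null when nothing matches.
if (images != null)
{
  foreach (HtmlNode node in images)
  {
      node.Attributes.Append("alt", "added an alt to lookforthis images.");
  }
}

htmlDoc.Save("output.html");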

Peter J 2010-09-17 15:31:09
