tags:

views:

449

answers:

6

Hi Fellow StackOverflowers,

I am receiving a string in one of my .NET function. The string when viewed from the XML Visualizer looks like this:

- <root>
- <Table>
  <ID>ABC-123</ID>
  <CAT>Housekeeping</CAT>
  <DATE>21-JUN-2009</DATE>
  <REP_BY>John</REP_BY>
  <LOCATION>Head Office</LOCATION>
</Table>
- <Table>
  <ID>ABC-124</ID>
  <CAT>Environment</CAT>
  <DATE>23-JUN-2009</DATE>
  <REP_BY>Michelle</REP_BY>
  <LOCATION>Block C</LOCATION>
</Table>
- <Table>
  <ID>ABC-125</ID>
  <CAT>Staging</CAT>
  <DATE>21-JUN-2009</DATE>
  <REP_BY>George</REP_BY>
  <LOCATION>Head Office</LOCATION>
</Table>  
- <Table>
  <ID>ABC-123</ID>
  <CAT>Housekeeping</CAT>
  <DATE>21-JUN-2009</DATE>
  <REP_BY>John</REP_BY>
  <LOCATION space="preserve" xmlns="http://www.w3.org/XML/1998/namespace" /> 
</Table>  
</root>

I need to parse this string so that I could write the data into a datatable whose columns are the xml tags for each data.

In the above text, I would then have a datatable that wil have 5 columns, named ID, CAT, DATE, REP_BY and LOCATION which will then contain 4 rows of data.

In the fourth tag, notice that the does not have any data, but rather it is marked space="preserve". This would mean that the data I am placing in my datatable would be blank for the LOCATION column of the fourth row.

How can I achieve this? Sample codes would be highly appreciated. Thanks.

+1  A: 

Don't use string parsing. Try using some xml library (Linq has some objects that might help you). You will probably do that much more easily.

Samuel Carrijo
Any other method aside from LINQ? I can't use LINQ since I am using .NET 2.0. Thanks.
Batuta
oh, ok. Then Francis might be more helpful in your case. But I'm leaving it in case some .NET 3.5 finds this question ;)
Samuel Carrijo
+6  A: 

Using the XmlReader class. This class is fast and does not use a lot of memory but reading the xml can be difficult.

using (StringReader strReader = new StringReader(yourXMLString))
{
    using (XmlReader reader = XmlReader.Create(strReader))
    {
        while (reader.Read())
        {
            if(reader.Name == "Table" && reader.NodeType == reader.NodeType == XmlNodeType.Element)
            {
                using(XmlReader tableReader = reader.ReadSubtree())
                {
                    ReadTableNode(tableReader);
                }
            }
        }
    }
}

private void ReadTableNode(XmlReader reader)
{
    while (reader.Read())
    {
        if(reader.Name == "ID" && reader.NodeType == reader.NodeType == XmlNodeType.Element)
            //do something
        else if(reader.Name == "CAT" && reader.NodeType == reader.NodeType == XmlNodeType.Element)
            //do something

       //and continue....
    }
}

To get an attribute of the current node you use:

string value = reader.GetAttribute(name_of_attribute);

To get the inner text of an element:

string innerText =  reader.ReadString();

Using the XmlDocument class. This class is slow but manipulating and reading the xml is very easy because the entire xml is loaded.

XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(yourXMLString);
//do something

Using the XDocument class. The advantage of using XDocument is elements can be accessed directly and simultaneously. This class also use the power of LINQ to query the xml document.

using(StringReader tr = new StringReader(yourXMLString))
{
    XDocument doc = XDocument.Load(tr);
    //do something
}
Francis B.
How would I then assemble the datatable's column names based on the data from the string?
Batuta
Could you please provide more code example using the actual string data which I posted?I tried your code but nothing comes up in my xmlreader variable.Thanks.
Batuta
Updated the code sample.
Francis B.
I tried using the code you posted but nothing comes up in my string reader and xmlreader variables. both have null values.could you further elaborate? i am not getting anything in this statement: using (StringReader strReader = new StringReader(yourXMLString))
Batuta
Did you replace the "yourXMLString" by your string?
Francis B.
+1  A: 

There are advantages and disadvantages to using one way versus another.

If you're looking for a way to traverse the XML forward, backward and randomly access different elements, then I would use an XmlDocument.

(assuming your XML could be in a string)

XmlDocument doc = new XmlDocument();
doc.LoadXml(xmlVariable);

Then you can then use something like:

XmlNodeList xmlNodes = doc.SelectNodes(xPathString);

Google about XPath and you'll find some great articles about how to get to a certain element in your document and then you can loop through it like this:

foreach(XmlNode node in xmlNodes)
{
   // do something with node.InnerText or any other property/method that you like
   // also through this method you can access the attributes of each node
   // allowing you to process that last LOCATION elements attributes
}

If your xml is in a file somewhere, you can load it via doc.Load(filename) or your choice of a stream object.

Now, if you will only be parsing this once and the XML string is huge, I suggest a StringReader/XmlReader as they are MUCH faster because they're "forward only". That may not be the exact terminology but it cuts down on memory use by not having to store all of a document at once.

CLR
As I have mentioned, I would only parse the string once and needs to get the data into a datatable. Coincidentally, the datatable column fields should be the same as those in the string (XML?) data.Thanks.
Batuta
+1  A: 

This is probably the simplest solution to get the XML into table form. Throwing the attributes out using regular expressions is not that smart (and safe), but I don't like the System.Xml API and LINQ to XML is no option in .NET 2.0.

using System;
using System.Data;
using System.IO;
using System.Text.RegularExpressions;

namespace GeneralTestApplication
{
    class Program
    {
        private static void Main()
        {
            String input = @"<root><Table> [...] </root>";

            input = Regex.Replace(input, @" [a-zA-Z]+=""[^""]*""", String.Empty);

            DataSet dataSet = new DataSet();

            dataSet.ReadXml(new StringReader(input));

            foreach (DataRow row in dataSet.Tables[0].Rows)
            {
                foreach (DataColumn column in dataSet.Tables[0].Columns)
                {
                    Console.Write(row[column] + " | ");
                }
                Console.WriteLine();
            }

            Console.ReadLine();
        }
    }
}

UPDATE

Or get rid of the attribute using System.Xml.

XmlDocument doc = new XmlDocument();

doc.Load(new StringReader(input));

foreach (XmlNode node in doc.SelectNodes("descendant-or-self::*"))
{
    node.Attributes.RemoveAll();
}

input = doc.OuterXml;

But this doesn't work because the XML namespace on the last LOCATION element remains and the DataSet.LoadXml() complains that there connot be two columns named LOCATION.

Daniel Brückner
I don't know why someone would downvote this answer. Using a RegEx to clean the source string? Even I, who hate using string processing on XML with a white-hot passion, might do that in this case.
Robert Rossney
I voted this as the answer since this has helped in particularly solving the case I got.
Batuta
A: 

I believe that you can simply use the ADO.NET DataSet class's ReadXml method to read an XML document in that format, and it will create the DataTable, DataColumn, and DataRow objects for you. You'll need to write a little conversion method if you want to subsequently turn the DATE column's data type to DateTime. But other than that, you shouldn't have to screw around with XML at all.

Edit

I see from Daniel Bruckner's post that the LOCATION elements in the odd namespace pose a problem. Well, that's easy enough to fix:

    XmlDocument d = new XmlDocument();
    d.LoadXml(xml);

    XmlNamespaceManager ns = new XmlNamespaceManager(d.NameTable);
    ns.AddNamespace("n", "http://www.w3.org/XML/1998/namespace");
    foreach (XmlNode n in d.SelectNodes("/root/Table/n:LOCATION", ns))
    {
        XmlElement loc = d.CreateElement("LOCATION");
        n.ParentNode.AppendChild(loc);
        n.ParentNode.RemoveChild(n);
    }

    DataSet ds = new DataSet();
    using (StringReader sr = new StringReader(d.OuterXml))
    {
        ds.ReadXml(sr);
    }
Robert Rossney
A: 

I'm not a huge fan of xml myself, I need to use it as the datasource of a grid to visualize it. I get some output from our FileNet imaging server in xml format and I need to get pieces of it out to populate a database. Here's what I'm doing, HTH:

  Dim dsXML As DataSet
  Dim drXML As DataRow
  Dim rdr As System.IO.StringReader
  Dim docs() As String
  Dim SQL As String
  Dim xml As String
  Dim fnID As String

docs = _fnP8Dev.getDocumentsXML(_credToken, _docObjectStoreName, _docClass, "ReferenceNumber=" & fnID, "")
xml = docs(0)
If (InStr(xml, "<z:row") > 0) Then
 RaiseEvent msg("Inserting images for reference number " & fnID)
 rdr = New System.IO.StringReader(xml)
 dsXML = New DataSet
 dsXML.ReadXml(rdr)

 For Each drXML In dsXML.Tables(dsXML.Tables.Count - 1).Rows
   SQL = "Insert into fnImageP8 values ("
   SQL = SQL & "'" & drXML("Id") & "', "
   Try
    SQL = SQL & "'" & drXML("DocumentTitle") & "', "
   Catch ex As Exception
    SQL = SQL & "null, "
   End Try
Beth