I have a number of rather large, complex XML documents that I need to loop through. An xmlns is defined at the top of each document; however, the URL it points to is no longer available.

What's the best way to parse the file to get the important data from it using C#?

I tried to load it into a DataSet but would occasionally receive errors such as "The table (endpoint) cannot be the child table to itself in nested relations." or "Cannot add a SimpleContent column to a table containing element columns or nested relations."

XPath was my next port of call but I had problems because of the lack of namespace.

I suspect this is seriously limiting my options, but does anyone have any suggestions?

Snippet of the XML document:

<?xml version="1.0" encoding="UTF-8"?>
<cdr:cdr_set xmlns:cdr="http://www.naturalconvergence.com/schema/cdr/v3/cdr">

<!--  Copyright (c) 2001-2009, all rights reserved  -->

<cdr:cdr xmlns:cdr="http://www.naturalconvergence.com/schema/cdr/v3/cdr">
  <cdr:call_id>2040-1247062136726-5485131</cdr:call_id>
  <cdr:cdr_id>1</cdr:cdr_id>
  <cdr:status>Normal</cdr:status>
  <cdr:responsibility>
    <cdr:tenant id="17">
      <cdr:name>SpiriTel plc</cdr:name>
    </cdr:tenant>
    <cdr:site id="45">
      <cdr:name>KWS</cdr:name>
      <cdr:time_zone>GB</cdr:time_zone>
    </cdr:site>
  </cdr:responsibility>
  <cdr:originator type="sipGateway">
    <cdr:sipGateway id="3">
      <cdr:name>Audiocodes-91</cdr:name>
    </cdr:sipGateway>
  </cdr:originator>
  <cdr:terminator type="group">
    <cdr:group>
      <cdr:tenant id="17">
        <cdr:name>SpiriTel plc</cdr:name>
      </cdr:tenant>
      <cdr:type>Broadcast</cdr:type>
      <cdr:extension>6024</cdr:extension>
      <cdr:name>OLD PMS DDIS DO NOT USE</cdr:name>
    </cdr:group>
  </cdr:terminator>
  <cdr:initiation>Dialed</cdr:initiation>
  <cdr:calling_number>02087893850</cdr:calling_number>
  <cdr:dialed_number>01942760142</cdr:dialed_number>
  <cdr:target>6024</cdr:target>
  <cdr:direction>Inbound</cdr:direction>
  <cdr:disposition>No Answer</cdr:disposition>
  <cdr:timezone>GB</cdr:timezone>
  <cdr:origination_timestamp>2009-07-08T15:08:56.727+01:00</cdr:origination_timestamp>
  <cdr:release_timestamp>2009-07-08T15:09:26.493+01:00</cdr:release_timestamp>
  <cdr:release_cause>Normal Clearing</cdr:release_cause>
  <cdr:call_duration>PT29S</cdr:call_duration>
  <cdr:redirected>false</cdr:redirected>
  <cdr:conference>false</cdr:conference>
  <cdr:transferred>false</cdr:transferred>
  <cdr:estimated>false</cdr:estimated>
  <cdr:interim>false</cdr:interim>
  <cdr:segments>
    <cdr:segment>
      <cdr:originationTimestamp>2009-07-08T15:08:56.727+01:00</cdr:originationTimestamp>
      <cdr:initiation>Dialed</cdr:initiation>
      <cdr:call_id>2040-1247062136726-5485131</cdr:call_id>
      <cdr:originator type="sipGateway">
        <cdr:sipGateway id="3">
          <cdr:name>Audiocodes-91</cdr:name>
        </cdr:sipGateway>
      </cdr:originator>
      <cdr:termination_attempt>
        <cdr:termination_timestamp>2009-07-08T15:08:56.728+01:00</cdr:termination_timestamp>
        <cdr:terminator type="group">
          <cdr:group>
            <cdr:tenant id="17">
              <cdr:name>SpiriTel plc</cdr:name>
            </cdr:tenant>
            <cdr:type>Broadcast</cdr:type>
            <cdr:extension>6024</cdr:extension>
            <cdr:name>OLD PMS DDIS DO NOT USE</cdr:name>
          </cdr:group>
        </cdr:terminator>
        <cdr:provided_address>01942760142</cdr:provided_address>
        <cdr:direction>Inbound</cdr:direction>
        <cdr:disposition>No Answer</cdr:disposition>
      </cdr:termination_attempt>
    </cdr:segment>
  </cdr:segments>
</cdr:cdr>

...

</cdr:cdr_set>

Each entry is essentially the same, but there are sometimes differences: some fields may be missing if they aren't required.

+5  A: 

Namespace URIs in an XML file are identifiers, not locators. Unless you are expecting to download a schema from it, the URL does not need to resolve at all; it could just as well be "flibble". I expect the best thing would be to just load the file into XmlDocument / XDocument and access the data from there.

For example:

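using System;
using System.Xml;

XmlDocument doc = new XmlDocument();
doc.Load("cdr.xml");
// Map a prefix to the namespace URI so the XPath queries can use it.
XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
ns.AddNamespace("cdr", "http://www.naturalconvergence.com/schema/cdr/v3/cdr");
// Select the first originator element and print its type attribute.
XmlElement el = (XmlElement)doc.SelectSingleNode(
    "/cdr:cdr_set/cdr:cdr/cdr:originator", ns);
Console.WriteLine(el.GetAttribute("type"));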

or to loop over the cdr elements:

    foreach (XmlElement cdr in doc.SelectNodes("/cdr:cdr_set/cdr:cdr", ns))
    {
        Console.WriteLine(cdr.SelectSingleNode("cdr:call_id", ns).InnerText);
    }

Note that the aliases used in the document are largely unrelated to the aliases used in the XmlNamespaceManager, hence you need to re-declare it. I could have used x as my alias in the C# just as easily.
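To illustrate the point about aliases, here is a minimal sketch that queries the same namespace under the prefix x instead of cdr. The class name, the method name and the trimmed-down inline sample are made up for the example; only the namespace URI and the element names come from the question.

```csharp
using System;
using System.Xml;

public static class AliasDemo
{
    // Trimmed-down sample based on the question's snippet (illustrative only).
    public const string SampleXml =
        "<cdr:cdr_set xmlns:cdr=\"http://www.naturalconvergence.com/schema/cdr/v3/cdr\">" +
        "<cdr:cdr><cdr:call_id>2040-1247062136726-5485131</cdr:call_id></cdr:cdr>" +
        "</cdr:cdr_set>";

    public static string FirstCallId(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // The prefix "x" exists only in this namespace manager; the document
        // itself uses "cdr". Only the URI has to match.
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("x", "http://www.naturalconvergence.com/schema/cdr/v3/cdr");

        XmlNode callId = doc.SelectSingleNode("/x:cdr_set/x:cdr/x:call_id", ns);
        return callId == null ? null : callId.InnerText;
    }

    public static void Main()
    {
        Console.WriteLine(FirstCallId(SampleXml)); // prints 2040-1247062136726-5485131
    }
}
```

The query matches because XPath compares the namespace URI bound to each prefix, not the prefix text.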


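Of course, if you prefer to work with an object model, run it through xsd.exe (where cdr.xml is your example file). The first command infers a schema from the sample; the second generates classes from that schema: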

xsd cdr.xml
xsd cdr.xsd /classes

Now you can load it with XmlSerializer.
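As a rough sketch of what loading with XmlSerializer looks like, the classes below are hand-written stand-ins, not the output of xsd.exe (which will be much larger and named differently). The type names, property names and the choice of two fields are assumptions for illustration; the element names and namespace URI come from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public static class CdrNs
{
    public const string Uri = "http://www.naturalconvergence.com/schema/cdr/v3/cdr";
}

// Hypothetical, hand-written model covering only two fields of each record.
[XmlRoot("cdr_set", Namespace = CdrNs.Uri)]
public class CdrSet
{
    [XmlElement("cdr", Namespace = CdrNs.Uri)]
    public List<Cdr> Records { get; set; } = new List<Cdr>();
}

public class Cdr
{
    [XmlElement("call_id", Namespace = CdrNs.Uri)]
    public string CallId { get; set; }

    // Stays null when the element is absent from a record.
    [XmlElement("disposition", Namespace = CdrNs.Uri)]
    public string Disposition { get; set; }
}

public static class SerializerDemo
{
    // Two records; the second one has no disposition element.
    public const string SampleXml =
        "<cdr:cdr_set xmlns:cdr=\"http://www.naturalconvergence.com/schema/cdr/v3/cdr\">" +
        "<cdr:cdr><cdr:call_id>A</cdr:call_id><cdr:disposition>No Answer</cdr:disposition></cdr:cdr>" +
        "<cdr:cdr><cdr:call_id>B</cdr:call_id></cdr:cdr>" +
        "</cdr:cdr_set>";

    public static CdrSet Load(TextReader reader)
    {
        var serializer = new XmlSerializer(typeof(CdrSet));
        return (CdrSet)serializer.Deserialize(reader);
    }

    public static void Main()
    {
        CdrSet set = Load(new StringReader(SampleXml));
        foreach (Cdr record in set.Records)
        {
            Console.WriteLine("{0}: {1}", record.CallId, record.Disposition ?? "(missing)");
        }
    }
}
```

For a real file, pass a StreamReader over cdr.xml to Load instead of the StringReader.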

Marc Gravell
I have tried this and the line XmlNodeList nodes = root.SelectNodes("/cdr:cdr"); generates the exception "Namespace Manager or XsltContext needed. This query has a prefix, variable, or user-defined function."
Anthony
(replied on the question)
Marc Gravell
The code snippet worked. I have no idea what you did differently; I assume my XPath query was incorrect. Thank you!
Anthony
It is probably the namespace manager; see the "ns" that I passed into the query.
Marc Gravell
How could that code be adapted so I can loop through each cdr:cdr? There could be hundreds in each file.
Anthony
Looping example added.
Marc Gravell
+1  A: 

Alternatively, load it into an XDocument and use LINQ to XML? ... although you might just get the same error.

I don't know what data you want, so it's hard to suggest a query. I personally prefer XDocument to XmlDocument now in most cases.
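For what it's worth, a LINQ to XML query over this kind of file might look something like the sketch below. Which fields to pull out is a guess (call_id and disposition here), and the class name, method name and inline sample are invented for the example. Casting a missing element to string yields null, which is one way to cope with optional fields.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CdrLinq
{
    // The URI must match the document's xmlns exactly; it is never downloaded.
    static readonly XNamespace Cdr = "http://www.naturalconvergence.com/schema/cdr/v3/cdr";

    // Two records; the second one has no disposition element.
    public const string SampleXml =
        "<cdr:cdr_set xmlns:cdr=\"http://www.naturalconvergence.com/schema/cdr/v3/cdr\">" +
        "<cdr:cdr><cdr:call_id>A</cdr:call_id><cdr:disposition>No Answer</cdr:disposition></cdr:cdr>" +
        "<cdr:cdr><cdr:call_id>B</cdr:call_id></cdr:cdr>" +
        "</cdr:cdr_set>";

    public static List<string> Summaries(XDocument doc)
    {
        return doc.Root
            .Elements(Cdr + "cdr")
            .Select(e => string.Format("{0}: {1}",
                (string)e.Element(Cdr + "call_id"),
                // The cast returns null if the element is missing.
                (string)e.Element(Cdr + "disposition") ?? "(none)"))
            .ToList();
    }

    public static void Main()
    {
        foreach (string line in Summaries(XDocument.Parse(SampleXml)))
        {
            Console.WriteLine(line);
        }
    }
}
```

For a file on disk, XDocument.Load("cdr.xml") would replace XDocument.Parse.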

The only problem with automatically generating an XSD is that it can get your data types pretty badly wrong if you are not using a good-sized chunk of sample data.

John Nicholas
Indeed, when I tried it, xsd did such a poor job that `XmlSerializer` actually refused to load the result...
Marc Gravell
I feel your pain ;) I usually end up rewriting the auto-generated XSD ... but at least it's a nice starting point.
John Nicholas