I was given an .xml file that I needed to read into my code as a DataSet (as background, the file was created by creating a DataSet in C# and calling dataSet.WriteXml(file, XmlWriteMode.IgnoreSchema), but this was done by someone else). The .xml file was shaped like this:

 <?xml version="1.0" standalone="yes"?>
 <NewDataSet>
  <Foo>
    <Bar>abcd</Bar>
    <Foo>efg</Foo>
  </Foo>
  <Foo>
    <Bar>hijk</Bar>
    <Foo>lmn</Foo>
  </Foo>
</NewDataSet>

Using C# and .NET 2.0, I read the file in using the code below.

        DataSet ds = new DataSet();
        ds.ReadXml(file);

Using a breakpoint, after this line ds.Tables[0] looked like this (using dashes in place of underscores that I couldn't get to format properly):

Bar     Foo-Id    Foo-Id-0
abcd     0         null
null     1         0
hijk     2         null
null     3         2

I have found a workaround (I know there are many) and have been able to read in the .xml successfully, but I would like to understand why ds.ReadXml(file) behaved this way, so that I can avoid the issue in the future. Thanks.

A: 

These are my observations rather than a full answer:

My guess (without trying to reproduce it myself) is that a couple of things are happening as the DataSet tries to 'flatten' a hierarchical structure into a relational one.

1) Thinking about the data from a relational database perspective, there is no obvious primary key field for identifying each of the Foo elements in the collection, so the DataSet has automatically used the ordinal position in the file as an auto-generated field called Foo-Id.

2) There are actually two elements called 'Foo', which probably explains the strange column name 'Foo-Id-0': the DataSet has auto-generated a unique name for the column (I guess you could think of this as fault-tolerant behaviour in the DataSet).
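
One way to check these guesses is to print what the DataSet actually inferred. The following is only a minimal sketch: the class name and the inline XML string are made up for illustration (the XML copies the shape from the question), and the real column names will contain underscores where the question shows dashes.

```csharp
using System;
using System.Data;
using System.IO;

class InspectInferredSchema
{
    static void Main()
    {
        // Same shape as the file in the question, inlined for the example.
        string xml =
            "<NewDataSet>" +
            "<Foo><Bar>abcd</Bar><Foo>efg</Foo></Foo>" +
            "<Foo><Bar>hijk</Bar><Foo>lmn</Foo></Foo>" +
            "</NewDataSet>";

        DataSet ds = new DataSet();
        using (StringReader reader = new StringReader(xml))
        {
            // No schema is supplied, so ReadXml has to infer one.
            ds.ReadXml(reader);
        }

        // List every inferred table and column, including how each
        // column maps back to the XML (element, attribute, hidden, ...).
        foreach (DataTable table in ds.Tables)
        {
            Console.WriteLine("Table: " + table.TableName);
            foreach (DataColumn column in table.Columns)
            {
                Console.WriteLine("  Column: " + column.ColumnName +
                    " (" + column.DataType.Name +
                    ", mapping: " + column.ColumnMapping + ")");
            }
        }

        // List the inferred parent-child relations between tables.
        foreach (DataRelation relation in ds.Relations)
        {
            Console.WriteLine("Relation: " + relation.RelationName +
                ", parent: " + relation.ParentTable.TableName +
                "." + relation.ParentColumns[0].ColumnName +
                ", child: " + relation.ChildTable.TableName +
                "." + relation.ChildColumns[0].ColumnName +
                ", nested: " + relation.Nested);
        }
    }
}
```

If a relation is listed with the same table as both parent and child, that would support the idea that the inner Foo element is being treated as a nested row of the Foo table rather than as a plain column.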

rohancragg
+4  A: 

This appears to be correct for your nested Foo tags:

<NewDataSet>  
  <Foo>              <!-- Foo-Id: 0 -->
    <Bar>abcd</Bar>
    <Foo>efg</Foo>   <!-- Foo-Id: 1, Parent-Id: 0 -->
  </Foo>
  <Foo>              <!-- Foo-Id: 2 -->
    <Bar>hijk</Bar>
    <Foo>lmn</Foo>   <!-- Foo-Id: 3, Parent-Id: 2 -->
  </Foo>
</NewDataSet>

So this correctly becomes 4 records in your result, with "Foo-Id-0" as the parent-child key: each inner Foo row stores its parent's Foo-Id there.

Try:

<NewDataSet>  
  <Rec>              <!-- Rec-Id: 0 -->
    <Bar>abcd</Bar>
    <Foo>efg</Foo>   
  </Rec>
  <Rec>              <!-- Rec-Id: 1 -->
    <Bar>hijk</Bar>
    <Foo>lmn</Foo>   
  </Rec>
</NewDataSet>

Which should result in:

Bar     Foo        Rec-Id
abcd    efg        0
hijk    lmn        1
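
To try the renamed layout without touching the original file, something along these lines could be used. It is a sketch only: the class name and the inline XML are illustrative, and it prints whatever columns the DataSet infers rather than assuming the exact set shown above.

```csharp
using System;
using System.Data;
using System.IO;

class ReadRenamedRecords
{
    static void Main()
    {
        // The suggested layout: the row element is called Rec, so it no
        // longer shares a name with the Foo element it contains.
        string xml =
            "<NewDataSet>" +
            "<Rec><Bar>abcd</Bar><Foo>efg</Foo></Rec>" +
            "<Rec><Bar>hijk</Bar><Foo>lmn</Foo></Rec>" +
            "</NewDataSet>";

        DataSet ds = new DataSet();
        using (StringReader reader = new StringReader(xml))
        {
            ds.ReadXml(reader);
        }

        DataTable table = ds.Tables[0];

        // Header line: the inferred column names.
        foreach (DataColumn column in table.Columns)
        {
            Console.Write(column.ColumnName + "\t");
        }
        Console.WriteLine();

        // One line per row; DBNull values print as empty strings.
        foreach (DataRow row in table.Rows)
        {
            foreach (DataColumn column in table.Columns)
            {
                Console.Write(row[column] + "\t");
            }
            Console.WriteLine();
        }
    }
}
```

Comparing the printed header with the table above shows whether an id column is generated in this case or only when the DataSet has a nested relation to key.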
Keith