



What approach would you take to parse a large XML file (2 MB to 20 MB or more) that has no schema? I cannot infer one with XSD.exe because the file structure is odd (see the snippet below).


The options I'm considering:
1) XML deserialization (but, as I said, I don't have a schema, and the XSD tool complains about the file contents)
2) LINQ to XML
3) loading into an XmlDocument
4) manual parsing with XmlReader and the like

Here is a snippet of the XML file:

<?xml version="1.0" encoding="utf-8"?>
<xmlData date="29.04.2010 12:09:13">

I would load it into an XmlDocument and then use XPath to process it. LINQ to XML may be the best bet here, but I am not very familiar with it, so I can't say.
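A minimal sketch of the XmlDocument + XPath approach in C#. It assumes the element names from the XSD posted further down in this thread (xmlData, Table, ident, stock, pricewotax); `StockRow` and `DomStockReader` are hypothetical names for illustration. The XPath `/xmlData/Table` picks only the top-level Table elements, not the nested ones under pricebyquantity.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical container for one top-level <Table> row.
public record StockRow(int Ident, int Stock, double PriceWoTax);

public static class DomStockReader
{
    public static List<StockRow> Read(TextReader source)
    {
        var doc = new XmlDocument();
        doc.Load(source); // parses the entire document into memory

        var rows = new List<StockRow>();
        // Absolute path: only direct children of the root, not pricebyquantity/Table.
        foreach (XmlNode table in doc.SelectNodes("/xmlData/Table"))
        {
            // XmlConvert parses with XSD (culture-invariant) number formats.
            rows.Add(new StockRow(
                XmlConvert.ToInt32(table.SelectSingleNode("ident").InnerText),
                XmlConvert.ToInt32(table.SelectSingleNode("stock").InnerText),
                XmlConvert.ToDouble(table.SelectSingleNode("pricewotax").InnerText)));
        }
        return rows;
    }
}
```

For a file on disk you would pass something like `File.OpenText(path)`; the whole DOM is still built in memory, which is the trade-off discussed in the comments below.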

Josh Stodola
I read somewhere that loading a file into an XmlDocument can result in high memory consumption, but I'm not sure about that.
Yes, XmlDocument has to load the entire file into memory, and the resulting object tree takes more space than the file itself. Still, 2-20 MB should not be a major concern in this case.
Josh Stodola

Here's the XSD:

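<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="xmlData">
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" name="Table">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="ident" type="xs:int" />
              <xs:element name="stock" type="xs:int" />
              <xs:element name="pricewotax" type="xs:double" />
              <xs:element name="discountpercent" type="xs:double" />
              <xs:element minOccurs="0" name="pricebyquantity">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element maxOccurs="unbounded" name="Table">
                      <xs:complexType>
                        <xs:sequence>
                          <xs:element name="quantity" type="xs:int" />
                          <xs:element name="pricewotax" type="xs:double" />
                          <xs:element name="discountpercent" type="xs:double" />
                        </xs:sequence>
                      </xs:complexType>
                    </xs:element>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="date" type="xs:string" use="required" />
    </xs:complexType>
  </xs:element>
</xs:schema>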

Here's the serializable class:

// <auto-generated>
//     This code was generated by a tool.
//     Runtime Version:2.0.50727.3603
//     Changes to this file may cause incorrect behavior and will be lost if
//     the code is regenerated.
// </auto-generated>

// This source code was auto-generated by xsd, Version=2.0.50727.1432.
namespace StockInfo {
    using System.Xml.Serialization;

    /// <remarks/>
    [System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.1432")]
    [System.Xml.Serialization.XmlRootAttribute(Namespace="", IsNullable=false)]
    public partial class xmlData {

        private xmlDataTable[] tableField;

        private string dateField;

        /// <remarks/>
        // Repeated <Table> children directly under <xmlData>, with no wrapper element.
        [System.Xml.Serialization.XmlElementAttribute("Table")]
        public xmlDataTable[] Table {
            get {
                return this.tableField;
            }
            set {
                this.tableField = value;
            }
        }

        /// <remarks/>
        // Maps to the date="..." attribute rather than a child element.
        [System.Xml.Serialization.XmlAttributeAttribute()]
        public string date {
            get {
                return this.dateField;
            }
            set {
                this.dateField = value;
            }
        }
    }

    /// <remarks/>
    [System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.1432")]
    public partial class xmlDataTable {

        private int identField;

        private int stockField;

        private double pricewotaxField;

        private double discountpercentField;

        private xmlDataTableTable[] pricebyquantityField;

        /// <remarks/>
        public int ident {
            get {
                return this.identField;
            }
            set {
                this.identField = value;
            }
        }

        /// <remarks/>
        public int stock {
            get {
                return this.stockField;
            }
            set {
                this.stockField = value;
            }
        }

        /// <remarks/>
        public double pricewotax {
            get {
                return this.pricewotaxField;
            }
            set {
                this.pricewotaxField = value;
            }
        }

        /// <remarks/>
        public double discountpercent {
            get {
                return this.discountpercentField;
            }
            set {
                this.discountpercentField = value;
            }
        }

        /// <remarks/>
        [System.Xml.Serialization.XmlArrayItemAttribute("Table", IsNullable=false)]
        public xmlDataTableTable[] pricebyquantity {
            get {
                return this.pricebyquantityField;
            }
            set {
                this.pricebyquantityField = value;
            }
        }
    }

    /// <remarks/>
    [System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.1432")]
    public partial class xmlDataTableTable {

        private int quantityField;

        private double pricewotaxField;

        private double discountpercentField;

        /// <remarks/>
        public int quantity {
            get {
                return this.quantityField;
            }
            set {
                this.quantityField = value;
            }
        }

        /// <remarks/>
        public double pricewotax {
            get {
                return this.pricewotaxField;
            }
            set {
                this.pricewotaxField = value;
            }
        }

        /// <remarks/>
        public double discountpercent {
            get {
                return this.discountpercentField;
            }
            set {
                this.discountpercentField = value;
            }
        }
    }
}
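To show how classes like these are consumed, here is a self-contained sketch with XmlSerializer. It uses trimmed-down, hand-written equivalents of the generated classes (auto-properties, PascalCase names such as `XmlData` and `PriceByQuantity` are hypothetical); the serialization attributes carry the mapping to the element names from the posted XSD.

```csharp
using System.IO;
using System.Xml.Serialization;

[XmlRoot("xmlData")]
public class XmlData
{
    // Repeated <Table> elements directly under the root, no wrapper.
    [XmlElement("Table")] public XmlDataTable[] Table { get; set; }

    [XmlAttribute("date")] public string Date { get; set; }
}

public class XmlDataTable
{
    [XmlElement("ident")] public int Ident { get; set; }
    [XmlElement("stock")] public int Stock { get; set; }
    [XmlElement("pricewotax")] public double PriceWoTax { get; set; }
    [XmlElement("discountpercent")] public double DiscountPercent { get; set; }

    // <pricebyquantity> is a wrapper element around repeated <Table> items.
    [XmlArray("pricebyquantity")]
    [XmlArrayItem("Table")]
    public XmlDataTableTable[] PriceByQuantity { get; set; }
}

public class XmlDataTableTable
{
    [XmlElement("quantity")] public int Quantity { get; set; }
    [XmlElement("pricewotax")] public double PriceWoTax { get; set; }
    [XmlElement("discountpercent")] public double DiscountPercent { get; set; }
}

public static class StockDeserializer
{
    public static XmlData Load(TextReader source)
    {
        var serializer = new XmlSerializer(typeof(XmlData));
        return (XmlData)serializer.Deserialize(source);
    }
}
```

Note that this still materializes the whole object graph in memory before you can use any of it.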

One caveat: deserialization may not be the most performant way to parse a 20 MB file. XmlReader is likely the fastest option, since it reads the file forward-only without building an in-memory tree, but that means doing the parsing by hand.
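A rough sketch of the manual XmlReader route, assuming the element names from the XSD above; `StreamingStockReader` is a hypothetical name, and only ident, stock and pricewotax are extracted to keep it short. Each top-level Table is read through `ReadSubtree`, so only one row is in memory at a time.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class StreamingStockReader
{
    public static IEnumerable<(int Ident, int Stock, double PriceWoTax)> Read(XmlReader reader)
    {
        reader.MoveToContent();                      // skip the declaration; now on <xmlData>
        if (!reader.ReadToDescendant("Table")) yield break;
        do
        {
            int ident = 0, stock = 0;
            double price = 0;
            // ReadSubtree scopes a reader to this one top-level <Table>.
            using (XmlReader table = reader.ReadSubtree())
            {
                table.Read();                        // on <Table> itself
                table.Read();                        // first child node
                while (!table.EOF)
                {
                    if (table.NodeType != XmlNodeType.Element) { table.Read(); continue; }
                    switch (table.LocalName)
                    {
                        // ReadElementContentAs* consume the element and move past its end tag.
                        case "ident": ident = table.ReadElementContentAsInt(); break;
                        case "stock": stock = table.ReadElementContentAsInt(); break;
                        case "pricewotax": price = table.ReadElementContentAsDouble(); break;
                        default: table.Skip(); break; // discountpercent, pricebyquantity, ...
                    }
                }
            }
            // Closing the subtree reader leaves 'reader' on this Table's end tag.
            yield return (ident, stock, price);
        } while (reader.ReadToNextSibling("Table"));
    }
}
```

Because the method is an iterator, the underlying XmlReader (for example one from `XmlReader.Create(path)`) has to stay open until enumeration finishes. The manual loop with `continue` avoids the common bug of calling `Read()` after `ReadElementContentAs*`, which would skip the next sibling when there is no whitespace between elements.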

BTW, I generated the XSD using the XmlSchemaInference class.
Thanks, though I decided to use LINQ to XML to parse this, so I'm not relying on serialization.
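A short sketch of inferring a schema with XmlSchemaInference and writing it out as text. `SchemaInferrer` is a hypothetical name. The inferred types depend on the sample values: by default the class picks the most restrictive type that fits, so small integers may come out as a narrower type than the xs:int shown above.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Schema;

public static class SchemaInferrer
{
    public static string InferXsd(TextReader sampleXml)
    {
        var inference = new XmlSchemaInference();
        using XmlReader reader = XmlReader.Create(sampleXml);
        XmlSchemaSet schemas = inference.InferSchema(reader);

        var sb = new StringBuilder();
        using (var writer = new StringWriter(sb))
        {
            // A document without namespaces yields a single schema here;
            // several schemas would be written back to back.
            foreach (XmlSchema schema in schemas.Schemas())
            {
                schema.Write(writer);
            }
        }
        return sb.ToString();
    }
}
```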
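For reference, a minimal LINQ to XML sketch of that approach, assuming the element names from the XSD above; `StockItem`, `QuantityPrice` and `LinqStockReader` are hypothetical names. Like XmlDocument, XDocument keeps the whole tree in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

public record QuantityPrice(int Quantity, double PriceWoTax, double DiscountPercent);
public record StockItem(int Ident, int Stock, double PriceWoTax, double DiscountPercent,
                        List<QuantityPrice> PriceByQuantity);

public static class LinqStockReader
{
    public static (DateTime Date, List<StockItem> Items) Parse(XDocument doc)
    {
        XElement root = doc.Root; // <xmlData>
        // The date attribute is day.month.year, so parse it with an explicit format.
        DateTime date = DateTime.ParseExact((string)root.Attribute("date"),
            "dd.MM.yyyy HH:mm:ss", CultureInfo.InvariantCulture);

        List<StockItem> items = root.Elements("Table") // direct children only
            .Select(t => new StockItem(
                (int)t.Element("ident"),
                (int)t.Element("stock"),
                (double)t.Element("pricewotax"),
                (double)t.Element("discountpercent"),
                // pricebyquantity is optional; a missing one yields an empty list.
                t.Elements("pricebyquantity").Elements("Table")
                    .Select(q => new QuantityPrice(
                        (int)q.Element("quantity"),
                        (double)q.Element("pricewotax"),
                        (double)q.Element("discountpercent")))
                    .ToList()))
            .ToList();
        return (date, items);
    }
}
```

Usage would be along the lines of `LinqStockReader.Parse(XDocument.Load(path))`. The explicit `(int)`/`(double)` conversions on XElement parse culture-invariantly, so a decimal point in the file works regardless of the machine's locale.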