views:

23

answers:

3

I have a XML file that I need to populate multiple SQL tables, and I was wondering what the best way to do that is. I was thinking dataset, or xslt but I honestly not sure. Here is my generated XML (part of it)

 <?xml version="1.0" encoding="utf-8" standalone="yes" ?> 
- <!-- Created: 8/3/2010 12:09:15 PM
  --> 
- <Trip>
- <TripDetails>
  <DepartureDate /> 
  <ReturnDate /> 
  <TripTypeA>3</TripTypeA> 
  <TripTypeB>1</TripTypeB> 
  <PurposeOfTrip>vacation</PurposeOfTrip> 
  <Region>5</Region> 
- <Countries>
  <Country>105</Country> 
  <Country>135</Country> 
  </Countries>
- <Cities>
  <City>Cancun</City> 
  <City>Tokyo</City> 
  <City>Mayo</City> 
  </Cities>
  <OverallRating>4</OverallRating> 
  <Suppliers>53</Suppliers> 
  <SuppliersComments>Good flight</SuppliersComments> 
- <Transport>
  <TransportType>1</TransportType> 
  <TransportType>3</TransportType> 
  </Transport>
  <TransportComment>Transportation was fast</TransportComment> 

I have a couple different tables I need populating.(keeping it short for example)

TripDetails (TripID, TripTypeA, TripTypeB, SupplierID, overallRating)
TripCountries (TripCountryID, TripID, CountryCode)

I have a bunch more tables(cities, transport) but if I can figure out how to update TripDetails(the main table) and TripCountries (which is a table that brings together TripDetails, and Countries) I think I will be good, thanks!

+1  A: 

Assuming you're using SQL Server, you should parse the XML into DataTables and use the SqlBulkCopy object to shoot them into the database super-fast. There are lots of resources to help you learn about SqlBulkCopy. Here's a recent discussion from another StackOverflow question to get you started: http://stackoverflow.com/questions/3358797/sql-server-2008-tuning-with-large-transactions-700k-rows-transaction/3358867#3358867

If the XML file is really large, you should be careful what sort of parser you use. XDocument and XmlDocument load the whole thing into memory. If the files are small enough, say under 10MB, you should be fine using those parsers.


EDIT:

Here's a quick mock-up of how you could get the XML into DataTables. It's in VB since VB makes XML a tad easier.

Option Strict On : Option Explicit On : Option Infer On : Option Compare Binary

Imports System.Data
Imports System.Linq
Imports System.Xml.Linq

Module Module1

    Sub Main()
      Dim xml =
         <Trip>
            <TripDetails id="1">
               <DepartureDate/> 
               <ReturnDate/> 
               <TripTypeA>3</TripTypeA> 
               <TripTypeB>1</TripTypeB> 
               <PurposeOfTrip>vacation</PurposeOfTrip> 
               <Region>5</Region> 
               <Countries>
                  <Country>105</Country> 
                  <Country>135</Country> 
               </Countries>
               <Cities>
                  <City>Cancun</City> 
                  <City>Tokyo</City> 
                  <City>Mayo</City> 
               </Cities>
               <OverallRating>4</OverallRating> 
               <Suppliers>53</Suppliers> 
               <SuppliersComments>Good flight</SuppliersComments> 
               <Transport>
                  <TransportType>1</TransportType> 
                  <TransportType>3</TransportType> 
               </Transport>
               <TransportComment>Transportation was fast</TransportComment>
            </TripDetails>
         </Trip>

         Dim dtTripDetails As New DataTable()
         With dtTripDetails.Columns
            .Add("TripID", GetType(Integer))
            .Add("TripTypeA", GetType(Integer))
            .Add("DepartureDate", GetType(DateTime))
            .Add("TransportComment", GetType(String))
         End With

         Dim dtTripDetailXrefCountries As New DataTable()
         With dtTripDetailXrefCountries.Columns
            .Add("TripID", GetType(Integer))
            .Add("CountryID", GetType(Integer))
         End With

         Dim xdetails = From td In xml.Descendants("TripDetails") Select td
         For Each xdetailRecord As XElement In xdetails
            Dim tripID As Integer = CInt(xdetailRecord.Attribute("id").Value)
            Dim tripTypeA As Integer = CInt(xdetailRecord.Element("TripTypeA").Value)
            Dim strDepDate As String = xdetailRecord.Element("DepartureDate").Value
            Dim depDate As Object = If(String.IsNullOrEmpty(strDepDate), CType(DBNull.Value, Object), CType(DateTime.Parse(strDepDate), Object))
            Dim transportComment As String = xdetailRecord.Element("TransportComment").Value
            dtTripDetails.Rows.Add(tripID, tripTypeA, depDate, transportComment)

            Dim xcountries = From c In xdetailRecord.Element("Countries").Elements("Country") Select c
            For Each xcountryRecord As XElement In xcountries
               Dim countryID As Integer = CInt(xcountryRecord.Value)
               dtTripDetailXrefCountries.Rows.Add(tripID, countryID)
            Next
         Next

         Console.WriteLine("Done")
         Console.ReadKey(True)

    End Sub

End Module

BTW - when doing this kind of ETL, it's best to pump your data into staging tables first rather than directly into your production tables. That way, you can validate data types and ensure referential integrity and handle key management and get everything perfectly situated without locking up or polluting your production tables.

mattmc3
thank you for the quick reply, I will look into that. The file I am creating is less than 2mb (otherwise what should I use? just curious) and I am using SQL server 2008. So for each table in my SQL Server I should create a new DataTable, then insert with SqlBulkCopy?
Spooks
still a bit confused on to how to add only certain sections to datatables/datasets
Spooks
1 DataTable maps to 1 SQL table, so yes - you'll need a 1-for-1 mapping. You mention being confused about different sections - I assume you're asking about how to get at different data within your XML file. If you're using the old XmlDocument, you'll probably want XPath. But, if you can use the 3.5 framework, I recommend using XDocument, and then you can query your XML file with Linq and get the data elements you need. So, you'll start with looping through the TripDetails node and filling your TripDetails DataTable, and then you'll select subnodes and fill those tables.
mattmc3
Thank you so much, I will try my best to translate this to C#, thanks a million
Spooks
A: 

You could use SQL Server 2005's XQuery capabilities pretty easily, e.g. have a stored proc that accepts on of those XML strings as input:

CREATE PROCEDURE dbo.StoreTripDetails(@XmlContent XML)
AS BEGIN
   DECLARE @NewTripID INT

   INSERT INTO dbo.TripDetails(TripTypeA, TripTypeB, SupplierID, overallRating)
       SELECT
        Trip.Details.value('(TripTypeA)[1]', 'int') 'TripTypeA',
        Trip.Details.value('(TripTypeB)[1]', 'int') 'TripTypeB',
        Trip.Details.value('(Suppliers)[1]', 'int') 'SupplierID',
        Trip.Details.value('(OverallRating)[1]', 'int') 'OverallRating'
       FROM
        @XmlContent.nodes('/Trip/TripDetails') AS Trip(Details)

    SELECT @NewTripID = SCOPE_IDENTITY()

    INSERT INTO dbo.TripCountries(TripID, CountryCode)
        SELECT
        @NewTripID, Trip.Countries.value('(.)[1]', 'int') 
        FROM
        @XmlContent.nodes('/Trip/TripDetails/Countries/Country') AS Trip(Countries)
 END

and then you could call this stored proc from your C# code quite easily and pass in a chunk of XML representing one trip.

marc_s
A: 

You need to specify which database engine you're using. Since you've specified the Linq tag, I assume you're using Microsoft SQL server.

You can easily pass the document as a string to a stored procedure. In this stored procedure, you define some temp table schemas and use sp_xml_preparedocument, OPENXML (and related functions) to populate these temp tables with the data from the XML document. Now you can just use standard SQL to insert into your permanent tables from the temp tables using any projections, joins or other transformations you need. Memory is also no longer an issue most of the time, since it is all done inside MSSQL Server.

More info: http://msdn.microsoft.com/en-us/library/aa260385(SQL.80).aspx http://msdn.microsoft.com/en-us/library/aa276847(SQL.80).aspx

Example (untested):

declare @docHandle int
declare @xmlContent varchar(500)
set @doc ='<root><items><item value="1"/><item value="2"/></item></root>'

exec sp_xml_preparedocument @docHandle OUTPUT, @xmlContent

insert into myTable ([ItemValue])
select value from OPENXML(@docHandle, '/root/items/item')

exec sp_xml_removedocument @docHandle 

Note that these are SQL 2000 conventions. In SQL 2005, there are equivalent T-SQL conventions which are more readable than these stored procedure predecessors. However, these conventions are still available in SQL 2005 and SQL 2008, I believe.

Phil Gilmore