I have already read these: http://stackoverflow.com/questions/61233/the-best-way-to-shred-xml-data-into-sql-server-database-columns and http://stackoverflow.com/questions/223376/looking-for-a-good-bulk-insert-xml-shredding-example-for-sql-2005.

The reason I'm posting anyway is that I'm using BizTalk 2009 and SQL Server 2008.

I'm receiving a huge XML structure from a vendor using BizTalk. The client has normalized the XML structure into about 30 tables in a Microsoft SQL Server 2008 database.

Is there any magic solution yet?

It seems to me these are the options:

1) Use the BizTalk SQL adapter, which is only good for simple flat databases (not a lot of joins and one-to-many relationships).

2) Write a WCF program that either a) uses LINQ and exposes the LINQ object, or b) uses traditional XML DOM or SAX parsing and builds ADO.NET calls to store the data in the database.

3) Write a complex stored procedure that uses OPENXML.

4) Store the data temporarily in a SQL Server XML column, then use some other tool to "shred and normalize" the data. Is there anything in SSIS that would do this?

5) Leave the data in an XML column, use XML indexes, and never normalize it. Embed the ugly XQuery/XPath statements in a view. I'm not sure the response time of queries would be adequate. It might take as long to write the XQuery expressions and views as it would to do one of the other steps above.

I'm guessing that #2 or #3 would take at least one or two hours per table, so with 30 tables that is at least 30 (if not 60) hours of tedious, boring, and error-prone work.

Thanks,

Neal Walters

Update 12/23: Some sample data:

<ns0:ValAgg xmlns:va="http://msbinfo.com/expresslync/rct/valuation" xmlns:ns0="http://TFBIC.RCT.BizTalk.Orchestrations.ValAgg">
<MainStreetValuation xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://msbinfo.com/expresslync/rct/valuation">
<ValuationIdentifier>
  <RecordId>1928876</RecordId> 
  <PolicyNumber>ESTIMATE-1928876</PolicyNumber> 
  <VersionId>6773220</VersionId> 
  </ValuationIdentifier>
  <RecordType>EST</RecordType> 
  <PolicyStatus>Complete</PolicyStatus> 
  <DataSource>WEB</DataSource> 
   <bunch more here/> 
<valuationAggregateFlat xmlns="http://tempuri.org/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <policyNumber>ESTIMATE-1928876</policyNumber> 
  <recordId>1928876</recordId> 
  <versionId>6773220</versionId> 
  <updateTimeStamp>2009-12-14T14:50:30.743</updateTimeStamp> 
  <replacementCost>166129</replacementCost> 
  <yearBuilt>1999</yearBuilt> 
  <totalLivingAreaSqFt>2000</totalLivingAreaSqFt> 
  <primaryRCTRoofTypeCode>15012</primaryRCTRoofTypeCode> 
  <TOPSRoofType>COPR</TOPSRoofType> 
  <StdFireRoofType>COPR</StdFireRoofType> 
  <primaryRTCConstructionTypeCode>10016</primaryRTCConstructionTypeCode> 
  <constructionType>BV</constructionType> 
  <hailProofIndicator>false</hailProofIndicator> 
  <anyWoodRoofIndicator>false</anyWoodRoofIndicator> 
  <allMetalRoofIndicator>true</allMetalRoofIndicator> 
  </valuationAggregateFlat>
</ns0:ValAgg>

Where you see "MainStreetValuation" could also be a couple of other complex types, such as "HighValueValuation" where the entire structure is different for homes that have fancy stuff.
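
To make option 2b concrete, here is a minimal sketch of DOM-style shredding for the `valuationAggregateFlat` element. It is written in Python rather than the .NET stack discussed above, and it uses an in-memory SQLite database as a stand-in for SQL Server. The `ValuationFlat` table, its column names, and the `COLUMNS` mapping are hypothetical, and the sample message is trimmed so that it is well-formed.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the vendor message (assumption: the real
# message carries many more elements, including the MainStreetValuation branch).
SAMPLE = """\
<ns0:ValAgg xmlns:ns0="http://TFBIC.RCT.BizTalk.Orchestrations.ValAgg">
  <valuationAggregateFlat xmlns="http://tempuri.org/">
    <policyNumber>ESTIMATE-1928876</policyNumber>
    <recordId>1928876</recordId>
    <versionId>6773220</versionId>
    <replacementCost>166129</replacementCost>
    <yearBuilt>1999</yearBuilt>
    <hailProofIndicator>false</hailProofIndicator>
  </valuationAggregateFlat>
</ns0:ValAgg>
"""

NS = {"t": "http://tempuri.org/"}

# Hypothetical mapping: child element name -> (column name, converter).
COLUMNS = {
    "policyNumber": ("PolicyNumber", str),
    "recordId": ("RecordId", int),
    "versionId": ("VersionId", int),
    "replacementCost": ("ReplacementCost", int),
    "yearBuilt": ("YearBuilt", int),
    "hailProofIndicator": ("HailProofIndicator", lambda s: s == "true"),
}


def shred_flat(xml_text):
    """Return one row (as a dict) for the valuationAggregateFlat element."""
    root = ET.fromstring(xml_text)
    flat = root.find("t:valuationAggregateFlat", NS)
    row = {}
    for tag, (column, convert) in COLUMNS.items():
        node = flat.find(f"t:{tag}", NS)
        # Missing elements become NULL; present ones are converted.
        row[column] = convert(node.text) if node is not None else None
    return row


def load(conn, row):
    """Insert the row into the hypothetical ValuationFlat table."""
    cols = ", ".join(row)
    params = ", ".join("?" for _ in row)
    conn.execute(
        f"INSERT INTO ValuationFlat ({cols}) VALUES ({params})",
        list(row.values()),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ValuationFlat (PolicyNumber TEXT, RecordId INTEGER, "
    "VersionId INTEGER, ReplacementCost INTEGER, YearBuilt INTEGER, "
    "HailProofIndicator INTEGER)"
)
row = shred_flat(SAMPLE)
load(conn, row)
```

Driving the shredder from a mapping table like `COLUMNS` keeps the per-table work down to writing one mapping, though the nested one-to-many parts of the real structure would still need their own handling.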

+2  A: 

Quick note: the fact that you're using BizTalk 2009 does not, by itself, prevent you from also using something like SSIS for shredding and otherwise processing the XML.


The following is too long for a comment:

There's an issue to be aware of with the XML Source. Consider an XML structure like:

<root>
    <parent attr1="value1" attr2="value2">
        <child attrc1="valuec1" attrc2="valuec2"/>
        <child attrc1="valuec1" attrc2="valuec2"/>
    </parent>
    <parent> ... </parent>
    ...
</root>

The result of processing this through the XML Source will be two outputs: one with attr1 and attr2, and another with attrc1 and attrc2. The outputs are all processed asynchronously with respect to each other. You'll need to correlate the parent and child rows by means of an artificial column that SSIS will introduce. Each parent will have an id column, and each child will have the same id value as a "foreign key". You may need to do a little work in your database to match the two.
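
The parent/child split can be illustrated outside SSIS. The sketch below is plain Python, not the XML Source itself: it produces two separate row sets from a structure like the one above and links them with a synthetic key, then joins them back together. The column name `parent_Id` and the attribute values on the second parent are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

DOC = """\
<root>
    <parent attr1="value1" attr2="value2">
        <child attrc1="valuec1" attrc2="valuec2"/>
        <child attrc1="valuec1" attrc2="valuec2"/>
    </parent>
    <parent attr1="value3" attr2="value4">
        <child attrc1="valuec3" attrc2="valuec4"/>
    </parent>
</root>
"""


def split_outputs(xml_text):
    """Return (parents, children) row lists linked by a synthetic parent_Id."""
    root = ET.fromstring(xml_text)
    parents, children = [], []
    for parent_id, parent in enumerate(root.findall("parent"), start=1):
        # The id is artificial: it does not exist anywhere in the XML.
        parents.append({"parent_Id": parent_id, **parent.attrib})
        for child in parent.findall("child"):
            # Each child row carries its parent's id as a "foreign key".
            children.append({"parent_Id": parent_id, **child.attrib})
    return parents, children


def join_rows(parents, children):
    """Re-attach parent attributes to each child row via parent_Id."""
    by_id = {p["parent_Id"]: p for p in parents}
    return [{**by_id[c["parent_Id"]], **c} for c in children]


parents, children = split_outputs(DOC)
joined = join_rows(parents, children)
```

In a database load, the same join would typically be a lookup from the synthetic id to whatever real key the parent table assigns.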

John Saunders
This is exactly what SSIS is good for!
HLGEM
I wouldn't go as far as saying "exactly", given that SSIS and XML have some impedance mismatches, but yeah, I'd strongly consider the use of SSIS.
John Saunders
Yeah, I just wanted you to know our environment. What in SSIS should we look at further? I'm the BizTalk guy, partially advising on SQL (I used DTS years back, but never SSIS). What are the magic words or functions to look for in SSIS to help demonstrate how it is done? Does SSIS in SQL Server 2008 have anything new in this regard compared with 2005? Thanks!
NealWalters
First, forget that you ever knew DTS. DTS was a hack to accomplish what SSIS is designed to accomplish. Second, look at the "XML Source" transform. It will turn an XML structure into a set of "table" structures, which can be manipulated using the remainder of SSIS. 2008 adds C# as a script language, adds a true UPSERT capability, much more efficient lookups in bulk cases, etc.
John Saunders
The DBA tried SSIS. She keeps getting errors, and some googling has turned up several posts that say "Current SSIS support for loading XML does not support Complex Types with mixed content models." Is this true? http://social.msdn.microsoft.com/Forums/en/sqlintegrationservices/thread/4928db1d-5418-44e6-87e8-1162d9bcfca5
NealWalters
A mixed content model means a mixture of text and elements within the same element. SSIS does not currently support that, but is that the problem you're having? `<element>text<element2/>more text</element>`?
John Saunders
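
One quick way to see whether a given message actually contains mixed content is to scan it for elements that have both child elements and non-whitespace text. The following is a minimal Python sketch; the function name `has_mixed_content` is hypothetical. Note the caveat: it inspects an instance document, whereas the SSIS limitation quoted above concerns the schema, so a schema may declare a mixed content model even when a particular message does not use it.

```python
import xml.etree.ElementTree as ET


def has_mixed_content(element):
    """True if any element in the tree has child elements plus non-whitespace text."""
    for node in element.iter():
        children = list(node)
        if not children:
            continue  # leaf elements with only text are not mixed content
        # Text before the first child element.
        if node.text and node.text.strip():
            return True
        # Text following any child element (stored as the child's tail).
        if any(child.tail and child.tail.strip() for child in children):
            return True
    return False


mixed = ET.fromstring("<element>text<element2/>more text</element>")
plain = ET.fromstring("<element><element2>text</element2></element>")
```

Here `has_mixed_content(mixed)` is true because `element` holds both text and `element2`, while `has_mixed_content(plain)` is false because the text sits only inside a leaf element.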