I have at least 100 XML files, each about 300 MB, containing email messages in roughly the format listed below.
My question is: how do I get this data into, say, a SQL Server database so that I can run queries against it? My queries would be along the lines of: has a certain person sent an email to another certain person within a given period, with certain keywords in the subject or body?
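As a rough illustration of that query shape, here is a minimal Python sketch. It uses SQLite only as a stand-in for SQL Server so that it runs anywhere, and the table names, column names and the `find_messages` helper are hypothetical, not part of any existing schema. On SQL Server, a full-text index would probably be a better fit for the keyword part than `LIKE`.

```python
import sqlite3

# Hypothetical flat schema: one row per message, one row per recipient.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    msg_id   TEXT,
    time_utc INTEGER,   -- seconds since epoch, as in <MsgTimeUTC>
    sender   TEXT,
    subject  TEXT,
    body     TEXT
);
CREATE TABLE IF NOT EXISTS recipients (
    msg_id TEXT,
    email  TEXT
);
"""


def find_messages(conn, sender, recipient, start_utc, end_utc, keyword):
    """Return IDs of messages from sender to recipient in the given
    time window whose subject or body contains the keyword."""
    pattern = f"%{keyword}%"
    rows = conn.execute(
        """
        SELECT DISTINCT m.msg_id
        FROM messages AS m
        JOIN recipients AS r ON r.msg_id = m.msg_id
        WHERE m.sender = ?
          AND r.email = ?
          AND m.time_utc BETWEEN ? AND ?
          AND (m.subject LIKE ? OR m.body LIKE ?)
        ORDER BY m.msg_id
        """,
        (sender, recipient, start_utc, end_utc, pattern, pattern),
    )
    return [row[0] for row in rows]
```

Keeping recipients in their own table is one way to handle messages with several `Recipient` elements without repeating the message row.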
Here is what I have tried:
1) Loading each XML file into an XML data type column in SQL Server. With this approach I could not come up with the XPath(?) queries to do what I need. Is it even possible to do this in XPath?
2) Loading each file into a .NET DataSet using ReadXml and ReadXmlSchema. This seems to load fine and creates the right number of DataTables with foreign keys and so on, but it would mean creating 100 sets of tables in the database, then somehow joining them all into a single table before running the query.
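A third route that avoids both the per-file XML column and the 100 sets of tables would be to stream each file and append its messages to one shared pair of tables. The following is only a sketch under assumptions: it is written in Python rather than .NET, SQLite stands in for SQL Server, each file is assumed to wrap its `Message` elements in a single root element, every message is assumed to carry a numeric `MsgTimeUTC`, and the `iter_messages` and `load_files` names and the table layout are made up for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def iter_messages(source):
    """Yield one dict per <Message> without loading the whole file."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "Message":
            yield {
                "msg_id": elem.findtext("MsgID"),
                # Assumes <MsgTimeUTC> is present and holds epoch seconds.
                "time_utc": int(elem.findtext("MsgTimeUTC")),
                "sender": elem.findtext("Sender/UserInfo/CorporateEmailAddress"),
                "recipients": [
                    r.findtext("UserInfo/CorporateEmailAddress")
                    for r in elem.findall("Recipient")
                ],
                "subject": (elem.findtext("Subject") or "").strip(),
                "body": (elem.findtext("MsgBody") or "").strip(),
            }
            root.clear()  # drop processed messages so memory stays flat


def load_files(conn, sources):
    """Append the messages from every source into the same two tables."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages "
        "(msg_id TEXT, time_utc INTEGER, sender TEXT, subject TEXT, body TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS recipients (msg_id TEXT, email TEXT)"
    )
    for source in sources:  # file paths or binary file objects
        for m in iter_messages(source):
            conn.execute(
                "INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
                (m["msg_id"], m["time_utc"], m["sender"], m["subject"], m["body"]),
            )
            conn.executemany(
                "INSERT INTO recipients VALUES (?, ?)",
                [(m["msg_id"], email) for email in m["recipients"]],
            )
    conn.commit()
```

Because every file feeds the same tables, no join across per-file tables is needed afterwards. For files of this size, batching the inserts or using a bulk-load facility would likely matter more than anything in this sketch.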
Let me know if you guys have any other suggestions.
Thanks.
<Message>
  <MsgID>4651286700000CAA00EF00010000</MsgID>
  <MsgTime>2007-05-21-01.04.39.000000</MsgTime>
  <MsgTimeUTC>1179723879</MsgTimeUTC>
  <MsgLang>CODE 1252</MsgLang>
  <Sender>
    <UserInfo>
      <FirstName>X</FirstName>
      <LastName>Y</LastName>
      <AccountName>121212</AccountName>
      <CorporateEmailAddress>[email protected]</CorporateEmailAddress>
    </UserInfo>
  </Sender>
  <Recipient DeliveryType = " ">
    <UserInfo>
      <FirstName>A</FirstName>
      <LastName>B</LastName>
      <FirmNumber>7593</FirmNumber>
      <AccountName>STRATEGIC AS</AccountName>
      <AccountNumber>604806</AccountNumber>
      <CorporateEmailAddress>[email protected]</CorporateEmailAddress>
    </UserInfo>
  </Recipient>
  <Subject>
    Please review the following
  </Subject>
  <Attachment>
    <FileName>37715772.htm</FileName>
    <FileID>503242486522279_37715772.htm</FileID>
    <FileSize>31175</FileSize>
  </Attachment>
  <MsgBody>
    This is the message Body
  </MsgBody>