I have an automated process that inserts XML documents into a SQL Server 2008 table, in a column of type XML. There is a lot of duplicated data, and I'm looking for a good way to delete the duplicate rows based on the XML column. The table has thousands of rows, and each XML document is about 70 KB.
Each XML document is identical except for the value of one element, for example:
Row 1, Column C:
<?xml version="1.0"?><a><b/><c>2010.09.28T10:10:00</c></a>
Row 2, Column C:
<?xml version="1.0"?><a><b/><c>2010.09.29T10:10:00</c></a>
I want the value of <c> to be ignored in the comparison. If everything else is equal, the documents should be considered the same; if any other element differs, they should be considered different.
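One way to express that rule is to build a comparison key per document: blank out the text of every <c> element, serialize the rest in canonical form (C14N, so that things like <b/> vs <b></b> or attribute order don't matter), and hash it. Rows sharing a key are duplicates. The sketch below does this client-side in Python rather than in T-SQL, and it assumes the rows can be fetched as (id, xml_text) pairs with some unique id column; the question doesn't mention one, so the id, along with the names comparison_key, duplicate_ids and IGNORED_TAG, is hypothetical. It needs Python 3.8+ for ElementTree.canonicalize.

```python
import hashlib
import xml.etree.ElementTree as ET

# Assumption: the only element to ignore is <c>, as in the example rows.
IGNORED_TAG = "c"


def comparison_key(xml_text: str) -> str:
    """Hash of the document with the text of every <c> element blanked out."""
    root = ET.fromstring(xml_text)
    for elem in root.iter(IGNORED_TAG):
        elem.text = None  # keep the element itself, drop only its value
    # C14N gives one stable serialization for logically equal documents
    # (empty-element form, attribute order, namespace declarations).
    canonical = ET.canonicalize(ET.tostring(root, encoding="unicode"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def duplicate_ids(rows):
    """Given hypothetical (id, xml_text) pairs, return the ids to delete.

    The row with the lowest id in each group of equal keys is kept.
    """
    seen = set()
    dupes = []
    for row_id, xml_text in sorted(rows, key=lambda r: r[0]):
        key = comparison_key(xml_text)
        if key in seen:
            dupes.append(row_id)
        else:
            seen.add(key)
    return dupes


rows = [
    (1, '<?xml version="1.0"?><a><b/><c>2010.09.28T10:10:00</c></a>'),
    (2, '<?xml version="1.0"?><a><b/><c>2010.09.29T10:10:00</c></a>'),
    (3, '<?xml version="1.0"?><a><b>x</b><c>2010.09.29T10:10:00</c></a>'),
]
print(duplicate_ids(rows))  # [2]
```

Rows 1 and 2 differ only in <c>, so row 2 is reported as a duplicate; row 3 differs in <b> and is kept. The returned ids could then be removed with an ordinary DELETE ... WHERE id IN (...). Note that C14N preserves whitespace text by default, so documents that differ only in indentation would still count as different.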
Thanks for any ideas.