views:

35

answers:

2

So I got a varbinary(max) column in SQL Server 2005 and it's full of XML. A few records somewhere have truncated XML so they're invalid.

This means if I run a

SELECT CAST(myVarbinaryColumn as XML) ... 

it blows chunks.

How can I filter out/skip invalid xml ?

When I've done similar with a varchar that supposedly has dates I could use ISDATE(blah) = 1. So an equivalent ISVALIDXML() would be nice.

Please don't comment about "Why isn't the column XML datatype anyway.." This happened in the past and I don't have a time machine.

+1  A: 

I wish I had that IsValidXML() function, too..... unfortunately, I don't think there's anything like that.....

Just a thought: is there anything you could check for to filter out the invalid XML??

E.g. could you filter out all those strings that don't end in .....</data> or something like that?? (seeing that you say your invalid XML is typically truncated XML, so I would think the closing tag - </data> or whatever - would be missing in those cases).

marc_s
+1  A: 

I think your best bet would be to write a custom CLR function, perhaps using XmlDocument.Load. In the CLR you could trap the error on a failed load and return an appropriate result.

EDIT: The code below would also work although it's not as elegant as a UDF. Unfortunately, we can't use TRY/CATCH in a UDF.

create procedure dbo.usp_IsValidXML(@XMLCandidate varbinary(max), @Return bit output)
as
begin
    declare @x xml
    begin try
        set @x = cast(@XMLCandidate as xml)
        set @Return = 1
    end try
    begin catch
        set @Return = 0
    end catch
end
go

declare @test1 varbinary(max)
declare @test2 varbinary(max)
set @test1 = cast('<data>asdf</data>' as varbinary(max))
set @test2 = cast('<data>asdf</da' as varbinary(max))

declare @IsValid bit
exec dbo.usp_IsValidXML @test1, @IsValid output
select @IsValid
exec dbo.usp_IsValidXML @test2, @IsValid output
select @IsValid

drop procedure dbo.usp_IsValidXML
Joe Stefanelli
Yes, and create a persistent calculated column based on it, so it's not called every time you go `where valid_xml = 1`.
GSerg
Will it be deterministic?
Denis Valeev