views:

101

answers:

1

I need to convert data in plain text columns into xml columns. However, not all of the content is valid XML, so I need to ignore any rows with invalid XML. Is there a way I can filter out the invalid data either using some sort of inline conditional convert or something in the WHERE clause?

+1  A: 

It sounds like you need a version of the ISNUMERIC function for XML data.

Regrettably no such built-in function exists - so you'll have to devise your own alternative, for which there are a couple of options:

If this is a one-off or small scale process for which performance isn't critical, you could process the input tables one row at a time inside a cursor, using a TRY...CATCH block to handle invalid casts (untested):

DECLARE xmlCur CURSOR FOR
SELECT textcol 
FROM inputTable

OPEN xmlCur
DECLARE @string nvarchar(MAX)
DECLARE @xml xml

FETCH NEXT FROM xmlCur into @string

WHILE @@fetch_status = 0
BEGIN
    BEGIN TRY
        SET @xml = CAST(@string AS XML)
        -- Do something with XML
    END TRY
    BEGIN CATCH
        -- log failure/mark source row as invalid
    END CATCH

    FETCH NEXT FROM xmlCur into @string
END

CLOSE xmlCur
DEALLOCATE xmlCur

Alternatively, if you're comfortable with .Net programming (and it's enabled on you server) you could use the CLR to create your own IsXML function. The .Net code would need to be not much more complex than the third post on this thread.

The performance of the CLR solution may not be that much better than the cursor - you'd need to test to establish this.

(An obvious thing to try, which doesn't work, is a scalar-valued T-SQL function which attempts to cast the field to XML inside a TRY...CATCH block. However, TRY...CATCH is not permitted inside a function.)

Ed Harper
The cursor solution is what I had already done, so kudos :D
Daniel Schaffer