I need to convert data in plain text columns into xml columns. However, not all of the content is valid XML, so I need to ignore any rows with invalid XML. Is there a way I can filter out the invalid data either using some sort of inline conditional convert or something in the WHERE clause?
It sounds like you need a version of the ISNUMERIC function for XML data.
Regrettably no such built-in function exists - so you'll have to devise your own alternative, for which there are a couple of options:
If this is a one-off or small scale process for which performance isn't critical, you could process the input tables one row at a time inside a cursor, using a TRY...CATCH
block to handle invalid casts (untested):
DECLARE xmlCur CURSOR FOR
SELECT textcol
FROM inputTable
OPEN xmlCur
DECLARE @string nvarchar(MAX)
DECLARE @xml xml
FETCH NEXT FROM xmlCur into @string
WHILE @@fetch_status = 0
BEGIN
BEGIN TRY
SET @xml = CAST(@string AS XML)
-- Do something with XML
END TRY
BEGIN CATCH
-- log failure/mark source row as invalid
END CATCH
FETCH NEXT FROM xmlCur into @string
END
CLOSE xmlCur
DEALLOCATE xmlCur
Alternatively, if you're comfortable with .Net programming (and it's enabled on you server) you could use the CLR to create your own IsXML function. The .Net code would need to be not much more complex than the third post on this thread.
The performance of the CLR solution may not be that much better than the cursor - you'd need to test to establish this.
(An obvious thing to try, which doesn't work, is a scalar-valued T-SQL function which attempts to cast the field to XML inside a TRY...CATCH
block. However, TRY...CATCH
is not permitted inside a function.)