I have data in SQL Server 2005 that contains HTML tags, and I'd like to strip them all out, leaving just the text between the tags. Ideally it would also replace entities such as &lt; with <, and so on.

Is there an easy way to do this, or does someone already have some sample SQL code?

I can't add extended stored procedures and the like, so I would prefer a pure SQL approach (preferably one that is backwards compatible with SQL Server 2000). I want to retrieve the data with the HTML stripped out, not update it, so ideally it would be written as a function for easy reuse.

So for example converting this:

<B>Some useful text</B>&nbsp;
<A onclick="return openInfo(this)" href="http://there.com/3ce984e88d0531bac5349" target=globalhelp>
   <IMG title="Source Description" height=15 alt="Source Description" src="/ri/new_info.gif" width=15 align=top border=0>
</A>&gt;&nbsp;<b>more text</b></TD></TR>

to this:

Some useful text > more text
+7  A: 

There is a UDF that does this, described here:

User Defined Function to Strip HTML

Edit: note that this is written for SQL Server 2005, but if you change the keyword MAX to something like 4000, it will work in SQL Server 2000 as well.
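
For readers who can't reach the link, here is a minimal sketch of how such a function might look. It is not necessarily the code from the linked article: the name dbo.StripHtmlSketch and the short entity list are assumptions made for illustration. It repeatedly removes everything from a '<' to the next '>', then decodes a handful of common entities. It does not collapse whitespace left between tags, and it will misbehave on text that contains a literal '<' followed later by a '>'.

```sql
-- Hypothetical sketch; the function name and the entity list are illustrative assumptions.
-- For SQL Server 2000, replace VARCHAR(MAX) with a fixed length such as VARCHAR(4000).
CREATE FUNCTION dbo.StripHtmlSketch (@HtmlText VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
    DECLARE @Start INT, @End INT, @Length INT

    -- Locate the first tag: a '<' and the first '>' that follows it.
    SET @Start  = CHARINDEX('<', @HtmlText)
    SET @End    = CHARINDEX('>', @HtmlText, @Start)
    SET @Length = (@End - @Start) + 1

    -- Delete tags one at a time until no complete '<...>' pair remains.
    WHILE @Start > 0 AND @End > 0 AND @Length > 0
    BEGIN
        SET @HtmlText = STUFF(@HtmlText, @Start, @Length, '')
        SET @Start  = CHARINDEX('<', @HtmlText)
        SET @End    = CHARINDEX('>', @HtmlText, @Start)
        SET @Length = (@End - @Start) + 1
    END

    -- Decode a few common entities. '&amp;' goes last so that text such as
    -- '&amp;lt;' ends up as '&lt;' rather than being decoded twice.
    SET @HtmlText = REPLACE(@HtmlText, '&nbsp;', ' ')
    SET @HtmlText = REPLACE(@HtmlText, '&lt;',   '<')
    SET @HtmlText = REPLACE(@HtmlText, '&gt;',   '>')
    SET @HtmlText = REPLACE(@HtmlText, '&quot;', '"')
    SET @HtmlText = REPLACE(@HtmlText, '&amp;',  '&')

    RETURN LTRIM(RTRIM(@HtmlText))
END
GO

-- Usage: strips the tags and decodes the entities in a single-line fragment.
SELECT dbo.StripHtmlSketch('<b>Some useful text</b>&nbsp;&gt;&nbsp;<b>more text</b>') AS PlainText
-- returns: Some useful text > more text
```

Because it is a scalar function, it can be used directly in a SELECT over a table column, which matches the "retrieve, don't update" requirement in the question.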

RedFilter
Great, thanks. The comments there link to an improved version that handles more HTML entities: http://lazycoders.blogspot.com/2007/06/stripping-html-from-text-in-sql-server.html
Rory
Note that, as a string-intensive UDF on SQL Server 2005 or later, **this is a perfect candidate for implementing as a CLR UDF**, which can give a massive performance boost. More info on doing so here: http://stackoverflow.com/questions/34509/natural-human-alpha-numeric-sort-in-microsoft-sql-2005/2060952#2060952
RedFilter