Hello all,

There are two columns: one contains HTML and the other contains plain text. How can I compare them as two plain texts? The HTML-to-plain-text conversion should work the same way a browser does when you copy selected HTML to the clipboard and paste it into Notepad.

Regards,

+1  A: 

SQL doesn't know that one column is HTML and the other is not.

If you just want to compare the exact content, use = or LIKE.

If you want to remove the tags, do precisely that... remove the tags from the HTML column, and then compare the result to the plain-text column.

zebediah49
But how do I remove the tags from within a SQL query?
noober
(I mean, implement the removal in T-SQL.)
noober
A: 

When you pull the values from the database, they have whatever datatype your field contains. You can then manipulate the strings any way you want in your programming language of choice (they should already be text, if that is how they were stored).
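As a rough illustration of doing the comparison on the application side, here is a minimal Python sketch. It assumes the two column values have already been fetched as strings; the names TextExtractor, html_to_text and same_text are made up for this example. It uses the standard library's html.parser, which drops tags and decodes entity references, but it does not reproduce a browser's copy-and-paste behaviour (no line breaks for block elements, and the contents of script/style elements are kept as text).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes and ignores the tags."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; and &nbsp;
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def normalize_spaces(text):
    # str.split() with no arguments splits on any Unicode whitespace,
    # including the non-breaking space produced by &nbsp;
    return " ".join(text.split())


def html_to_text(html_source):
    parser = TextExtractor()
    parser.feed(html_source)
    parser.close()
    return normalize_spaces("".join(parser.parts))


def same_text(html_source, plain_text):
    """Compare an HTML value with a plain-text value, ignoring whitespace runs."""
    return html_to_text(html_source) == normalize_spaces(plain_text)


if __name__ == "__main__":
    print(same_text("<p>Hello <b>world</b> &amp; all</p>", "Hello world & all"))
```

Whitespace is collapsed on both sides because HTML source and pasted text rarely agree on spacing; drop that step if exact spacing matters for your data.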

John
A: 

SQL Server 2008 (and earlier) does not contain any function or code that can "natively" convert HTML into, err, non-HTML. You either need to write such a function yourself or find a third-party utility that can do this. (Is there application code that already does this? Perhaps read the data and run it through that app?)

Philip Kelley
+1  A: 

The answer to this SO question links to a user-defined function for stripping HTML tags from text. Once you have it, you can compare its result with the plain-text field, e.g.

-- Scalar UDFs must be called with a schema-qualified name; dbo is assumed here.
SELECT * FROM YourTable
WHERE plainText = dbo.udt_stripHTML(htmlText)
mdma
Note that the function is totally naïve and won't work for a great many perfectly valid HTML tags. SQL doesn't really give you the parsing features you would need to handle HTML reliably. Also, it's only a tag stripper (or a poor attempt at one); it won't fix up entity references.
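To make both limitations concrete, here is a small Python sketch. strip_tags_naive is a hypothetical stand-in for a "delete everything between < and >" stripper (it is not the linked function), and decode_entities shows the separate step that would still be needed afterwards, using html.unescape from the standard library.

```python
import html
import re


def strip_tags_naive(text):
    # Hypothetical naive stripper: removes anything from '<' to the next '>'
    return re.sub(r"<[^>]*>", "", text)


def decode_entities(text):
    # html.unescape turns &amp; into & and &nbsp; into U+00A0;
    # the non-breaking space is then mapped to an ordinary space
    return html.unescape(text).replace("\u00a0", " ")


if __name__ == "__main__":
    stripped = strip_tags_naive("<p>Fish&nbsp;&amp;&nbsp;chips</p>")
    print(stripped)                   # entity references are still there
    print(decode_entities(stripped))  # now comparable with plain text

    # A '>' inside an attribute value is valid HTML but confuses the stripper
    print(repr(strip_tags_naive('<a title="a > b">x</a>')))
```

The last call leaves part of the attribute behind in the output, which is the kind of failure a real parser avoids.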
bobince
Yeah, &nbsp; will be left unprocessed... But that's a nice starting point, though.
noober