views:

3907

answers:

5

Is there any function to encode HTML strings in T-SQL? I have a legacy database which contains dodgy characters such as '<', '>' etc. I can write a function to replace the characters, but is there a better way?

I have an ASP.Net application and when it returns a string it contains characters which cause an error. The ASP.Net application is reading the data from a database table. It does not write to the table itself.

+9  A: 

You shouldn't fix the string in SQL. A better way is to use the ASP.NET method HtmlEncode, which escapes the special characters that cause the issues you're seeing. See the example below. I hope this helps.

string htmlEncodedStr = System.Web.HttpUtility.HtmlEncode(yourRawStringVariableHere);
string decodedRawStr =  System.Web.HttpUtility.HtmlDecode(htmlEncodedStr);

Edit: Since you're data binding this from a DataTable, use an inline expression to call HtmlEncode in the markup of the GridView or whatever control you're using; this will still satisfy your data binding requirement. See the example below. Alternatively, you can loop over every row in the DataTable and update each cell with the HTML-encoded string prior to data binding.

<%# System.Web.HttpUtility.HtmlEncode(System.Convert.ToString(Eval("YourColumnNameHere"))) %>
James
Thanks, but unfortunately the data is returned in a DataTable which is then assigned to a data source. I suppose I could edit the returned DataTable row by row, but is there a better way?
Leo Moore
See my last edit.
James
You can also use a BoundField. http://msdn.microsoft.com/en-us/library/system.web.ui.webcontrols.boundfield.aspx
bobince
Yes that is true.
James
+1  A: 

If you're displaying a string on the web, you can encode it with Server.HtmlEncode().

If you're storing a string in the database, make sure the database field is "nchar" instead of "char". That will allow it to store Unicode strings.

If you can't control the database, you can "flatten" the string to ASCII by round-tripping it through Encoding.ASCII (GetBytes, then GetString), which replaces every non-ASCII character with '?'.

Andomar
+4  A: 

I don't think data in a database should know or care about the user interface. Display issues should be handled by the presentation layer. I wouldn't want to see any HTML mingled into the database.

duffymo
I agree completely, but it's not my choice. It's a legacy app with HTML-type characters in the Guid (or what passes as the Guid).
Leo Moore
Presentation in the primary key? OMG. I'd refactor that as quickly as possible.
duffymo
A: 

OK, here is what I did. I created a simple function to handle it. It's far from complete but at least handles the standard <>& characters. I'll just add to it as I go along.

CREATE FUNCTION HtmlEncode
(
    @UnEncoded as varchar(500)
)
RETURNS varchar(500)
AS
BEGIN
    -- Declare the return variable here
    DECLARE @Encoded as varchar(500)

    -- Add the T-SQL statements to compute the return value here
    SELECT @Encoded = Replace(@UnEncoded,'<','&lt;')
    SELECT @Encoded = Replace(@Encoded,'>','&gt;')
    SELECT @Encoded = Replace(@Encoded,'&','&amp;')

    -- Return the result of the function
    RETURN @Encoded

END
GO

I can then use:

Select Ref,dbo.HtmlEncode(RecID) from Customers

This gives me an HTML-safe Record ID. There is probably a built-in function but I can't find it.
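
On the question of something built in: one commonly used trick on SQL Server 2005 and later is to let FOR XML PATH do the escaping. The snippet below is only a sketch under that assumption, not a dedicated HTML encoder. It relies on XML text-node escaping, which covers <, > and & but leaves quotes untouched, and the variable name @Raw is made up for illustration.

```sql
-- Sketch only: relies on XML text-node escaping, not a dedicated HTML encoder.
-- @Raw is a hypothetical variable used for illustration.
DECLARE @Raw nvarchar(500);
SET @Raw = N'Tom & Jerry <b>';

-- Without the TYPE directive, FOR XML returns the escaped text as a string.
SELECT (SELECT @Raw AS [text()] FOR XML PATH('')) AS Encoded;
-- Encoded: Tom &amp; Jerry &lt;b&gt;
```

Because this is XML escaping rather than HTML encoding, it is worth testing against the actual data before relying on it.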

Leo Moore
+6  A: 

We have a legacy system that uses a trigger and dbmail to send HTML-encoded email when a row is inserted into a table, so we need the encoding to happen within the email generation. I noticed that Leo's version has a slight bug: it encodes the & in &lt; and &gt;. I use this version:

CREATE FUNCTION HtmlEncode
(
    @UnEncoded as varchar(500)
)
RETURNS varchar(500)
AS
BEGIN
  DECLARE @Encoded as varchar(500)

  --Order is important here. Replace the & first, then the < and >.
  --Otherwise the & in &lt; will become &amp;lt;
  SELECT @Encoded = 
  Replace(
    Replace(
      Replace(@UnEncoded,'&','&amp;'),
    '<', '&lt;'),
  '>', '&gt;')

  RETURN @Encoded
END
GO
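
The ordering point can be checked directly with nested REPLACE calls on a literal. This is a minimal sketch for illustration; the sample string is made up and neither statement depends on the functions defined in this thread.

```sql
-- Wrong order: replacing '&' last re-encodes the '&' of the entities just produced.
SELECT REPLACE(REPLACE(REPLACE('a < b & c', '<', '&lt;'), '>', '&gt;'), '&', '&amp;') AS WrongOrder;
-- WrongOrder: a &amp;lt; b &amp; c

-- Right order: replace '&' first, then '<' and '>'.
SELECT REPLACE(REPLACE(REPLACE('a < b & c', '&', '&amp;'), '<', '&lt;'), '>', '&gt;') AS RightOrder;
-- RightOrder: a &lt; b &amp; c
```

Note that encoding lengthens the string, so with a varchar(500) return type an input close to 500 characters may be truncated; widening the return type is one way to allow for that.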
Beniaminus
Thanks, you are correct. I did change it in production but forgot to update the previous post.
Leo Moore