So I ran into a problem. I wrote the following code snippet:

teksti = teksti.Trim()
teksti = Replace(teksti, "<", "& lt;")
teksti = Replace(teksti, ">", "& gt;")
teksti = Replace(teksti, """", "& quot;")
teksti = Replace(teksti, "'", "& #8217;")
teksti = Replace(teksti, "%", "& #37;")
teksti = Replace(teksti, "&", "& amp;")
teksti = Replace(teksti, "#", "& #35;")
teksti = Replace(teksti, "@", "& #64;")

After writing this I realized it creates a problem of its own. The function is supposed to make input safe against HTML and SQL injection (there are other methods too, such as parameterized queries, but that's beside the point). However, what happens is that it first replaces < with & lt; and then goes on to replace parts of the newly written string again, because every replacement string contains the &, # and ; characters.

Any hints? I thought about using a regex for this, but I couldn't find any decent Visual Basic examples that were simple enough.

Edit: Thanks for the tips. I was sure there would be a "smart", easy way to do this, but I guess there are no common methods available after all. Rearranging the replacements so the problem cases come first is the obvious solution here, thanks for that. I guess the work day was too long for me to notice. :D

As for parameterized queries, checking back I see my English didn't come out as intended. I meant to say that I'm already using them, and that this problem is specifically about preventing all manner of HTML injection, plus possible SQL injection if the same strings are used elsewhere later. Thanks again for the help.

+6  A: 

If this is .NET, you might look at System.Web.HttpServerUtility.HtmlEncode instead.

If you're using VBScript/VB6, just move the ampersand and pound sign up to the top of this list, and don't rely on this function to protect you against SQL injection. You still need parameterized queries.

Joel Coehoorn
+3  A: 

If you're using VB.NET, you're looking for System.Web.HttpUtility.HtmlEncode(string).
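A minimal sketch of calling that method from VB.NET, in case it helps. The module name and the sample string are made up for illustration, and on the classic .NET Framework the project is assumed to reference the System.Web assembly.

```vb
Imports System
Imports System.Web

Module HtmlEncodeDemo
    Sub Main()
        ' Sample input invented for this sketch.
        Dim teksti As String = "  <b>Tom & ""Jerry""</b>  "

        ' Trim first, as in the question, then let the framework do the escaping.
        Dim encoded As String = HttpUtility.HtmlEncode(teksti.Trim())

        Console.WriteLine(encoded)
        ' Prints: &lt;b&gt;Tom &amp; &quot;Jerry&quot;&lt;/b&gt;
    End Sub
End Module
```

Because the encoding happens in a single call, there is no ordering problem between the individual replacements.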

Otherwise, I would loop through the string one character at a time and build up a new encoded string, replacing as you go. That way, you only need one pass through the string and a case statement for each character, and you're not going to re-encode an encoded character.
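The single-pass idea could look roughly like the sketch below. EncodeForHtml is a hypothetical helper name, and the set of characters and entities is only an assumption for illustration, not the question's exact table.

```vb
Imports System
Imports System.Text

Module SinglePassEncoder
    ' Hypothetical helper: walks the input once and appends either the
    ' character itself or its entity. Output is never scanned again,
    ' so an already encoded character cannot be re-encoded.
    Function EncodeForHtml(ByVal teksti As String) As String
        Dim sb As New StringBuilder()
        For Each c As Char In teksti.Trim()
            Select Case c
                Case "&"c
                    sb.Append("&amp;")
                Case "<"c
                    sb.Append("&lt;")
                Case ">"c
                    sb.Append("&gt;")
                Case """"c
                    sb.Append("&quot;")
                Case "#"c
                    sb.Append("&#35;")
                Case Else
                    sb.Append(c)
            End Select
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Console.WriteLine(EncodeForHtml("a < b & #1"))
        ' Prints: a &lt; b &amp; &#35;1
    End Sub
End Module
```

Note that "#" is in the table even though the entities themselves contain "#"; that is safe here only because each input character is looked at exactly once.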

lc
It's available in any .NET language in System.Web
Mitch Wheat
Of course. The OP specifically said VB, so I was just qualifying which kind of VB.
lc
+2  A: 

You could reorder to put the problem cases first. Or you could iterate through the string and build a new string by analysing each character in turn and appending either it or its desired replacement. Otherwise, you could use an off-the-shelf library or function for this, although I'm not versed in this language so I couldn't name one.

JeeBee
+1  A: 

Reorder as other people suggested. If you find two cases which conflict with each other and cannot be resolved through reordering, add an additional replacement like this:

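teksti = teksti.Trim()
' Park & and # behind placeholders so the entities written below are not re-encoded.
' Assumes the placeholder texts never occur in the input.
teksti = Replace(teksti, "&", "THISISANAMP")
teksti = Replace(teksti, "#", "THISISAHASH")
teksti = Replace(teksti, ";", "& #59;")
teksti = Replace(teksti, "THISISAHASH", "& #35;")
teksti = Replace(teksti, "THISISANAMP", "&amp;") ' newly added
' The remaining characters no longer clash with anything written above.
teksti = Replace(teksti, "<", "& lt;")
teksti = Replace(teksti, ">", "& gt;")
teksti = Replace(teksti, """", "& quot;")
teksti = Replace(teksti, "'", "& #8217;")
teksti = Replace(teksti, "%", "& #37;")
teksti = Replace(teksti, "@", "& #64;")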

This is the simplest way to alter your code.

Jamie
+2  A: 

Replace the & character first, then the # character. After that the others can safely be replaced.
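As a rough sketch of that ordering: EncodeReordered is a hypothetical function name, and the entities are written without the space that appears in the question's listing, which is an assumption about what the real code contains.

```vb
Imports System
Imports Microsoft.VisualBasic

Module ReorderedReplace
    ' Hypothetical helper: same replacements as the question, reordered.
    Function EncodeReordered(ByVal teksti As String) As String
        teksti = teksti.Trim()
        ' & must go first: every entity written afterwards contains one.
        teksti = Replace(teksti, "&", "&amp;")
        ' # goes second: the numeric entities written afterwards contain one.
        teksti = Replace(teksti, "#", "&#35;")
        ' The rest only introduce &, # and ; which are never searched for again.
        teksti = Replace(teksti, "<", "&lt;")
        teksti = Replace(teksti, ">", "&gt;")
        teksti = Replace(teksti, """", "&quot;")
        teksti = Replace(teksti, "'", "&#8217;")
        teksti = Replace(teksti, "%", "&#37;")
        teksti = Replace(teksti, "@", "&#64;")
        Return teksti
    End Function

    Sub Main()
        Console.WriteLine(EncodeReordered("R&D #1 <ok>"))
        ' Prints: R&amp;D &#35;1 &lt;ok&gt;
    End Sub
End Module
```

This works only because ";" itself is not in the list of characters to replace; if it were, reordering alone would not be enough.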

However, this is not a good method to protect against SQL injection. That should preferably be done using parameterised queries. There are characters in your code that don't need encoding for HTML; if you are encoding them to protect against SQL injection, you are on a dangerous path. It will make SQL injection harder to accomplish, but it's not a safe method.

Also, if you are encoding the text before you put it in the database, you may get problems with it later. It's better to store the text unchanged in the database and take care of the HTML encoding when you display the text.

Guffa
+1  A: 

As mentioned in previous posts, reordering your "replaces" should provide a quick fix for your specific issue, and it is highly advisable to look into parameterized queries as well.

Another suggestion is to look into the .NET libraries for encoding, specifically the Microsoft.Security.Application.AntiXss library, which I find better than System.Web.HttpUtility.HtmlEncode because it uses a "whitelist" approach rather than a "blacklist" approach.

You can find more info about it here:

http://blogs.msdn.com/cisg/archive/2008/08/26/what-is-microsoft-antixss.aspx

Hope this helps.

D.

Diego C.