
I've noticed that, depending on the context in which I want to use a parameter, there are at least four kinds of encoding needed to avoid corrupted code being executed:

  1. Javascript encoding when constructing a javascript code, e.g.

    var a = "what's up ?";
    var b = "alert('" + a + "');"
    eval(b); // or anything else that executes b as code
    
  2. URL encoding when using a string as a parameter in a URL, e.g.

    var a = "Bonnie & Clyde";
    var b = "mypage.html?par=" + a;
    window.location.href = b; // or anything else that tries to use b as URL
    
  3. HTML encoding when using a string as the HTML source of an element, e.g.

    var a = "<script>alert('hi');</script>";
    b.innerHTML = a; // or anything else that interprets a directly
    
  4. HTML attribute encoding when using a string as a value of an attribute, e.g.

    var a = 'alert("hello")';
    var b = '<img onclick="' + a + '" />'; // or anything else that uses a as (part of) a tag's attribute
    

In the ASP.NET code-behind I know how to encode the string in all four cases (using e.g. DataContractJsonSerializer, HttpUtility.UrlEncode, HttpUtility.HtmlEncode and HttpUtility.HtmlAttributeEncode), but I'd like to know whether there are utilities I could use directly from JavaScript to encode/decode strings in these four cases.
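For case 1, one plausible client-side counterpart of the DataContractJsonSerializer approach is the built-in JSON.stringify (standard since ES5), which turns a string into a double-quoted literal with quotes, backslashes and control characters escaped. The sketch below assumes the goal is only to embed a string *value* in generated code; the helper `compiles` is hypothetical and only parses the code with `new Function` without running it. Note that the result is still not safe to place inside an HTML `<script>` block, because a `</script>` inside the string is not escaped.

```javascript
const a = "what's up ?";

// Naive concatenation: the apostrophe closes the '...' literal early.
const naive = "alert('" + a + "');";

// JSON.stringify produces a properly quoted and escaped JS string literal.
const literal = JSON.stringify(a);      // "what's up ?" (with double quotes)
const b = "alert(" + literal + ");";    // alert("what's up ?");

// Hypothetical helper: returns true if the code parses.
// new Function() compiles the body but does not execute it here.
function compiles(code) {
  try {
    new Function(code);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(compiles(naive)); // false (SyntaxError)
console.log(compiles(b));     // true
```

Evaluating `literal` on its own gives back the original string, which is why this works as an encoding rather than a sanitizer.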

+1  A: 

You can use the JavaScript function escape() for some of these purposes.

Edit: actually, forget that! Sorry, it's a deprecated function; encodeURI(), decodeURI(), etc. are the way forward. Details here:

escape and unescape functions do not work properly for non-ASCII characters and have been deprecated. In JavaScript 1.5 and later, use encodeURI, decodeURI, encodeURIComponent, and decodeURIComponent.

The escape and unescape functions let you encode and decode strings. The escape function returns the hexadecimal encoding of an argument in the ISO Latin character set. The unescape function returns the ASCII string for the specified hexadecimal encoding value.
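To make the non-ASCII problem concrete, here is a small comparison. It assumes an engine where the legacy escape() global is still present (it is in browsers and Node.js); the variable names are only for illustration.

```javascript
// escape() is deprecated but still available as a legacy global.
const word = "café";

const legacy = escape(word);             // "caf%E9": Latin-1 code, not UTF-8
const modern = encodeURIComponent(word); // "caf%C3%A9": UTF-8 percent-encoding

// Outside Latin-1, escape() switches to a non-standard %uXXXX form,
// which decodeURIComponent() rejects with a URIError.
const euro = escape("€");                // "%u20AC"

let decodeFailed = false;
try {
  decodeURIComponent(euro);
} catch (e) {
  decodeFailed = e instanceof URIError;
}

console.log(legacy, modern, euro, decodeFailed);
console.log(decodeURIComponent(modern) === word); // true
```

A server that expects standard UTF-8 percent-encoding will generally misread the output of escape(), which is why the encodeURI family is preferred.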

danp
Actually, I believe encodeURIComponent() covers exactly case 2... only 3 cases left :)
Thomas Wanner
+2  A: 

Case 2 can be dealt with using encodeURIComponent(), as danp suggested.
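Applied to the question's case 2, the difference between encodeURI() and encodeURIComponent() shows up in the "&": encodeURI() is meant for whole URLs and leaves delimiters alone, while encodeURIComponent() is meant for a single component such as one parameter value. A minimal sketch:

```javascript
const a = "Bonnie & Clyde";

// encodeURI() keeps URL delimiters like "&", "=", "?" as-is,
// so the "&" would still split the query string into two parameters.
const wrong = "mypage.html?par=" + encodeURI(a);
// -> "mypage.html?par=Bonnie%20&%20Clyde"

// encodeURIComponent() escapes the delimiters too, which is what
// a single query-parameter value needs.
const right = "mypage.html?par=" + encodeURIComponent(a);
// -> "mypage.html?par=Bonnie%20%26%20Clyde"

console.log(wrong);
console.log(right);
// In a browser: window.location.href = right;
```

On the receiving side, decodeURIComponent() restores the original value.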

Case 3 won't execute the script in most browsers. If you want the document to display the literal text <script>...</script>, set the text content of the element instead:

    var a = "<script>alert('hi');</script>";
    if ("textContent" in b)
        b.textContent = a; // W3C DOM
    else
        b.innerText = a; // Internet Explorer <=8
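When the encoded string itself is needed (for example, to concatenate it into a larger HTML fragment) rather than assigned to an element, a small escaping function is a common alternative. This is a sketch with a hypothetical name, `htmlEncode`, not a built-in; it replaces "&" first so the entities produced by later replacements are not escaped twice.

```javascript
// Hypothetical helper: escapes the characters that are significant
// in HTML text content and in quoted attribute values.
function htmlEncode(str) {
  return String(str)
    .replace(/&/g, "&amp;")   // must come first
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const a = "<script>alert('hi');</script>";
const encoded = htmlEncode(a);
console.log(encoded);
// &lt;script&gt;alert(&#39;hi&#39;);&lt;/script&gt;
```

Because both quote characters are escaped, the result can also go inside a quoted data attribute such as title="...". It does not make an event-handler attribute like onclick safe, since that attribute's value is itself parsed as code.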

Cases 1 and 4 aren't really encoding issues; they're sanitization issues. Encoding the strings passed to these functions would probably cause a syntax error or just result in a string value that isn't assigned to anything. Sanitizing usually involves looking for certain patterns and either allowing the action or disallowing it; it's safer to have a whitelist than a blacklist (that sounds terrible!).
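A whitelist check can be as simple as accepting input only if every character belongs to an allowed set and rejecting it otherwise, instead of trying to strip out "bad" patterns. The sketch below is hypothetical: the allowed character set, `isAllowed` and `greetingCode` are made up for illustration and would need to fit the real use case.

```javascript
// Hypothetical whitelist: letters, digits, space and a few punctuation marks.
const ALLOWED = /^[A-Za-z0-9 .,!?-]*$/;

function isAllowed(value) {
  return ALLOWED.test(value);
}

// Builds code only from whitelisted input; anything else is rejected
// outright rather than "cleaned".
function greetingCode(name) {
  if (!isAllowed(name)) {
    throw new Error("Rejected input: " + name);
  }
  return "alert('Hello, " + name + "');";
}

console.log(greetingCode("Bonnie")); // alert('Hello, Bonnie');

try {
  greetingCode("x'); stealCookies(); ('");
} catch (e) {
  console.log(e.message);
}
```

The trade-off is that legitimate input can be rejected too: the question's own example "what's up ?" fails this whitelist because the apostrophe is not allowed.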

Internet Explorer 8 has an interesting function called window.toStaticHTML() that removes any script content from an HTML string. It's very useful for sanitizing HTML before inserting it into the DOM. Unfortunately, it's proprietary, so you won't find this function in other browsers.

Andy E
Re case 3: I believe the OP's goal is not to avoid JavaScript execution but simply to have the HTML encoded (see his comment on the question).
Crescent Fresh
@Crescent: Ahh, I see - thanks for the clarification.
Andy E
A: 
Spudley