I noticed that jQuery's parseJSON basically does a simple regex "check":

parseJSON: function( data ) {
    if ( typeof data !== "string" || !data ) {
        return null;
    }

    // Make sure leading/trailing whitespace is removed (IE can't handle it)
    data = jQuery.trim( data );

    // Make sure the incoming data is actual JSON
    // Logic borrowed from http://json.org/json2.js
    if ( /^[\],:{}\s]*$/.test(data.replace(/\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/g, "@")
        .replace(/"[^"\\\n\r]*"|true|false|null|-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/g, "]")
        .replace(/(?:^|:|,)(?:\s*\[)+/g, "")) ) {

        // Try to use the native JSON parser first
        return window.JSON && window.JSON.parse ?
            window.JSON.parse( data ) :
            (new Function("return " + data))();

    } else {
        jQuery.error( "Invalid JSON: " + data );
    }
},

If the string passes that "check" and the browser has a native JSON parser, the native parser is used. Otherwise, for a browser like IE6 without native JSON support, I assume a new Function is constructed from the string and immediately invoked, returning the object.

Question #1: Since this is just a simple regex test, isn't it prone to some sort of obscure edge-case exploit? Shouldn't we really be using a full-blown parser, at least for the browsers that don't support native JSON parsing?

Question #2: How much "safer" is (new Function("return " + data))() as opposed to eval("(" + data + ")")?
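
To make the comparison concrete, here is a minimal sketch of both forms on a made-up input:

// Hypothetical well-formed JSON text:
var data = '{"name": "value", "list": [1, 2, 3]}';

// jQuery's non-native fallback: build a function whose body returns
// the evaluated text, then invoke it immediately:
var a = (new Function("return " + data))();

// The classic eval approach: the wrapping parentheses force '{' to be
// parsed as an object literal rather than a block statement:
var b = eval("(" + data + ")");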

A: 

The JSON.parse method is the safest. It's defined when you include json2.js from http://www.json.org/js.html, and it's used automatically by parseJSON/getJSON. It parses the JSON markup instead of executing it.
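
As a quick illustration of the difference (the payload below is made up):

var payload = '{"x": alert("executed!")}';  // not valid JSON

// JSON.parse treats the text purely as data and throws a SyntaxError;
// the function call inside is never executed:
try {
    JSON.parse(payload);
} catch (e) {
    // SyntaxError: rejected without running anything
}

// A raw eval would pop the alert as a side effect of "parsing":
// eval("(" + payload + ")");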

spoulson
json2.js will use the same regex, and then `eval`, if native JSON isn't available.
Matthew Flaschen
Noted. Thanks for clarifying.
spoulson
+2  A: 

As mentioned in the comments there, jQuery's JSON parser "borrows" the logic that checks whether a JSON string is valid straight from json2.js. This makes it "as safe" as the most common non-native implementation, which is rather strict anyway:

// In the second stage, we run the text against regular expressions that look
// for non-JSON patterns. We are especially concerned with '()' and 'new'
// because they can cause invocation, and '=' because it can cause mutation.
// But just to be safe, we want to reject all unexpected forms.

// We split the second stage into 4 regexp operations in order to work around
// crippling inefficiencies in IE's and Safari's regexp engines. First we
// replace the JSON backslash pairs with '@' (a non-JSON character). Second, we
// replace all simple value tokens with ']' characters. Third, we delete all
// open brackets that follow a colon or comma or that begin the text. Finally,
// we look to see that the remaining characters are only whitespace or ']' or
// ',' or ':' or '{' or '}'. If that is so, then the text is safe for eval.

if (/^[\],:{}\s]*$/.
        test(text.replace(/\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/g, '@').
            replace(/"[^"\\\n\r]*"|true|false|null|-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/g, ']').
            replace(/(?:^|:|,)(?:\s*\[)+/g, ''))) {
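
To see what those stages actually do, here is a small helper of my own (not part of json2.js) that applies the three replaces to a sample string:

// Illustration only: run json2.js's three replace stages separately
// so the intermediate results are visible.
function showStages(text) {
    var s1 = text.replace(/\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/g, '@');
    var s2 = s1.replace(/"[^"\\\n\r]*"|true|false|null|-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/g, ']');
    var s3 = s2.replace(/(?:^|:|,)(?:\s*\[)+/g, '');
    return { tokens: s2, brackets: s3, safe: /^[\],:{}\s]*$/.test(s3) };
}

showStages('{"a": "b\\n", "n": [1, 2.5e3, true]}');
// tokens:   '{]: ], ]: [], ], ]]}'
// brackets: '{]: ], ]], ], ]]}'      -> safe: true

showStages('{"a": alert(1)}');
// brackets: '{]: alert(])}'          -> '(' survives, safe: false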

What I don't understand is why jQuery runs the regular-expression replaces before checking for a native implementation, which would validate the JSON grammar anyway. It seems like it would speed things up to run them only when a native implementation isn't available, something like the sketch below.
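
A rough sketch of that reordering (my own illustration, not jQuery's actual code):

parseJSON: function( data ) {
    if ( typeof data !== "string" || !data ) {
        return null;
    }

    data = jQuery.trim( data );

    // A native parser validates the grammar itself, so the regex
    // pre-check is redundant work on this path.
    if ( window.JSON && window.JSON.parse ) {
        return window.JSON.parse( data );
    }

    // Only browsers without native JSON pay for the sanity check.
    if ( /^[\],:{}\s]*$/.test(data.replace(/\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/g, "@")
        .replace(/"[^"\\\n\r]*"|true|false|null|-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/g, "]")
        .replace(/(?:^|:|,)(?:\s*\[)+/g, "")) ) {

        return (new Function("return " + data))();
    }

    jQuery.error( "Invalid JSON: " + data );
},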

Question 2 is answered very well by bobince in another question:

It's not really a big difference, but the feeling is that eval is ‘worse’ than new Function. Not in terms of security — they're both equally useless in the face of untrusted input, but then hopefully your webapp is not returning untrusted JSON strings — but in terms of language-level weirdness, and hence resistance to optimisation.
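
One concrete instance of that language-level weirdness, as a quick illustration: eval runs in the calling scope and can touch local variables, while a Function-constructed body is compiled against the global scope only:

function demo() {
    var secret = 42;

    // eval executes in the caller's scope, so it can read and even
    // reassign locals:
    eval("secret = 0");  // secret is now 0

    // A Function body sees only the global scope:
    return (new Function("return typeof secret"))();  // "undefined"
}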

Check out Nick Craver's answer there too for a direct quote from John Resig.

Andy E