Whenever I use a literal U+2028 character (\u2028) in my JavaScript source with the content type set to "text/html; charset=utf-8", I get a JavaScript parse error.

Example:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">

<html lang="en">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <title>json</title>

    <script type="text/javascript" charset="utf-8">
    // A literal U+2028 (LINE SEPARATOR) character sits between the quotes below.
    var string = '
    ';
    </script>
</head>
<body>

</body>
</html>

If the <meta http-equiv> tag is left out, everything works as expected. I've tested this in Safari and Firefox; both exhibit the same problem.

Any ideas on why this is happening and how to properly fix this (without removing the encoding)?

Edit: After some more research, the specific problem turned out to be that the character was returned in a JSONP response. The browser evaluates that response as JavaScript, reads U+2028 as a newline, and throws an error about an invalid newline in a string.

A: 

Well, that makes sense, since you are telling the browser that the HTML and script are both using UTF-8, but then you specify a character that is not UTF-8 encoded. When you specify "charset=UTF-8", you are responsible for making sure the bytes transmitted to the browser are actually UTF-8. The web server and browser will not do it for you in this situation.

Remy Lebeau - TeamB
So, how to solve it? The character was entered by a user of the website. His data is stored in JSON. Every time I request the JSON I get a parse error, because the character is in there. I cannot just delete the character, because it's likely that the client will enter it again.
klaaspieter
According to the comments for [this](http://stackoverflow.com/questions/1811505/with-jquery-access-json-from-cross-domain-url-where-json-may-be-poorly-formed#1811737) answer, this is a valid UTF-8 character which should be correctly parsed.
klaaspieter
+1  A: 

Could you just use the escape sequence \u2028 instead of the real character? U+2028 is the Unicode line separator, so browsers treat it as a real line break character, like \n.

We cannot write

x = "

"

right? But we can write x = "\n", so it might be the same concept.
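
A minimal sketch of that idea, with an illustrative variable name: the escape sequence keeps the source line free of any raw line separator, while the resulting string still contains the U+2028 character.

```javascript
// The escape sequence \u2028 is plain ASCII in the source file, so the
// parser never sees a raw line separator inside the string literal.
var escaped = "\u2028";

// The resulting string still holds the single character U+2028.
console.log(escaped.length);                      // 1
console.log(escaped.charCodeAt(0).toString(16));  // 2028
```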

S.Mark
Douglas Crockford's JSON implementation does escape the string, but still throws the parse error. In Safari the native JSON implementation is used, which also throws the parse error. We're loading JSONP, so the browser will attempt to parse it before any other JavaScript has a chance to strip out any invalid characters. I will probably have to solve this server side.
klaaspieter
Yeah @klaaspieter, probably on the server side. And if you have to do that, escape `\u2029` too.
S.Mark
By the way, I've tested it some more, Douglas Crockford's implementation is *not* throwing the parse error.
klaaspieter
+1  A: 

Alright, to answer my own question.

Normally a JSON parser strips out these problem characters. Because I was retrieving JSONP, I wasn't using a JSON parser; instead, the browser tried to parse the JSON itself as soon as the callback was called.

The only way to fix it was to make sure the server never returns these characters when a JSONP resource is requested.
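
One way this could look, as a sketch only: the thread does not say what the server runs, so this assumes a JavaScript server, and `toJsonp` and `handleData` are hypothetical names. Rather than dropping the characters, it replaces U+2028 and U+2029 with their escape sequences after serializing, so the user's data survives the round trip.

```javascript
// Hypothetical helper: serialize `data` as a JSONP response body.
// JSON.stringify leaves U+2028 and U+2029 as raw characters, so they are
// replaced with their six-character escape sequences afterwards.
function toJsonp(callbackName, data) {
  var json = JSON.stringify(data)
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
  return callbackName + "(" + json + ");";
}

// Example payload containing a raw line separator (illustrative data).
var body = toJsonp("handleData", { text: "line one\u2028line two" });
console.log(body);
```

When the browser evaluates the response, the escape sequence decodes back to the original character, so the client receives the same string the user entered.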

p.s. My question was about \u2028; according to Douglas Crockford, all of the following characters can cause these problems:

\u0000\u00ad\u0600-\u0604\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufeff\ufff0-\uffff
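
A hedged sketch of escaping every character in that list, assuming it is meant as the character class used in Crockford's json2.js (with ranges such as \u2028-\u202f). The names `dangerous` and `escapeDangerous` are illustrative, not part of any library.

```javascript
// Character class assumed from the list above, written with explicit ranges.
var dangerous = /[\u0000\u00ad\u0600-\u0604\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufeff\ufff0-\uffff]/g;

// Replace each matching character with its \uXXXX escape sequence.
function escapeDangerous(text) {
  return text.replace(dangerous, function (ch) {
    // Left-pad the hex code point to four digits.
    return "\\u" + ("0000" + ch.charCodeAt(0).toString(16)).slice(-4);
  });
}

console.log(escapeDangerous("a\u2028b\ufeffc")); // a\u2028b\ufeffc
```

Applied to already-serialized JSON text, this leaves ordinary characters untouched and only rewrites the ones in the class.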

klaaspieter