I have a JavaScript file that lots of people have embedded in their pages. Since I am hosting the file, I have control over the JavaScript itself; I cannot control how it is embedded, because lots of people are already using it.

This JavaScript file sends GET requests to my servlets, and the parameters passed with the request are recorded in a DB. For example, the JavaScript sends a request to http://myserver.com/servlet?p1=123&p2=aString, and the servlet records 123 and aString in the DB.

Before sending the strings I encode them with encodeURIComponent(). But what I figured out is that clients send the same string with different encodings, depending on either their browser or the site they are visiting. As a result, the same string is represented with different characters by the time it reaches the servlet (so the strings end up different).

What I am trying to do is convert the strings to one encoding on the JavaScript side, so that when they reach the server the same words are represented with the same characters.

How is this possible?

P.S. If there is a way to convert the encoding on the Java side, that would work too.

Edit: To be more precise, I select some words from the page and send them to the server. That is where the encoding causes problems.

Edit 2: I am NOT sending (and can't send) the GET requests via XMLHttpRequest, because the domains are different. I am using the add-a-script-tag-to-the-head method that @streetpc mentioned, roughly like the sketch below.
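A simplified sketch of what I do (the sendToServer name is just for illustration; the servlet URL matches the example above):

function sendToServer(p1, p2) {
    // Cross-domain GET: the browser fetches the URL as soon as the tag is
    // appended, and the servlet records the query parameters.
    var script = document.createElement('script');
    script.src = 'http://myserver.com/servlet'
               + '?p1=' + encodeURIComponent(p1)
               + '&p2=' + encodeURIComponent(p2);
    document.getElementsByTagName('head')[0].appendChild(script);
}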

Edit 3: At the moment I am sanitizing the strings by replacing non-ASCII characters on the JavaScript side, but I have a feeling that this is not the way to go:

function sanitize(word) {
    /*
    ğ : \u011f
    ü : \u00fc
    ş : \u015f
    ö : \u00f6
    ç : \u00e7
    ı : \u0131
    û : \u00fb
    */
    return encodeURIComponent(
            word.replace(/\u011f/g, '_g')
                .replace(/\u00fc/g, '_u')
                .replace(/\u00fb/g, '_u')
                .replace(/\u015f/g, '_s')
                .replace(/\u00f6/g, '_o')
                .replace(/\u00e7/g, '_c')
                .replace(/\u0131/g, '_i'));
}
+2  A: 

Do you specify the encoding of the JavaScript file in the HTTP headers? Like Content-Type: text/javascript; charset=utf-8, with the .js file being saved as UTF-8, of course. With Apache, you can configure:

AddCharset utf-8 .js 

Or you can make the hosted JavaScript file create another script tag with a charset='utf-8' attribute and add it to the head element (like most bookmarklets do).
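A minimal sketch of that approach (the src URL is a placeholder):

var script = document.createElement('script');
script.type = 'text/javascript';
// Force the browser to decode this file as UTF-8, regardless of the
// including page's own encoding:
script.charset = 'utf-8';
script.src = 'http://myserver.com/your-script.js'; // placeholder URL
document.getElementsByTagName('head')[0].appendChild(script);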

I think a script interpreted as UTF-8 should then get and manipulate its strings as UTF-8.

Then, in your Java Servlet, you can specify the input encoding to use:

request.setCharacterEncoding("UTF-8");

Edit: check this page about Character Encoding in JavaScript, especially the part named "Setting the Character Encoding".

streetpc
+3  A: 

what I figured out is every client sends the same string with different encodings

Whilst that would be normal for <form> submissions, it should not happen for XMLHttpRequest work. The encodeURIComponent function explicitly always writes URL-encoded UTF-8 bytes, regardless of the encoding of the page from which it was used. Of course persuading your servlet container to allow you to read those UTF-8 bytes without messing them up is another story, but that shouldn't depend on the client.
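You can verify this in any browser console; for instance, U+015F (ş) always comes out as its UTF-8 byte pair:

// Always the percent-encoded UTF-8 bytes of U+015F, whatever the page's charset:
encodeURIComponent('\u015f'); // "%C5%9F"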

What might be a problem is if you are using raw non-ASCII characters inside your script file itself. In that case the interpretation of those characters will vary according to the charset the browser is using to load the script. This may be affected by:

  1. any charset declared in the Content-Type: text/javascript;charset= header.
  2. any charset attribute declared on the <script src="..." charset="..."> element.
  3. the charset of the page that included the script.

(1) and (2) are not supported in all browsers. Normally you can rely on (3), but as a third-party script author that is out of your control. Therefore you should use only ASCII characters in your script. (Use \u1234 escapes to include non-ASCII characters in string literals in your script to get around this limitation.)
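For example (a small sketch; the variable names are illustrative), both literals below hold the same string, but only the second keeps the .js file pure ASCII and therefore decodes identically under any charset:

var risky = 'şey';     // raw bytes: meaning depends on the charset used to load the file
var safe = '\u015fey'; // pure ASCII: always decoded as ş followed by "ey"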

bobince
I am using non-ASCII characters, that is why I am having problems.
nimcap
You are using literal, raw non-ASCII characters in your returned `.js`? If so, you will need to encode them so they fit in only ASCII. For string literals that's easy, as above. (I can't think of a reason you'd need non-ASCII characters outside of string literals.)
bobince
I updated my question to be clearer: I am using non-ASCII chars, but not directly in the JS. I fetch the words from the page, and they usually contain non-ASCII chars.
nimcap
When contained in an HTML document, characters are already Unicode. If they are appearing correctly on the user's browser, they will definitely also come through `encodeURIComponent` correctly. If the words don't appear right in the user's browser, there's little you can do to recover them.
bobince
+1 nice one Bob. FWIW, the fact that `encodeURIComponent` specifically creates UTF-8 sequences of bytes is covered by section 15.1.9 of the spec (both 3rd and 5th editions). http://www.ecma-international.org/publications/standards/Ecma-262.htm
T.J. Crowder