I'm writing a Chrome extension that works with a website that uses ISO-8859-1. To give some context: what my extension does is make posting in the site's forums quicker by adding a more convenient post form. The value of the textarea where the message is written is then sent through an Ajax call (using jQuery).

If the message contains characters like á, these characters appear as Ã¡ in the posted message. Forcing the browser to display UTF-8 instead of ISO-8859-1 makes the á appear correctly.

It is my understanding that JavaScript uses Unicode for its strings, so my theory is that if I transcode the string to ISO-8859-1 before sending it, it should solve my problem. However, there seems to be no direct way to do this transcoding in JavaScript, and I can't touch the server-side code. Any advice?
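For illustration, the transcoding step I have in mind would be something like the following sketch (encodeLatin1 is a made-up helper name, and it only handles characters that exist in Latin-1):

```javascript
// Percent-encode a string as ISO-8859-1 (Latin-1) bytes.
// Throws if a character falls outside Latin-1, since such a
// character has no single-byte representation in that charset.
function encodeLatin1(str) {
    var out = "";
    for (var i = 0; i < str.length; i++) {
        var code = str.charCodeAt(i);
        if (code > 255) {
            throw new Error("Not representable in ISO-8859-1: " + str.charAt(i));
        }
        if (/[A-Za-z0-9\-_.~]/.test(str.charAt(i))) {
            out += str.charAt(i); // unreserved characters pass through
        } else {
            // one byte per character, written as %XX
            out += "%" + (code < 16 ? "0" : "") + code.toString(16).toUpperCase();
        }
    }
    return out;
}
```

For example, encodeLatin1("á") yields "%E1", which a server decoding the body as ISO-8859-1 reads back as á.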

I've tried setting the created form to use iso-8859-1 like this:

var form = document.createElement("form");
form.enctype = "application/x-www-form-urlencoded; charset=ISO-8859-1";

and also

var form = document.createElement("form");
form.encoding = "ISO-8859-1";

but that doesn't seem to work.

EDIT:

The problem actually lay in how jQuery was URL-encoding the message (or something along the way). I fixed it by telling jQuery not to process the data and doing the encoding myself, as shown in the following snippet:

function cfaqs_post_message(msg) {
    var url = cfaqs_build_post_url();
    // escape() encodes Latin-1 characters as single %XX bytes
    // (e.g. á -> %E1), which matches ISO-8859-1. "+" is left alone
    // by escape(), so it is encoded by hand to keep the server from
    // decoding it as a space.
    msg = escape(msg).replace(/\+/g, "%2B");

    $.ajax({
        type: "POST",
        url: url,
        processData: false, // keep jQuery from re-encoding the data as UTF-8
        data: "message=" + msg + "&post=Preview Message",
        success: function(html) {
            // ...
        },
        dataType: "html",
        contentType: "application/x-www-form-urlencoded"
    });
}
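To illustrate why this makes a difference (a quick sketch, runnable in a browser console or Node): escape() emits a single Latin-1 byte for code points below 256, while encodeURIComponent() always emits UTF-8 byte sequences:

```javascript
// escape() encodes code points below 256 as single bytes:
console.log(escape("á"));             // "%E1" (the ISO-8859-1 byte for á)

// encodeURIComponent() always emits UTF-8 byte sequences:
console.log(encodeURIComponent("á")); // "%C3%A1" (the UTF-8 bytes for á)
```

A server that decodes the request body as ISO-8859-1 reads %C3%A1 as the two characters "Ã¡", which is exactly the garbled output described in the question.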
A: 

Just use this in your script tag:

<script type="text/javascript" src="[path]/myscript.js" charset="utf-8"></script>

You can also configure your webserver to serve all .js files in the UTF-8 charset, or only .js files in a single directory. You can do the latter (in Apache) by adding this line to the .htaccess file in the directory where your scripts are stored:

AddCharset utf-8 .js
Todd Moses
The scripts in question are what Chrome extensions call content scripts; they live inside .js files and are not added by any HTML tag (at least not by me). Modifying the webserver makes no sense, since it's an extension to Chrome and is installed on the client machine. And forgive me if I'm just not getting something here, but how would setting the charset of the script itself to UTF-8 help in this case? I appreciate the help, but I think either I'm not understanding the answer or you are not understanding the question (no offense).
Marcos Marin
Sorry. I must not understand.
Todd Moses
+1  A: 

It is my understanding that JavaScript uses UTF-8 for its strings

No, no.

Each page has its charset encoding defined in a meta tag, just inside the head element:

<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>

or

<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"/>

Besides that, each page's source file should be saved in the target charset encoding. Otherwise, it will not work as expected.

It is also a good idea to define the target charset encoding on the server side:

Java
<%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"%>

PHP
header("Content-Type: text/html; charset=UTF-8");

C#
I do not know how to...

It can also be a good idea to set the charset on each script tag whose file uses accented characters (á, é, í, ó, ú and so on):

<script type="text/javascript" charset="UTF-8" src="/PATH/TO/FILE.js"></script>

...

So it is my theory that if I transcode the string to ISO-8859-1 before sending it, it should solve my problem

No, no.

The target server could handle strings in a charset other than ISO-8859-1. For instance, Tomcat handles ISO-8859-1 by default, no matter how you set up your page. So, on the server side, you may have to set up the request according to how you set up your page:

Java
request.setCharacterEncoding("UTF-8")

PHP
// I do not know how to...

If you really want to change the target charset encoding, try the following (Internet Explorer uses the encoding property, other browsers use enctype):

// Internet Explorer
formElement.encoding = "application/x-www-form-urlencoded; charset=ISO-8859-1";
// other browsers
formElement.enctype  = "application/x-www-form-urlencoded; charset=ISO-8859-1";

Alternatively, you can provide a function that gets the numeric representation, in the Unicode character set, of each character. It will work regardless of the target charset encoding. For instance, the Unicode escape for á is \u00E1:

function convertToUnicodeCharacterSet(value) {
    // Replace each non-ASCII character by its \uXXXX Unicode escape,
    // e.g. "á" becomes the six characters "\u00E1"
    return value.replace(/[\u0080-\uFFFF]/g, function(ch) {
        return "\\u" + ("000" + ch.charCodeAt(0).toString(16).toUpperCase()).slice(-4);
    });
}

alert("The numerical representation of á in the Unicode character set is: " + convertToUnicodeCharacterSet("á"));

Here you can see it in action.

You can use this link as a guideline (see "JavaScript escapes").

Added to the original answer: how I implement the jQuery functionality.

var dataArray = $(formElement).serializeArray();

var queryString = "";
for (var i = 0; i < dataArray.length; i++) {
    if (i > 0) queryString += "&";
    // encode both name and value so accented characters survive
    queryString += encodeURIComponent(dataArray[i].name) + "=" + encodeURIComponent(dataArray[i].value);
}

$.ajax({
    url: "url.htm",
    data: queryString,
    contentType: "application/x-www-form-urlencoded; charset=UTF-8",
    success: function(response) {
        // process response
    }
});

It works fine without any headache.

regards,

Arthur Ronald F D Garcia
Thanks for the informative answer; I'm marking it as correct even though this was not exactly the solution. My post didn't really give enough information to show the real issue. (I only found that out after banging my head against the wall for a few more hours.)
Marcos Marin
@Marcos Marin Added content to original answer
Arthur Ronald F D Garcia