
I have a servlet that outputs JSON. The output encoding for the servlet is ISO-8859-1. Pages in our webapp are also set to ISO-8859-1. I would use UTF-8, but this is outside my control; we have to use ISO-8859-1.

When I hit the servlet by itself, I can see JSON data that has been outputted. The character encoding is correct, and none of the characters look strange.

However, when I call the servlet via AJAX and use the retrieved data to populate a select box, I get � in place of (it seems) every accented character (for example, i with a grave or acute accent, a diaeresis, or a circumflex). When I look at the response in Firebug's Net tab, the text looks fine. However, when I use that data to populate the select box, I get the diamond with a question mark.

These characters are all valid ISO-8859-1 characters, and so I don't understand why they don't show up correctly.

EDIT

Some more information. I use GET in jQuery.ajax and I've set scriptCharset to ISO-8859-1. On the server-side, I've explicitly set the encoding to ISO-8859-1 using request.setCharacterEncoding("ISO-8859-1");

EDIT

Code samples:

This is what I have currently. I added scriptCharset: "ISO-8859-1" to no effect.

        jQuery.ajax({
            url: "/countryAndProvinceCodeServlet",
            data: data,
            dataType: "json",
            type: "GET",
            success: function(data) {
               ...
            }
        });

My servlet uses org.json.JSONObject and simply outputs the string by doing response.getWriter().print(jsonObject.toString());

UPDATE

Per the comments saying that JSON should be UTF-8, I tried grabbing the data as text (setting dataType to "text" in jQuery.ajax) and then evaluating it as JSON myself in JavaScript. That doesn't seem to work either! When I call console.log, I still get the funky diamonds. However, when I look at it under the Net tab in Firebug, everything shows up fine:

Net tab:

{"error":false,
 "provinces":{"DZ-01":"Adrar",
              "DZ-16":"Alger",
              "DZ-23":"Annaba",
              "DZ-44":"Aïn Defla",
              "DZ-46":"Aïn Témouchent",
              "DZ-05":"Batna",
              "DZ-07":"Biskra",
              "DZ-09":"Blida",
              "DZ-34":"Bordj Bou Arréridj",
              "DZ-10":"Bouira",
              "DZ-35":"Boumerdès",
              "DZ-08":"Béchar",
              "DZ-06":"Béjaïa",
              "DZ-02":"Chlef",
              "DZ-25":"Constantine",
              "DZ-17":"Djelfa",
              "DZ-32":"El Bayadh",
              "DZ-39":"El Oued",
              "DZ-36":"El Tarf",
              "DZ-47":"Ghardaïa",
              "DZ-24":"Guelma",
              "DZ-33":"Illizi",
              "DZ-18":"Jijel",
              "DZ-40":"Khenchela",
              "DZ-03":"Laghouat",
              "DZ-29":"Mascara",
              "DZ-43":"Mila",
              "DZ-27":"Mostaganem",
              "DZ-28":"Msila",
              "DZ-26":"Médéa",
              "DZ-45":"Naama",
              "DZ-31":"Oran",
              "DZ-30":"Ouargla",
              "DZ-04":"Oum el Bouaghi",
              "DZ-48":"Relizane",
              "DZ-20":"Saïda",
              "DZ-22":"Sidi Bel Abbès",
              "DZ-21":"Skikda",
              "DZ-41":"Souk Ahras",
              "DZ-19":"Sétif",
              "DZ-11":"Tamanghasset",
              "DZ-14":"Tiaret",
              "DZ-37":"Tindouf",
              "DZ-42":"Tipaza",
              "DZ-38":"Tissemsilt",
              "DZ-15":"Tizi Ouzou",
              "DZ-13":"Tlemcen",
              "DZ-12":"Tébessa"}}

But when I do console.log(text) with what I get from jQuery.ajax, I get the following:

{"error":false,
 "provinces":{"DZ-01":"Adrar",
              "DZ-16":"Alger",
              "DZ-23":"Annaba",
              "DZ-44":"A�n Defla",
              "DZ-46":"A�n T�mouchent",
              "DZ-05":"Batna",
              "DZ-07":"Biskra",
              "DZ-09":"Blida",
              "DZ-34":"Bordj Bou Arr�ridj",
              "DZ-10":"Bouira",
              "DZ-35":"Boumerd�s",
              "DZ-08":"B�char",
              "DZ-06":"B�ja�a",
              "DZ-02":"Chlef",
              "DZ-25":"Constantine",
              "DZ-17":"Djelfa",
              "DZ-32":"El Bayadh",
              "DZ-39":"El Oued",
              "DZ-36":"El Tarf",
              "DZ-47":"Gharda�a",
              "DZ-24":"Guelma",
              "DZ-33":"Illizi",
              "DZ-18":"Jijel",
              "DZ-40":"Khenchela",
              "DZ-03":"Laghouat",
              "DZ-29":"Mascara",
              "DZ-43":"Mila",
              "DZ-27":"Mostaganem",
              "DZ-28":"Msila",
              "DZ-26":"M�d�a",
              "DZ-45":"Naama",
              "DZ-31":"Oran",
              "DZ-30":"Ouargla",
              "DZ-04":"Oum el Bouaghi",
              "DZ-48":"Relizane",
              "DZ-20":"Sa�da",
              "DZ-22":"Sidi Bel Abb�s",
              "DZ-21":"Skikda",
              "DZ-41":"Souk Ahras",
              "DZ-19":"S�tif",
              "DZ-11":"Tamanghasset",
              "DZ-14":"Tiaret",
              "DZ-37":"Tindouf",
              "DZ-42":"Tipaza",
              "DZ-38":"Tissemsilt",
              "DZ-15":"Tizi Ouzou",
              "DZ-13":"Tlemcen",
              "DZ-12":"T�bessa"}}

It seems to me that jQuery is doing something weird with the data.
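The log output fits a charset mismatch rather than anything jQuery-specific. Written as ISO-8859-1, ï and é are the single bytes 0xEF and 0xE9, and a client that decodes the body as UTF-8 treats each of them as a malformed lead byte and substitutes U+FFFD (�). Here is a minimal Java sketch of that round trip, assuming the browser falls back to UTF-8 because the response declares no charset (`MismatchDemo` is a hypothetical name):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: bytes written as ISO-8859-1 but read back as UTF-8.
class MismatchDemo {
    static String roundTrip(String text) {
        // What the servlet's writer produces: one byte per accented letter
        // (i-diaeresis -> 0xEF, e-acute -> 0xE9).
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        // What a UTF-8 decoder makes of them: each lone 0xEF / 0xE9 is a
        // malformed sequence and is replaced by U+FFFD.
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "A\u00EFn T\u00E9mouchent"; // "Ain Temouchent" with accents
        String decoded = roundTrip(original);
        System.out.println(decoded.equals("A\uFFFDn T\uFFFDmouchent")); // true
    }
}
```

The substitution is lossy: once a byte has become U+FFFD, the original letter cannot be recovered on the client.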

A:

Can you use UTF-8 instead?

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

In PHP, you can encode JSON data as UTF-8:

/**
 * Applies a UTF-8 encoding conversion for text.
 */
function utf8_enc( $rows ) {
  $encoded = array();

  foreach( $rows as $row ) {
    $temp = array();

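    foreach( $row as $name => $value ) {
      // mb_convert_encoding( string, to_encoding, from_encoding ): convert the
      // ISO-8859-1 source text to UTF-8 before json_encode sees it.
      $temp[ $name ] = mb_convert_encoding( $value, 'UTF-8', 'ISO-8859-1' );
    }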

    array_push( $encoded, $temp );
  }

  return $encoded;
}

function db_json( $query ) {
  echo json_encode( utf8_enc( db_fetch_all( db_query( $query ) ) ) );
}

I was seeing some strange results using the ISO-8859-1 accented character set. I switched to UTF-8 and the encoding problems disappeared.

For what it's worth, I have coded getJSON as follows:

  $.getJSON( HOST + 'cat.dhtml', function( data ) {
    var h = '';
    var len = data.length;

    for( var i = 0; i < len; i++ ) {
      h += '<option value="' + data[i].id + '">' + data[i].name + '</option>';
      categories[ data[i].id ] = data[i];
    }

    $('#category').html(h);
  });
Dave Jarvis
I guess I can still try this out on the server-side with my servlet. Let me give it a shot! Thanks!
Vivin Paliath
@Dave I tried doing what you suggested, but it didn't work. I return the JSON from the servlet encoded as UTF-8, but this doesn't seem to work either. I still get the funky diamond characters.
Vivin Paliath
@Dave thanks for all your help - I figured it out. It was related to a question you linked to in the comments (about setting the headers explicitly). I assumed that setting the character encoding would be the same, but apparently not!
Vivin Paliath
A: 

PHP's json_encode function does not support ISO-8859-1-encoded data; it expects UTF-8 strings.

This article might help you with your problem: http://www.pabloviquez.com/2009/07/json-iso-8859-1-and-utf-8-%E2%80%93-part2/

spad3s
A:

RFC 4627 states that JSON text SHALL be encoded in Unicode, whatever that means, and json.org indicates that all characters must be "unicode characters":

  • Encoding

    JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.

    Since the first two characters of a JSON text will always be ASCII characters [RFC0020], it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.

       00 00 00 xx  UTF-32BE
       00 xx 00 xx  UTF-16BE
       xx 00 00 00  UTF-32LE
       xx 00 xx 00  UTF-16LE
       xx xx xx xx  UTF-8
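The detection table above can be sketched in Java as follows (a hypothetical helper, not part of any JSON library). It also shows why an ISO-8859-1 body gets no special treatment: its first four bytes are plain ASCII, so the heuristic classifies it as UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Sketch of the RFC 4627 heuristic: the first two characters of a JSON text
// are ASCII, so the pattern of zero bytes in the first four octets identifies
// the Unicode encoding form.
class JsonEncodingSniffer {
    static String detect(byte[] b) {
        if (b.length < 4) {
            return "UTF-8"; // too short to apply the table; assume the default
        }
        boolean z0 = b[0] == 0, z1 = b[1] == 0, z2 = b[2] == 0, z3 = b[3] == 0;
        if (z0 && z1 && z2 && !z3) return "UTF-32BE";  // 00 00 00 xx
        if (z0 && !z1 && z2 && !z3) return "UTF-16BE"; // 00 xx 00 xx
        if (!z0 && z1 && z2 && z3) return "UTF-32LE";  // xx 00 00 00
        if (!z0 && z1 && !z2 && z3) return "UTF-16LE"; // xx 00 xx 00
        return "UTF-8";                                // xx xx xx xx
    }

    public static void main(String[] args) {
        String json = "{\"error\":false}";
        System.out.println(detect(json.getBytes(StandardCharsets.UTF_8)));      // UTF-8
        System.out.println(detect(json.getBytes(StandardCharsets.UTF_16BE)));   // UTF-16BE
        System.out.println(detect(json.getBytes(StandardCharsets.UTF_16LE)));   // UTF-16LE
        System.out.println(detect(json.getBytes(StandardCharsets.ISO_8859_1))); // UTF-8
    }
}
```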
    

So if you're transferring JSON and labelling it ISO-8859-1, different JSON libraries may interpret the RFC's SHALL clause in different ways, e.g. by decoding the offending bytes as the replacement character or by sniffing the encoding. The best fix is obviously to take this to whoever controls that decision and tell them to fix it :-)

Workarounds

One way to work around it is to create a servlet filter that replaces every character outside 7-bit ASCII (the only range where UTF-8 and ISO-8859-1 agree) with a JSON escape:

In the following fragment, 'é' is replaced with '\u00E9', so every offending ISO-8859-1 character travels in the 7-bit range where both encodings are identical:

Before: { "a" : "éte" }

After: { "a" : "\u00E9te" }

It's not as legible, but semantically speaking, it's the same, and any good JSON library should treat them identically.
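The character-level transformation such a filter would apply can be sketched as a plain Java method. The filter and response-wrapper plumbing around it, which needs the Servlet API, is omitted, and `JsonAsciiEscaper` is a hypothetical name. Escaping every character above 0x7F is safe because non-ASCII characters can only appear inside JSON string literals.

```java
// Hypothetical helper for the servlet-filter workaround: rewrites every char
// above 0x7F as a JSON backslash-u escape, so the output is pure 7-bit ASCII
// and reads identically as ISO-8859-1 or UTF-8.
class JsonAsciiEscaper {
    static String escapeNonAscii(String json) {
        StringBuilder out = new StringBuilder(json.length());
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (c > 0x7F) {
                // Each UTF-16 code unit is escaped separately; characters outside
                // the BMP become a surrogate pair of escapes, which JSON allows.
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("{ \"a\" : \"\u00E9te\" }")); // { "a" : "\u00E9te" }
    }
}
```

Characters outside the Basic Multilingual Plane come out as two escapes (a surrogate pair), which is the form the JSON grammar specifies for them.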

mogsie
Thanks @mogsie. I didn't know about this RFC. I'll try to work around this somehow :)
Vivin Paliath
@vivin-paliath, I added a suggestion for a workaround.
mogsie
@mogsie, thanks! I'll remember that for future reference!
Vivin Paliath
A: 

I finally figured it out. It's pretty weird!

response.setCharacterEncoding(String) does not work on its own (I don't know whether that's related to my setup). It appears to set the character encoding, but for some reason jQuery messes it all up. You have to explicitly set the header like so:

response.setHeader("Content-Type", "application/json; charset=ISO-8859-1");

Thanks for all the help, everyone!

EDIT

I did some research and checked out the JavaDocs and saw this:

Containers must communicate the character encoding used for the servlet response's writer to the client if the protocol provides a way for doing so. In the case of HTTP, the character encoding is communicated as part of the Content-Type header for text media types. Note that the character encoding cannot be communicated via HTTP headers if the servlet does not specify a content type; however, it is still used to encode text written via the servlet response's writer.

So the above still works, but you can also (and probably should) do this:

response.setContentType("application/json");
response.setCharacterEncoding("ISO-8859-1"); 
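To see why the header matters, here is a sketch of how a client might choose the decoding charset from the Content-Type header. It assumes the client falls back to UTF-8 when the header carries no charset, which matches the symptoms in the question; `ContentTypeCharset` is a hypothetical helper, and it skips full media-type parsing (e.g. a `;` inside a quoted value).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical client-side helper: pick the charset for a response body from
// its Content-Type header, falling back when no charset parameter is present.
class ContentTypeCharset {
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    // Strip optional quotes, e.g. charset="ISO-8859-1".
                    String name = p.substring("charset=".length()).replace("\"", "").trim();
                    return Charset.forName(name); // throws if the name is unsupported
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        byte[] body = "{\"DZ-26\":\"M\u00E9d\u00E9a\"}".getBytes(StandardCharsets.ISO_8859_1);

        // Header as produced by setContentType + setCharacterEncoding: decodes correctly.
        Charset good = charsetOf("application/json; charset=ISO-8859-1", StandardCharsets.UTF_8);
        // Header without a charset: the UTF-8 fallback turns each accented letter into U+FFFD.
        Charset bad = charsetOf("application/json", StandardCharsets.UTF_8);

        System.out.println(new String(body, good).contains("M\u00E9d\u00E9a")); // true
        System.out.println(new String(body, bad).contains("\uFFFD"));          // true
    }
}
```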
Vivin Paliath
A:

It seems to me that you get a parsing error because the response data is decoded incorrectly and therefore contains some wrong characters.

You could try adding an additional parameter to jQuery.ajax:

dataFilter : function ( data, type ) {
    alert(data);
    return data;
}

If the non-ASCII characters ('ï', 'é' and so on) come through as wrong but distinct characters, you can try replacing each wrongly decoded character with the correct one and returning the corrected data from dataFilter.
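Such a repair only works when the damage is reversible. One common reversible case is UTF-8 bytes decoded as ISO-8859-1, where é shows up as the two characters Ã©; re-encoding as ISO-8859-1 recovers the original bytes. Here is a minimal Java sketch of that case (the class name is hypothetical; in the browser the same idea would live in `dataFilter`), which also shows that a U+FFFD like the ones in the question's log cannot be repaired:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for the "repair the wrongly decoded characters" idea.
// It reverses one specific mistake: UTF-8 bytes that were decoded as ISO-8859-1.
class MojibakeRepair {
    static String repairUtf8ReadAsLatin1(String garbled) {
        // Re-encoding as ISO-8859-1 recovers the original bytes one-for-one...
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        // ...which can then be decoded with the charset they were really in.
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Bechar" with e-acute, whose UTF-8 bytes C3 A9 were read as ISO-8859-1.
        System.out.println(repairUtf8ReadAsLatin1("B\u00C3\u00A9char").equals("B\u00E9char")); // true
        // U+FFFD has no ISO-8859-1 byte, so getBytes emits '?' and the letter is lost.
        System.out.println(repairUtf8ReadAsLatin1("A\uFFFDn")); // A?n
    }
}
```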

Oleg
@Oleg - Thanks! I tried this as well :) I still got the clobbered characters. However, I figured out the actual problem: the encoding wasn't being set, even though I thought it was!
Vivin Paliath
I don't write programs in Java, so I can't help on the server side. But inside `dataFilter`, are all the non-ASCII characters (in the `data` input parameter) the same single character, like �, or are they different characters?
Oleg
Yes, inside `dataFilter` every non-ASCII character was �. The actual problem was that the headers were being set incorrectly: the characters were encoded correctly, but the declared charset was still UTF-8. I had to set the content type and then set the character encoding for the charset to stick.
Vivin Paliath