Hello everyone,

I am trying to get some data from the server via an AJAX call and then display the result using responseDiv.innerHTML. The data from the server comes partially encoded with Unicode escape sequences, like: za\u010Dat test. When I set the innerHTML of the response div, this is just displayed as is. That is, the escape sequence is not converted to the actual character in the browser.

The charset of the containing page is set to UTF-8. I have tried several other things, such as converting the Unicode representation to HTML entities, but that doesn't seem to work either.

I should also mention that the text coming from the server has HTML tags intermixed as well. The HTML tags are honored as they should be. For example, if the text from the server comes as <b>Bold this!</b>, the text is bolded.
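
For illustration, a minimal sketch of the symptom (the variable names are hypothetical, and the doubled backslash is an assumption about what the server actually sends): an escape such as \u010D only becomes a character when a JavaScript or JSON parser processes it. In a plain-text AJAX response the same six characters stay literal.

```javascript
// In source code, the JavaScript parser decodes the escape into one character.
var fromLiteral = 'za\u010Dat test';

// A response body holding the six characters backslash, u, 0, 1, 0, D is
// plain text; nothing decodes it. The doubled backslash simulates that here.
var fromServer = 'za\\u010Dat test';

console.log(fromLiteral.length);         // 10
console.log(fromServer.length);          // 15
console.log(fromLiteral === fromServer); // false
```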

Any help appreciated.

Vikram

A: 

Can you replace '\u010D' with '&#x010D;'?

AFAIK the HTML tags coming from the server should work if you are setting the innerHTML.

This works for me:

document.getElementById('info').innerHTML = "&#x010D; <b>Bold this</b>";

BTW - you can use something like Fiddler or Firebug to ensure you are getting what you expect from the server.

Update: use regular expressions to find the Unicode escape sequences and replace them with HTML entities:

$.get('data.txt', function(data) {
    // Turn each literal \uXXXX sequence (hex digits in either case) into an HTML entity.
    data = data.replace(/\\u([0-9a-fA-F]{4})/g, '&#x$1;');
    document.getElementById('info').innerHTML = data;
});
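
The same idea can be sketched as a standalone function that does not depend on jQuery. The function name and the sample response are hypothetical; the doubled backslash in the sample stands for the single literal backslash assumed to be in the server's text.

```javascript
// Hypothetical helper: rewrite literal \uXXXX sequences as hexadecimal
// HTML character references, leaving all other text (including tags) alone.
function unicodeEscapesToEntities(text) {
    return text.replace(/\\u([0-9a-fA-F]{4})/g, '&#x$1;');
}

// Simulated multi-line response containing markup and one escape sequence.
var response = '<b>za\\u010Dat</b>\n<a href="http://www.google.com">here</a>';
var html = unicodeEscapesToEntities(response);
console.log(html);
```

The result can then be assigned to innerHTML, where the browser resolves the character reference while still honoring the tags.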
russau
Hmm. The text is stored in the database, and I could presumably run a Unicode-to-HTML-entity converter on it before sending it to the client. Is there a way to convert only the Unicode parts on the client side instead?
Vikram Goyal
Updated the answer with an alternative: force JavaScript to treat the response as a string.
russau
Good suggestion. The only problem is that the data contains HTML with strings within it. That is, the data contains markup like: <b>Hello</b> Link is: <a href="http://www.google.com">here</a> and things like that. This causes JavaScript to throw an unterminated string literal error when doing the eval.
Vikram Goyal
Added code to escape the single quotes.
russau
Nearly! I was still getting the unterminated string literal error and figured out that this was because the data contains not only HTML, but HTML spread across several lines. So when the eval is done, it breaks because the string is not terminated before it continues onto the following lines. Any way to fix that?
Vikram Goyal
I have tried: rString = rString.replace(/\n/g, " ") and rString = rString.replace(/\r\n/g, ""). Although they do replace the newlines, it still complains about an unterminated string literal. It doesn't complain if I manually go into the database and make everything fit on one line. :)
Vikram Goyal
New approach: forget the eval, since there are too many edge cases to deal with. Use regular expression backreferences (http://www.webreference.com/js/column5/values.html) to replace the Unicode escape sequences with HTML entities.
russau
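
To make the trade-off discussed in these comments concrete, here is a hedged sketch. The eval-based code from the earlier revision of the answer is not shown in the thread, so the eval line is only an assumption about its general shape; the sample data is also hypothetical.

```javascript
// Simulated response: HTML with double quotes, an apostrophe, a line break,
// and one literal \uXXXX sequence (doubled backslash = one real backslash).
var data = '<a href="http://www.google.com">it\'s here</a>\nza\\u010Dat';

// Assumed shape of the eval approach: wrap the text in quotes and evaluate it.
// A raw line break or an unescaped quote makes that source text invalid.
var evalError = null;
try {
    eval("'" + data + "'");
} catch (e) {
    evalError = e;
}
console.log(evalError instanceof SyntaxError); // true

// A regex replace treats the response purely as data, so quotes and line
// breaks in the markup need no special handling.
var converted = data.replace(/\\u([0-9a-fA-F]{4})/g, '&#x$1;');
console.log(converted);
```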
A: 

Just convert the Unicode escape sequences to characters directly:

// The doubled backslashes stand for the literal backslash in the server's text.
'H\\u0065\\u006Clo, world!'.replace(/\\u([0-9a-fA-F]{4})/g, function(match, hex) {
    return String.fromCharCode(parseInt(hex, 16));
});
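
Applied to the kind of data in the question, this might look like the sketch below. The function name and sample input are hypothetical, and the input again uses doubled backslashes to simulate the literal backslash assumed to be in the response.

```javascript
// Hypothetical helper: decode literal \uXXXX sequences into the UTF-16 code
// units they name. Tags and other text pass through unchanged.
function decodeUnicodeEscapes(text) {
    return text.replace(/\\u([0-9a-fA-F]{4})/g, function (match, hex) {
        return String.fromCharCode(parseInt(hex, 16));
    });
}

var decoded = decodeUnicodeEscapes('<b>za\\u010Dat</b> test');
console.log(decoded); // <b>začat</b> test
```

Because each escape is decoded as one UTF-16 code unit, a character written as two consecutive escapes (a surrogate pair) comes out as the intended pair as well.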

Jonathan Feinberg