Hi there, I am trying to load an XML document from a remote host using node.js. The problem is that German umlauts like "ä" come out broken. In the browser this is usually a simple encoding problem, but since the XML on the remote host is encoded in "iso-8859-2", I have had no success getting the letters to display correctly.

The functionality is very simple: I use the default HTTP client built into node.js to connect to the remote host with a plain GET request.

Some facts about the environment: the remote system uses "iso-8859-2" encoding. The encoding is correctly set in the response header. The characters are unrecoverably broken in the data (chunk) received by response.onData(chunk).

Node.js version 0.2 is running on a default Debian server.

The code is based on the default httpClient as described in the node.js documentation.

I tried the following: response.defaultAsciiEncoding true/false, and response.encoding = UTF-8/ascii.

I also used a UTF-8 encoder/decoder to encode/decode each chunk. After this failed, I tried to encode/decode the whole response body.

I am not very familiar with using buffers, and I guess the problem lies in that direction. My second guess is that node.js (or the httpClient) simply can't handle other encodings by default. In that case I think I would need to write my own HTTP client using the net lib. I just want to make sure I don't head in the wrong direction :)

THX for helping or even gl&hf while guessing ;)!

A: 

Have you tried setting the encoding parameter in the XML declaration?

<?xml version="1.0" encoding="iso-8859-2" ?>
<xml>
  <!-- whatever -->
</xml>

XML files default to UTF-8 unless you explicitly declare their encoding.

Tomalak
The remote source is dynamic and not under my control. But yes, the XML version and encoding are set. I uploaded a sample response to my server. I may add a node.js script as well to reproduce the error. The sample location is http://node.geht-ab.net/original.html
agebrock
@age: Not sure what this is supposed to be? It is served as text/html with no encoding parameter.
Tomalak
Yes, sorry, I forgot to simulate the header correctly. http://node.geht-ab.net/original.php I just added the Content-Type header. Only the doctype of the XML is set to iso-8859-1; the response itself has no encoding information. Here are the original headers:
Connection: keep-alive
Content-Length: 181706
Content-Type: text/xml
Date: Sun, 12 Sep 2010 02:43:40 GMT
Server: Apache
agebrock
A: 

It seems to me that node.js can't work with encodings other than UTF-8. Maybe using something like node-iconv would work.

svick
The problem here is that I can find no point or event that gives access to the raw data. It already looks touched to me at response.onData(chunk). I may check the node.js libs to see what is going on. But in case I end up using a net socket on port 80, the bindings you found could be useful.
agebrock
A: 

I had a quick poke around the node.js source and it seems like svick is right: node.js doesn't support the iso encoding. You can, however, get at the response as a binary stream and then either return it to the browser with your own encoding or use node-iconv (again as svick suggested).

Here's a little example: http://gist.github.com/576884
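The "get the raw bytes, then decode them yourself" idea can be sketched roughly as follows. This is not the contents of the linked gist, just a minimal illustration that assumes a current Node.js release (where `Buffer.concat` and the built-in `latin1` encoding exist) rather than the 0.2 version from the question. The names `decodeChunks` and `fetchLatin1` are made up for this example. Note that `latin1` is ISO-8859-1; it only matches ISO-8859-2 for characters the two share, which includes the German umlauts. Anything beyond that would need a real converter such as node-iconv.

```javascript
const http = require('http');

// Decode a list of raw Buffer chunks using a single-byte encoding.
// 'latin1' is Node's built-in name for ISO-8859-1.
function decodeChunks(chunks, encoding = 'latin1') {
  return Buffer.concat(chunks).toString(encoding);
}

// Hypothetical helper: fetch a URL and pass the decoded body to a callback.
// It is only defined here, not called, so the example needs no network.
function fetchLatin1(url, callback) {
  http.get(url, (response) => {
    const chunks = [];
    // No setEncoding() call, so every chunk arrives as a raw Buffer.
    response.on('data', (chunk) => chunks.push(chunk));
    response.on('end', () => callback(null, decodeChunks(chunks)));
    response.on('error', callback);
  }).on('error', callback);
}

// Local demonstration: 0xE4 is "ä" in both ISO-8859-1 and ISO-8859-2.
const sample = [
  Buffer.from([0x4d, 0xe4]),
  Buffer.from([0x64, 0x63, 0x68, 0x65, 0x6e]),
];
console.log(decodeChunks(sample)); // Mädchen
```

Collecting the chunks first and decoding once at the end is a deliberate choice in this sketch: it keeps the decoding step in a single place, so swapping in another converter later only touches `decodeChunks`.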

bxjx
response.setEncoding("binary"); did the trick. I can't believe I didn't try that. Somehow I only tried using ascii here. For a quick prototype I used php.js utf8_encode. Works perfectly. Thanks for the answers and the link to the iconv bindings.
agebrock
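
For readers wondering why the "binary" setting plus utf8_encode works, here is a small sketch of the same idea using only Node's Buffer API. It assumes a current Node.js release, and the function name `binaryStringToUtf8` is made up for illustration. A "binary" string carries one byte per character, and for ISO-8859-1 those byte values are identical to the Unicode code points, so the string is already correct text and only needs to be written out as UTF-8. As in the thread, this covers the umlauts but not ISO-8859-2 characters that differ from ISO-8859-1.

```javascript
// A "binary" string holds one byte per character (char codes 0-255).
// For ISO-8859-1 those char codes equal the Unicode code points,
// so encoding the string as UTF-8 is a single call.
function binaryStringToUtf8(binaryString) {
  return Buffer.from(binaryString, 'utf8');
}

// Simulate what response.setEncoding('binary') would deliver
// for the two bytes 0x4d ("M") and 0xe4 ("ä").
const received = Buffer.from([0x4d, 0xe4]).toString('binary');

const utf8Bytes = binaryStringToUtf8(received);
console.log(utf8Bytes); // <Buffer 4d c3 a4>
console.log(utf8Bytes.toString('utf8')); // Mä
```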