
Before I begin, I would like to highlight the structure of what I am working with.

  1. A specific piece of text is read from a text file. The file is encoded in UTF-8.
  2. Perl reads the file and prints the text into a page. Everything is displayed as it should be. Perl is set to use UTF-8.
  3. The web page Perl generates contains the following meta tag: <meta content="text/html;charset=utf-8" http-equiv="content-type"/>, so the page is declared as UTF-8.
  4. After the first load, everything is loaded dynamically via jQuery/AJAX. By flipping through pages, it is possible to load the exact same text, but this time it is loaded by JavaScript. The request has the following header: Content-Type: application/x-www-form-urlencoded; charset=UTF-8
  5. The Perl handler that processes the AJAX request on the backend delivers its content in UTF-8.
  6. The AJAX handler calls a function in our custom framework. Before the framework prints out the text, it is displayed correctly as "üöä". After being passed to the AJAX handler, it reads "\x{c3}\x{b6}\x{c3}\x{a4}\x{c3}\x{bc}", which is the UTF-8 byte sequence for the same umlauts (ö, ä, ü in that order).
  7. After the AJAX handler delivers its payload to the client as JSON, the web page shows "Ã¶Ã¤Ã¼" instead of "öäü".
  8. The JS and Perl files themselves are saved in UTF-8 (the default setting in Eclipse).
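A minimal Perl sketch of what steps 6 and 7 look like when a string is encoded as UTF-8 twice. It assumes only the core Encode module; the variable names and the hex_bytes helper are illustrative and not taken from the asker's framework. Encoding "öäü" once yields the six bytes from step 6; encoding those bytes again treats each byte as a Latin-1 character, and a browser that decodes the result as UTF-8 shows "Ã¶Ã¤Ã¼".

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# Show a byte string as space-separated hex pairs (hypothetical helper)
sub hex_bytes { return join ' ', unpack '(H2)*', $_[0] }

# "öäü" as Perl characters, built from code points so the
# encoding of this source file does not matter
my $chars = "\x{f6}\x{e4}\x{fc}";

# Encoding once: the bytes seen in step 6
my $once = encode_utf8($chars);
print 'once:  ', hex_bytes($once), "\n";    # c3 b6 c3 a4 c3 bc

# Encoding again: every byte >= 0x80 is treated as a Latin-1 character
my $twice = encode_utf8($once);
print 'twice: ', hex_bytes($twice), "\n";   # c3 83 c2 b6 c3 83 c2 a4 c3 83 c2 bc

# What a browser decoding the doubled bytes as UTF-8 would display
binmode STDOUT, ':encoding(UTF-8)';
print decode_utf8($twice), "\n";            # Ã¶Ã¤Ã¼
```

Note that the displayed mojibake consists of the characters U+00C3, U+00B6, U+00C3, U+00A4, U+00C3, U+00BC, i.e. exactly the code points written in step 6.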

These are the symptoms. I have tried everything I could find on Google and still have the problem. Does anyone have a clue what it could be? If you need a specific code snippet, let me know and I'll post it.

Edit 1

The response headers from the AJAX handler:

Date: Mon, 09 Nov 2009 11:40:27 GMT
Server: Apache/2.2.10 (Linux/SUSE)
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset="utf-8"

200 OK

Answer

With the help of you folks and this page, I was able to track down the problem. It seems the problem was not the encoding itself, but rather that Perl was encoding my variable $text as UTF-8 twice (according to that page). The solution was as simple as adding a call to Encode::decode_utf8().
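The fix can be sketched in Perl as follows. This is a hedged illustration, not the asker's code: it assumes the text reaches the handler as raw UTF-8 bytes and that the JSON is produced by the core JSON::PP module with its utf8 option (the question does not name the framework or the JSON library). Passing the bytes straight to an encoder that itself outputs UTF-8 encodes them a second time; decoding them once with Encode::decode_utf8() first means they are encoded exactly once.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);
use JSON::PP;

# Hypothetical stand-in for $text as the framework returns it:
# raw UTF-8 bytes for "öäü", not yet decoded into characters
my $raw = "\xc3\xb6\xc3\xa4\xc3\xbc";

# With ->utf8, encode() returns UTF-8 bytes
my $json = JSON::PP->new->utf8;

# Wrong: each byte is treated as a character and encoded again
my $double = $json->encode({ text => $raw });

# Right: decode to characters first, so the JSON encoder encodes exactly once
my $text   = decode_utf8($raw);
my $single = $json->encode({ text => $text });

# $single is already bytes, so it can be printed without an output layer
print $single, "\n";
```

The general pattern is to decode once where data enters the program, work with character strings inside it, and encode once where data leaves it.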

I was searching in completely the wrong place to begin with. Thank you all for helping me search in the right place :)

#spreads some upvote love#

+2  A: 

This isn't an answer so much as a suggestion for debugging. The first thing that springs to mind is to try sending HTML entities like &#1234; instead of UTF-8-encoded characters. There is surely a module that makes Perl send these, or you can just do:

 # Replace every character (including newlines, thanks to /s) with a numeric HTML entity
 $text =~ s/(.)/"&#" . ord($1) . ";"/gse;

The most likely cause of this problem, it seems to me, is that the JavaScript receiving end is not able to understand the UTF-8 encoded by Perl.
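A small, self-contained version of the entity idea, offered as a sketch: the to_entities helper is hypothetical, and it escapes only non-ASCII characters (rather than every character, as the one-liner above does) so the output stays readable. Because the response body is then pure ASCII, a charset mismatch cannot mangle it, and the entity numbers reveal whether the string held characters or raw bytes: &#246; is "ö", while &#195;&#182; means the two UTF-8 bytes of "ö" were treated as separate characters.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Replace every non-ASCII character with a numeric HTML entity
sub to_entities {
    my ($s) = @_;
    $s =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $s;
}

my $raw = "\xc3\xb6\xc3\xa4\xc3\xbc";   # UTF-8 bytes for "öäü"

print to_entities(decode_utf8($raw)), "\n";   # &#246;&#228;&#252;
print to_entities($raw), "\n";                # &#195;&#182;&#195;&#164;&#195;&#188;
```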

Kinopiko
returns the following:
Mike
Those numbers are all ASCII. Nothing which needs to be encoded as UTF-8. Also you have forgotten the #.
Kinopiko
+6  A: 
bobince
Added the solution to the original question. I marked this as the accepted answer since it pointed me to the right solution.
Mike
A: 

You say that Perl is encoding the data twice. That sounds very suspicious. Are you mucking with the utf8 flags yourself? How are you reading and writing the data? You say "Perl is set to use utf8", but what do you mean specifically? What did you change in your code that made it work out?

brian d foy
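To make "Perl is set to use UTF-8" concrete, here is a minimal sketch of one common setup; it may or may not match what the asker had. The utf8 pragma (use utf8) only declares that the source file itself is written in UTF-8 and does not affect I/O. Decoding happens at input (here via an :encoding(UTF-8) layer on open) and encoding at output (via binmode). The temporary file is a stand-in for the real text file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a sample UTF-8 file (stand-in for the real text file);
# UNLINK => 1 removes it when the program exits
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh, ':encoding(UTF-8)';
print {$fh} "\x{f6}\x{e4}\x{fc}\n";
close $fh;

# Decode exactly once on the way in ...
open my $in, '<:encoding(UTF-8)', $path or die "Cannot open $path: $!";
my $text = <$in>;
close $in;
chomp $text;

# ... so length counts characters, not bytes
print length($text), "\n";   # 3 (it would be 6 without the :encoding layer)

# ... and encode exactly once on the way out
binmode STDOUT, ':encoding(UTF-8)';
print $text, "\n";
```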