I have encountered a weird situation while updating/upgrading some legacy code.

I have a variable which contains HTML. Before I can output it, it has to be filled with lots of data. In essence, I have the following:

for my $line (@lines) {
    $output = loadstuff($line, $output); 
}

Inside loadstuff(), there is the following:

sub loadstuff {
    my ($line, $output) = @_;
    # here the process is simplified for better understanding.
    my $stuff = getOtherStuff($line);
    my $result = $output.$stuff;
    return $result;
}

This function builds a page which consists of different areas. Each area is loaded independently, which is why there is a for loop.

The trouble starts right about here. When I load the page from the ground up (click on a link, Perl executes and delivers HTML), everything loads fine. Whenever I load a second page via AJAX for comparison, that HTML has broken encoding.

I tracked the problem down to this line: my $result = $output.$stuff. Before the concatenation, $output and $stuff are fine. But afterward, the encoding in $result is messed up.

Does somebody have a clue why concatenation messes up my encoding? While we are on the subject, why does it only happen when the call is done via AJAX?

Edit 1

The Perl and the AJAX call both execute the very same functions for building up a page. So, whenever I fix it for AJAX, it is broken for freshly reloaded pages. It really seems to happen only if AJAX starts the call.

The only difference in this particular case is that the current values for the page are compared with older ones (it is a backup/restore function). From there on, everything is the same. The encoding of the variables is (as far as I can tell) ok. I even tried the Encode functions only on the values loaded via AJAX, but to no avail. The files themselves seem to be utf8 according to "Kate".

Besides that, I have another function with the same behavior which uses the EXACT same functions, values and files. When the call is started from Perl/Apache, the encoding is ok. Via AJAX, again, it is messed up.

I have been examining the AJAX request (jQuery) and could not find anything odd. The encoding seems to be utf8 too.

+3  A: 

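Perl has a “utf8” flag for every scalar value, which may be “on” or “off”. When the flag is on, perl treats the value as a string of Unicode characters; when it is off, perl treats it as a string of bytes.

If you concatenate a string that has the utf8 flag off with a string that has it on, perl upgrades the byte string to a character string, treating each byte as a Latin-1 character. If those bytes were really UTF-8-encoded text, every multi-byte character turns into several wrong characters. This is the usual source of problems.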

You need to either convert both variables to bytes with Encode::encode() or to perl's internal format with Encode::decode() before concatenation.

See perldoc Encode.
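A minimal sketch of the effect, not taken from the question's code: the variable names and the sample string "ü" are made up for illustration. It mixes a byte string with a decoded string, then shows the same concatenation after decoding both sides.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "ü" as raw UTF-8 bytes (utf8 flag off), e.g. data read without any decoding.
my $bytes = "\xC3\xBC";

# The same "ü" decoded into a Perl character string (utf8 flag on).
my $chars = decode('UTF-8', "\xC3\xBC");

# Mixed concatenation: perl treats the two bytes as two Latin-1 characters.
my $mixed = $chars . $bytes;

# Decoding the byte string first keeps both sides in the same form.
my $fixed = $chars . decode('UTF-8', $bytes);

print length($mixed), "\n";   # 3: one real "ü" plus two stray characters
print length($fixed), "\n";   # 2: two "ü" characters

# Encoded for output, only $fixed gives back the expected UTF-8 bytes.
print encode('UTF-8', $fixed) eq "\xC3\xBC\xC3\xBC" ? "fixed ok\n" : "fixed broken\n";
print encode('UTF-8', $mixed) eq "\xC3\xBC\xC3\xBC" ? "mixed ok\n" : "mixed broken\n";
```

In the question's loadstuff(), the equivalent check would be to make sure $output and $stuff are both decoded (or both still bytes) before the concatenation.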

eugene y
It definitely has something to do with this. When I try to change the encoding, the output reacts accordingly. I even managed to get the full page load with messed up encoding and the AJAX version correctly encoded. It seems the trouble lies in the way Perl starts its request.
Mike
+2  A: 

Expanding on the previous answer, here's a little more information that I found useful when I started messing with character encodings in Perl.

This is an excellent introduction to Unicode in perl: http://perldoc.perl.org/perluniintro.html. The section "Perl's Unicode Model" is particularly relevant to the issue you're seeing.

A good rule to use in Perl is to decode data to Perl characters on its way in and encode it into bytes on its way out. You can do this explicitly using Encode::decode and Encode::encode. If you're reading from or writing to a file handle, you can specify an encoding on the filehandle by setting a layer with binmode (see perldoc -f binmode).
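As a rough sketch of that rule, assuming UTF-8 on both ends: the sample string and variable names are invented, and an in-memory filehandle stands in for a real file so the example runs on its own.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as they might arrive from a file or request ("Grüße" in UTF-8).
my $raw_in = "Gr\xC3\xBC\xC3\x9Fe";

# Decode on the way in: from here on, $text is a string of characters.
my $text = decode('UTF-8', $raw_in);
print length($raw_in), " bytes in, ", length($text), " characters\n";   # 7 bytes in, 5 characters

# Encode on the way out, explicitly ...
my $raw_out = encode('UTF-8', $text);

# ... or let an encoding layer on the filehandle do it.
my $buffer = '';
open my $out, '>:encoding(UTF-8)', \$buffer or die "open failed: $!";
print {$out} $text;
close $out;

print $raw_out eq $buffer ? "same bytes\n" : "different bytes\n";
```

For an already opened handle such as STDOUT, binmode(STDOUT, ':encoding(UTF-8)') sets the same kind of layer.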

You can tell which of the strings in your example has been decoded into Perl characters using Encode::is_utf8:

use Encode qw( is_utf8 );
print is_utf8($stuff) ? 'characters' : 'bytes';
tjmw
A: 

A colleague of mine found the answer to this problem. It really had something to do with the fact that AJAX started the call.

The file structure is as follows:

1 handler, accessed by Apache
1 handler, accessed by Apache, which only contains AJAX responders. We call it the AJAX-Handler
1 package, which contains functions relevant for the entire software and which accesses yet other packages from our own framework

Inside the AJAX-Handler, we print the result like this:

sub handler {
    my $r = shift; 
    # processing output   
    $r->print($output);
    return Apache2::Const::OK;
}

Now, when I replace $r->print($output); by print($output);, the problem disappears! I know that this is not the recommended way to print stuff in mod_perl, but this seems to work.

Still, any ideas how to do this the proper way are welcome.
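One thing that might be worth trying, going by the other answers: encode $output to UTF-8 bytes explicitly right before $r->print, so the request object never receives a character string. The sketch below is untested under mod_perl and only shows the idea. FakeRequest is a hypothetical stand-in for the real request object so the code runs anywhere, and it assumes $output holds decoded characters at that point.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for the mod_perl request object (not part of mod_perl).
{
    package FakeRequest;
    sub new          { my $class = shift; return bless { body => '', type => '' }, $class }
    sub content_type { my ($self, $type) = @_; $self->{type} = $type; return }
    sub print        { my ($self, @data) = @_; $self->{body} .= join '', @data; return }
}

sub handler {
    my $r = shift;

    # Stands in for the page built by the real code; assumed to be decoded characters.
    my $output = decode('UTF-8', "<p>\xC3\xBC</p>");

    # Tell the client which encoding the bytes are in.
    $r->content_type('text/html; charset=utf-8');

    # Hand over bytes, not characters.
    $r->print(encode('UTF-8', $output));

    return 0;   # placeholder for Apache2::Const::OK in the real handler
}

my $r = FakeRequest->new;
handler($r);
print $r->{body} eq "<p>\xC3\xBC</p>" ? "body is UTF-8 bytes\n" : "unexpected body\n";
```

If $output turns out to be bytes already at that point, the encode call would double-encode it, so checking with Encode::is_utf8 first is probably a good idea.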

Mike