This is a follow-up to an earlier question.

So I'm trying out the Encode module but having no luck at all.

use Encode;
use utf8;

# Should print: iso-8859-15
print "Latin-9 Encoding: ".find_encoding("latin9")->name."\n"; 

my $encUK = encode("iso-8859-15", "UK €");
print "Encoded UK: ".$encUK."\n";

Results:

Encoded UK: UK €

Shouldn't the results be encoded? What am I doing wrong here?

EDIT:

Added the suggested:

use utf8;

and now I get this:

Encoded UK: UK �

pulling hair out now :/

+1  A: 

I think perhaps you are not encoding the character properly in your script. What does your editor think is its encoding?

e.g. I just tried this, to circumvent that entirely:

use Encode;

# Should print: iso-8859-15
print "Latin-9 Encoding: ".find_encoding("latin9")->name."\n";

my $encUK = encode("iso-8859-15", "\xA3");
print "Encoded UK: ", $encUK, "\n";

output:

 
Latin-9 Encoding: iso-8859-15  
Encoded UK: £  
Ether
Where did you find this? Isn't \xA3 the encoding?
Phill Pafford
Yes, that works, but I guess my second question is "Do I need to map all the characters for the UK"? I thought I could pass the character £ to encode() and it would encode it to \xA3. Is this not correct?
Phill Pafford
@Phil: I googled for "hex code uk pound symbol" to find a chart of hex codes, as I couldn't find the right encoding for my installation of vim to get the symbol properly inserted into my test script. It should *already* be represented by \xA3 internally, *provided your editor encoded the character properly*, which may not be the case.
Ether
I use vi and it shows up just fine using just the character, but shouldn't encode() convert it?
Phill Pafford
@Phil: yes, probably. You could use a hex dumper to confirm what you're getting out on the other end.
Ether
Ether, the question was about the Euro symbol, not the Pound symbol - your example code, while working in principle, is slightly wrong.
daxim
+2  A: 

Don't pull your hair out. You did everything right and are already getting the intended data. The output is confusing you because you are probably looking at it in a terminal that is set up not for Latin-9 but for a different encoding, presumably UTF-8.

> perl -e'use utf8; use Encode; print encode "Latin-9", "Euro €"'
Euro �

> perl -e'use utf8; use Encode; print encode "Latin-9", "Euro €"' | hex
0000  45 75 72 6f 20 a4                                 Euro .

Codepoint A4 is indeed the Euro symbol in Latin-9.
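
If you have no hex dumper at hand, the same check can be done inside Perl. This is only a minimal sketch: the helper name `latin9_hex` is made up for illustration, and the Euro sign is written as `\x{20AC}` so the result does not depend on how your editor saved the file.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode a character string to Latin-9 and
# return the resulting bytes as space-separated hex pairs.
sub latin9_hex {
    my ($text) = @_;
    my $bytes = encode("iso-8859-15", $text);
    return join " ", unpack("(H2)*", $bytes);
}

# U+20AC is the Unicode code point of the Euro sign.
print latin9_hex("Euro \x{20AC}"), "\n";   # prints: 45 75 72 6f 20 a4
```

The trailing `a4` is the Latin-9 byte for the Euro sign, regardless of what the terminal chooses to display for it.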

daxim
Thanks. Using the link I see that the Euro is \x{20AC}, but a ? is still showing up. I have tried a couple of other symbols, such as \x{00A3}, and they work just fine.
Phill Pafford
You are misreading the table. `U+20AC` is the codepoint of the Euro character in Unicode, but you said you want the encoding `Latin-9`. Use the row and column headers: `A` and `4` give `A4`. This table is intended just for sanity checks; if you write such `\x` escape codes everywhere, it defeats the purpose of the `Encode` module. ⁓ You see the replacement character because you use the wrong terminal setting. Switch its character encoding to `Latin-9`, too. ⁓ The best explanation of character encoding in Perl is http://p3rl.org/UNI; read all of it.
daxim
A: 

"use utf8;" is, since Perl 5.8, only used to tell Perl that your source file is encoded in UTF-8.

So does the encoding of your source file really match what you're telling Perl?

With vim, you must use this option to write the file in UTF-8:
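
To see why this matters, here is a rough sketch of what encode() receives in each case, assuming the source file is saved as UTF-8. The byte string `"\xE2\x82\xAC"` stands in for what Perl sees for a literal € without "use utf8;", and the decoded string stands in for what it sees with it. The helper `hex_bytes` is a made-up name for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: show a byte string as space-separated hex pairs.
sub hex_bytes { return join " ", unpack("(H2)*", $_[0]) }

# Without "use utf8;" a UTF-8 encoded € in the source is read as three
# separate characters: U+00E2, U+0082, U+00AC.
my $without_utf8 = "\xE2\x82\xAC";

# With "use utf8;" the same three bytes are read as one character, U+20AC.
my $with_utf8 = decode("UTF-8", $without_utf8);

print length($without_utf8), " char(s): ",
    hex_bytes(encode("iso-8859-15", $without_utf8)), "\n";   # 3 char(s): e2 82 ac
print length($with_utf8), " char(s): ",
    hex_bytes(encode("iso-8859-15", $with_utf8)), "\n";      # 1 char(s): a4
```

In the first case the three characters pass through Latin-9 unchanged, so a UTF-8 terminal still shows €, which would explain the original "UK €" output. Only in the second case does the Euro sign become the single byte A4.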

:set fenc=utf8

And to get back UTF-8 when you load the file, you must define fileencodings in your .vimrc:

set fileencodings=ucs-bom,utf-8,latin9
dolmen