ansaurus

Question

Answer 1

+1 A:

Perhaps the character is not encoded properly in your script. What encoding does your editor think the file uses?

e.g. I just tried this, to circumvent that entirely:

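use strict;
use warnings;
use Encode;

# Should print: iso-8859-15
print "Latin-9 Encoding: ".find_encoding("latin9")->name."\n";

# "\xA3" is the character U+00A3 (Pound sign); encode() turns it into Latin-9 bytes
my $encUK = encode("iso-8859-15", "\xA3");
print "Encoded UK: ", $encUK, "\n";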

output:

 
Latin-9 Encoding: iso-8859-15  
Encoded UK: £
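
As a cross-check outside Perl (a minimal Python sketch, not part of the original answer; the variable names are illustrative), the Pound sign U+00A3 does map to the single byte A3 in ISO-8859-15, whereas UTF-8 needs two bytes for it:

```python
# Cross-check of the byte values discussed above, using Python's built-in codecs.
pound = "\u00a3"  # the Pound sign as a Unicode character

latin9_bytes = pound.encode("iso-8859-15")  # Latin-9: one byte per character
utf8_bytes = pound.encode("utf-8")          # UTF-8: two bytes for U+00A3

print(latin9_bytes.hex())  # a3
print(utf8_bytes.hex())    # c2a3
```

If a hex dump of your script shows c2 a3 where you typed £, the editor saved the file as UTF-8 rather than Latin-9.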

Ether 2010-06-15 17:30:05

Where did you find this? Isn't \xA3 the encoding?

Phill Pafford 2010-06-15 17:31:44

Yes, that works, but I guess my second question is: "Do I need to map all the characters for the UK?" I thought I could pass the character £ to encode() and it would encode it to \xA3. Is this not correct?

Phill Pafford 2010-06-15 17:34:48

@Phil: I googled for "hex code uk pound symbol" to find a chart of hex codes, as I couldn't find the right encoding for my installation of vim to get the symbol properly inserted into my test script. It should *already* be represented by \xA3 internally, *provided your editor encoded the character properly*, which may not be the case.

Ether 2010-06-15 17:47:56

I use vi and it shows up just fine using just the character, but shouldn't encode() convert it?

Phill Pafford 2010-06-15 17:56:02

@Phil: yes, probably. You could use a hex dumper to confirm what you're getting out on the other end.

Ether 2010-06-15 18:15:32

Ether, the question was about the Euro symbol, not the Pound symbol - your example code, while working in principle, is slightly wrong.

daxim 2010-06-15 19:47:57

Answer 2

+2 A:

Don't pull your hair out. You did everything right, you are finished, and you are already getting the intended data. The output confuses you because you are probably viewing it in a terminal that is set up not for Latin-9 but for a different encoding, presumably UTF-8.

> perl -e'use utf8; use Encode; print encode "Latin-9", "Euro €"'
Euro �

> perl -e'use utf8; use Encode; print encode "Latin-9", "Euro €"' | hex
0000  45 75 72 6f 20 a4                                 Euro .

Codepoint A4 is indeed the Euro symbol in Latin-9.
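
The same effect can be reproduced outside Perl. The following is a Python sketch (not part of the original answer) that encodes the string to Latin-9 and then decodes the bytes as UTF-8, which is roughly what a UTF-8 terminal does with them; treating undecodable bytes with `errors="replace"` is an assumption that mimics the terminal's replacement character:

```python
# Encode "Euro €" as Latin-9, then view the bytes the way a UTF-8 terminal would.
text = "Euro \u20ac"

encoded = text.encode("iso-8859-15")
hex_dump = " ".join(f"{b:02x}" for b in encoded)
print(hex_dump)  # 45 75 72 6f 20 a4

# 0xA4 on its own is not valid UTF-8, so it is shown as U+FFFD (the replacement character).
shown_by_utf8_terminal = encoded.decode("utf-8", errors="replace")
print(shown_by_utf8_terminal == "Euro \ufffd")  # True
```

The bytes are correct Latin-9; only the display step misinterprets them.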

daxim 2010-06-15 19:54:49

Thanks. Using the link, I see that the Euro is \x{20AC}, but a ? is still showing up. I have tried a couple of other symbols and they work just fine, e.g. \x{00A3}.

Phill Pafford 2010-06-15 20:16:59

You are misreading the table. `U+20AC` is the codepoint of the Euro character in Unicode, but you said you want the encoding `Latin-9`. Use the row and column headers: `A` and `4` give `A4`. This table is intended just for sanity checks; if you write such `\x` escape codes everywhere, it defeats the purpose of the `Encode` module. ⁓ You see the replacement character because you use the wrong terminal setting. Switch its character encoding to `Latin-9`, too. ⁓ The best explanation of character encoding in Perl is http://p3rl.org/UNI; read all of it.

daxim 2010-06-15 22:15:49
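
The distinction between a Unicode codepoint and its encoded bytes, which the comment above draws, can be illustrated with a short Python sketch (an editorial illustration, not from the thread; variable names are hypothetical):

```python
# One character, one codepoint, but different bytes depending on the encoding.
euro = "\u20ac"

codepoint = f"U+{ord(euro):04X}"            # the Unicode codepoint
latin9_bytes = euro.encode("iso-8859-15")   # a single byte in Latin-9
utf8_bytes = euro.encode("utf-8")           # three bytes in UTF-8

print(codepoint)           # U+20AC
print(latin9_bytes.hex())  # a4
print(utf8_bytes.hex())    # e282ac

# The same byte 0xA4 means something else in Latin-1: the generic currency sign.
as_latin1 = b"\xa4".decode("iso-8859-1")
as_latin9 = b"\xa4".decode("iso-8859-15")
print(f"U+{ord(as_latin1):04X}")  # U+00A4
print(as_latin9 == euro)          # True
```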

Answer 3

A:

"use utf8;" is, since Perl 5.8, only used to tell Perl that your source file is encoded in UTF-8.

So does the encoding of your source file really match what you are telling Perl?

With vim, you must set this option to write the file in UTF-8:

:set fenc=utf8

And to get back UTF-8 when you load the file, you must define fileencodings in your .vimrc:

set fileencodings=ucs-bom,utf-8,latin9
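
To see why the file's real encoding matters, here is a Python sketch (not from the original answer) of what goes wrong when a file saved as UTF-8 is read as if every byte were one character. This is roughly the situation of a UTF-8 source file without `use utf8;`, and the Latin-1 reading is an assumption made for the illustration:

```python
# A UTF-8 editor writes two bytes for the Pound sign.
source_bytes = "\u00a3".encode("utf-8")

# A reader that assumes one byte per character sees two characters instead of one.
misread = source_bytes.decode("iso-8859-1")
codepoints = [f"U+{ord(c):04X}" for c in misread]
print(codepoints)  # ['U+00C2', 'U+00A3']

# Encoding that misread string to Latin-9 yields a stray extra byte before A3.
reencoded = misread.encode("iso-8859-15")
print(reencoded.hex())  # c2a3
```

A hex dump showing c2 a3 instead of a3 in the final output is a typical sign of this mismatch.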

dolmen 2010-06-17 16:44:44


Perl Encode - UK characters
