Hi, I am not able to write a script that prints all the Latin-1 characters one by one. Can anybody help me solve this problem?

I am using the code below, but it does not give the expected result.

foreach $char(0..255)  {      
   $hexval   = sprintf("%x",$char);  
   $charval = sprintf("%c",%hexval);  
   print "$charval";
}

The output should look like this:

0065 - e  
0066 - f  
...  
...   
007F - character at the step

For all the code points after 007F, it does not give the expected results.

Please help me out with this.

+3  A: 
foreach (0..255) {
    $hexval = sprintf("%x", $_);
    $charval = sprintf("%c", $_);
    print "$_ => $hexval -> $charval\n";
}
Konerak
A: 

use strict would have given you a good clue as to the cause. On line 3 of your example you have

$charval = sprintf("%c",%hexval);  

However, you never set a value for %hexval; you probably meant $hexval. Fixing that exposes the second bug: %c expects the numeric code point, so you want to format the original value $char, not the formatted hex string.

$charval = sprintf("%c", $char);  

This makes your second line unnecessary, and the code can be simplified to

use strict;
for my $char (0..255) {
    printf "%c\n", $char;
}
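
If the goal is the "0065 - e" style listing from the question, one possible sketch is below. It assumes the terminal expects UTF-8 (hence the output layer), and it skips the control-code ranges because they have no visible glyph. The helper name latin1_line is hypothetical, not something from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: format one code point as "<4 hex digits> - <character>".
sub latin1_line {
    my ($code) = @_;
    return sprintf("%04X - %s", $code, chr($code));
}

# Assumption: the output device expects UTF-8. Without an encoding layer,
# Perl writes code points 0x80..0xFF as single raw bytes, which a UTF-8
# terminal cannot display correctly.
binmode(STDOUT, ':encoding(UTF-8)');

# 0x00-0x1F and 0x7F-0x9F are control codes in Latin-1, so skip them.
for my $code (0x20 .. 0x7E, 0xA0 .. 0xFF) {
    print latin1_line($code), "\n";
}
```

If the output device expects a different encoding, only the name passed to binmode should need to change.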
Ven'Tatsu
+2  A: 

Your question title says you want "CP1252" but then in the body of your question you say you want "Latin-1". CP1252 and Latin-1 are not the same thing. CP1252 is a Microsoft encoding based on Latin-1, but with most of the control codes in the range 0x80-0x9F, which Microsoft deemed not useful, replaced with printable characters.

For example, in CP1252 the byte 0x93 is a left double quote (“), but in Latin-1 it's a non-printable control code.
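
The difference is easy to see by decoding the same byte under both encodings. This is a minimal sketch using the core Encode module; the helper name codepoint_for is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode a single byte under the named encoding and
# return the resulting Unicode code point formatted as "U+XXXX".
sub codepoint_for {
    my ($encoding, $byte) = @_;
    my $char = decode($encoding, $byte);
    return sprintf("U+%04X", ord($char));
}

# Same byte, two different characters:
printf "CP1252:  %s\n", codepoint_for('cp1252',     "\x93");  # U+201C, left double quote
printf "Latin-1: %s\n", codepoint_for('iso-8859-1', "\x93");  # U+0093, a control code
```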

Perl's internal encoding is (almost but not quite) UTF-8. You could take a CP1252 byte and convert it into Perl's UTF-8 character string format like this:

use Encode qw(decode);

my $char = decode("CP1252", "\x80");

Character 0x80 in CP1252 is the Euro symbol. In Unicode the Euro symbol is U+20AC. So now $char will be set to "\x{20AC}".

Your next problem is that you want to "print out" the characters. That could mean many things. The problem is that you need to convert from Perl's internal character representation to whatever encoding your output device expects.

For example, my Linux terminal window is happy to display UTF-8, so I would do the following to print out the Euro character:

binmode(STDOUT, ':utf8');

print $char, "\n";

That is unlikely to work at a Windows command prompt though.
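
Putting the decode and output steps together, a rough sketch that lists the byte range where CP1252 departs from Latin-1 (0x80-0x9F) might look like the following. It assumes a UTF-8 terminal, and describe_cp1252_byte is a hypothetical helper name. A few byte values in that range are unassigned in CP1252; what decode returns for them depends on Encode's mapping table, so the sketch makes no claim about those lines.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: describe one CP1252 byte as
# "byte 0xNN -> U+XXXX <character>".
sub describe_cp1252_byte {
    my ($byte_value) = @_;
    # chr() of a value below 256 yields a one-byte string for decode().
    my $char = decode('cp1252', chr($byte_value));
    return sprintf("byte 0x%02X -> U+%04X %s",
                   $byte_value, ord($char), $char);
}

# Assumption: the output device expects UTF-8.
binmode(STDOUT, ':encoding(UTF-8)');

for my $byte_value (0x80 .. 0x9F) {
    print describe_cp1252_byte($byte_value), "\n";
}
```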

If you are generating HTML output then you should write out UTF-8 and make sure you have an appropriate header to declare the encoding. That will work with pretty much any browser released in the last 10-15 years.
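
As an illustration of the HTML case, here is a small sketch that writes a UTF-8 page and declares the encoding with a meta element, which is one common way to do it (an HTTP Content-Type header is another). The html_page helper and the page contents are invented for the example, and it assumes the body text contains no characters that need HTML escaping.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap plain text in a minimal HTML page that
# declares UTF-8. Assumes $body_text needs no HTML escaping.
sub html_page {
    my ($body_text) = @_;
    return join "\n",
        '<!DOCTYPE html>',
        '<html>',
        '<head><meta charset="utf-8"><title>Characters</title></head>',
        "<body><p>$body_text</p></body>",
        '</html>',
        '';
}

# Encode Perl's character strings as UTF-8 on the way out.
binmode(STDOUT, ':encoding(UTF-8)');
print html_page("Euro sign: \x{20AC}");
```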

Grant McLean