ansaurus

Question

Unable to encode to iso-8859-1 encoding for some chars using Perl Encode module

Answer 1

+1 A:

There's a third argument to encode, which controls the checking it does. The default is to silently use a substitution character for anything that can't be mapped, but you can set it to Encode::FB_CROAK to make encode die with an error message instead.
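To illustrate the difference between the two modes, here is a minimal sketch. The sample string is made up for the example; only `encode` and `Encode::FB_CROAK` come from the Encode module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical input: U+2019 (right single quotation mark) has no
# ISO-8859-1 equivalent.
my $str = "It\x{2019}s";

# Default CHECK: the unmappable character becomes the substitution
# character, which is '?' for ISO-8859-1.
my $lossy = encode('iso-8859-1', $str);

# Encode::FB_CROAK: encode() dies instead; eval captures the message in $@.
# A copy is passed because encode() may modify its source when CHECK is set.
my $copy   = $str;
my $strict = eval { encode('iso-8859-1', $copy, Encode::FB_CROAK) };
my $error  = $@;

print "lossy: $lossy\n";
print "error: $error" unless defined $strict;
```

With the default, the problem is silent; with FB_CROAK you find out exactly which character could not be converted.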

Snake Plissken 2010-06-03 06:03:38

Thanks for the suggestion. I tried it and got the following error message: "\x{2019}" does not map to iso-8859-1 at /usr/lib/perl5/5.8.5/i386-linux-thread-multi/Encode.pm line 158.

ppant 2010-06-03 06:59:01

Answer 2

+1 A:

The fundamental problem is that the characters represented by ’, “, and ” do not exist in ISO-8859-1. You'll have to decide what it is that you want to do with them.

Some possibilities:

Use cp1252, Microsoft's "extended" version of ISO-8859-1, instead of the real thing. It does include those characters.

Re-encode the characters outside the ISO-8859-1 range (plus &) as HTML entities before converting from UTF-8 to ISO-8859-1:

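use HTML::Entities ();
# Character-class body: '&' plus every code point above U+00FF (outside ISO-8859-1).
my $toEncode = do { no warnings 'utf8'; "&\x{0100}-\x{10FFFF}" };
$string = HTML::Entities::encode_entities($string, $toEncode);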

(The no warnings bit is needed because U+10FFFF is a Unicode noncharacter, and some versions of Perl warn when it appears in a string.)
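For the cp1252 option, a minimal sketch (the sample string is hypothetical, chosen to contain exactly the three characters mentioned above) shows that the encode succeeds even under FB_CROAK and yields one byte per character:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical input: the right single quote plus the left and right
# double quotes (U+2019, U+201C, U+201D).
my $str = "\x{2019}\x{201C}\x{201D}";

# FB_CROAK makes any unmappable character fatal; LEAVE_SRC keeps encode()
# from modifying $str in place.
my $bytes = encode('cp1252', $str, Encode::FB_CROAK | Encode::LEAVE_SRC);

# Show the resulting bytes in hex.
my $hex = join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
print "$hex\n";   # prints "92 93 94"
```

Those three byte values fall in the 0x80-0x9F range, which ISO-8859-1 reserves for control characters. That is why the result has to be labelled as cp1252 rather than ISO-8859-1.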

There are other possibilities. It really depends on what you're trying to accomplish.

cjm 2010-06-03 07:15:28

cp1252 works fine and replaces the specified characters properly, but after some testing I found that the same fix doesn't work on a different machine/server. I know this is not directly related to the question, but do you have any idea why the conversion fails on some machines, or what other factors (Apache settings, etc.) might be involved?

ppant 2010-06-04 08:04:39

@ppant, you'll have to be more specific about what you mean by "doesn't work". Are you serving it as `charset=windows-1252`?

cjm 2010-06-04 08:28:08

If I encode the string given in the problem with my $EnStr = encode("cp1252", $str); then it doesn't show the proper characters on some machines. I was asking whether there is any other factor that might affect the encoding.

ppant 2010-06-04 08:54:07

When you deliver the HTML to the browser, what charset are you claiming it is? If you call it ISO-8859-1, not all browsers will handle it properly, since cp1252 is not ISO-8859-1. The official name for cp1252 is windows-1252, and you should identify it as that.
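As a rough sketch of what that looks like, assuming a plain CGI-style script that writes its own headers (the thread doesn't say how the page is actually served, and the page body here is made up):

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical page body containing curly quotes (U+201C, U+201D).
my $html = "<p>\x{201C}quoted\x{201D}</p>\n";

# Encode to cp1252 bytes before output.
my $body = encode('cp1252', $html);

# Write raw bytes, and label them with the registered charset name
# rather than ISO-8859-1.
binmode STDOUT;
print "Content-Type: text/html; charset=windows-1252\r\n";
print "Content-Length: ", length($body), "\r\n\r\n";
print $body;
```

The key point is that the charset in the header matches the encoding passed to encode. If a framework or the web server sets the header for you, the same name would go in its configuration instead.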

cjm 2010-06-04 09:40:37

I am targeting ISO-8859-1 in browsers. I understand your point, but my issue is not with the browser now: even if I write a test program using the Encode module, it fails to give correct results on some machines when using Windows-1252. I am trying to find any other factor that is machine-dependent or module-dependent.

ppant 2010-06-04 11:00:05

I don't know. You should probably ask another question. Include the versions of Perl, HTML::Entities, and Encode, along with the input string and output string and the exact code you're using.
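One way to collect those version numbers on each machine is a small script along these lines. This is only a sketch; the report format is arbitrary.

```perl
use strict;
use warnings;

# Start with the Perl version itself ($] is the numeric interpreter version).
my @report = ("perl $]");

# Try to load each module and ask it for its version; a failed load
# leaves $version undefined.
for my $module (qw(Encode HTML::Entities)) {
    my $version = eval "require $module; \$module->VERSION";
    push @report, "$module " . (defined $version ? $version : 'not installed');
}

print "$_\n" for @report;
```

Running it on both a machine where the conversion works and one where it doesn't makes any version differences easy to spot.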

cjm 2010-06-04 18:29:17
