There's a third argument to `encode`, which controls the checking it does. The default is to use a substitution character, but you can set it to `FB_CROAK` to get an error message.
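A minimal sketch of the difference, assuming a sample string that contains U+2019 (the variable names are illustrative, not from the question):

```perl
use strict;
use warnings;
use Encode qw(encode FB_CROAK);

# Sample input (an assumption): contains U+2019 RIGHT SINGLE QUOTATION MARK.
my $str = "It\x{2019}s";

# Default check: characters that don't map are replaced by a substitution character.
my $lossy = encode('iso-8859-1', $str);

# FB_CROAK: die instead of substituting. encode() may modify its
# argument when a check value is given, so pass a copy.
my $copy   = $str;
my $strict = eval { encode('iso-8859-1', $copy, FB_CROAK) };
my $error  = $@;

print "lossy: $lossy\n";
print "error: $error" if $error;
```

With the default check the quote is silently replaced, while `FB_CROAK` reports which character failed to map.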
+1
A:
Snake Plissken
2010-06-03 06:03:38
Thanks for the suggestion. I tried it and got the following error message: "\x{2019}" does not map to iso-8859-1 at /usr/lib/perl5/5.8.5/i386-linux-thread-multi/Encode.pm line 158.
ppant
2010-06-03 06:59:01
+1
A:
The fundamental problem is that the characters represented by ’, “, and ” do not exist in ISO-8859-1. You'll have to decide what it is that you want to do with them.
Some possibilities:
Use cp1252, Microsoft's "extended" version of ISO-8859-1, instead of the real thing. It does include those characters.
Re-encode the characters outside the ISO-8859-1 range (plus &) as HTML entities before converting from UTF-8 to ISO-8859-1:
my $toEncode = do { no warnings 'utf8'; "&\x{0100}-\x{10FFFF}" };
$string = HTML::Entities::encode_entities($string, $toEncode);
(The `no warnings` bit is needed because U+10FFFF is a Unicode noncharacter, so Perl may otherwise warn about it.)
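The first option can be sketched as follows, assuming a sample string made up of the three problem characters (the string and variable names are illustrative). It uses only the core Encode module and dumps the resulting bytes as hex so they can be compared across machines:

```perl
use strict;
use warnings;
use Encode qw(encode FB_CROAK);

# Sample input (an assumption): U+201C, U+2019 and U+201D around plain ASCII.
my $str = "\x{201C}It\x{2019}s\x{201D}";

# cp1252 has byte values for all three characters, so FB_CROAK
# does not fire here. Pass a copy because encode() may modify
# its argument when a check value is given.
my $copy  = $str;
my $bytes = encode('cp1252', $copy, FB_CROAK);

# Show each output byte as two hex digits.
my $hex = join ' ', map { sprintf '%02X', ord } split //, $bytes;
print "$hex\n";    # prints "93 49 74 92 73 94"
```

In cp1252 the curly quotes land in the 0x80-0x9F range, which ISO-8859-1 reserves for control characters.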
There are other possibilities. It really depends on what you're trying to accomplish.
cjm
2010-06-03 07:15:28
cp1252 works fine and replaces the specified characters properly, but after some testing I found that the same fix doesn't work on a different machine/server. I know this is not directly related to the question, but do you have any idea why the conversion fails on some machines, or what other factors (Apache settings, etc.) might be involved?
ppant
2010-06-04 08:04:39
@ppant, you'll have to be more specific about what you mean by "doesn't work". Are you serving it as `charset=windows-1252`?
cjm
2010-06-04 08:28:08
If I encode the string given in the problem with `my $EnStr = encode("cp1252", $str);`, it doesn't show the proper characters on some machines. I was asking whether there is any other factor that might affect the encoding.
ppant
2010-06-04 08:54:07
When you deliver the HTML to the browser, what charset are you claiming it is? If you call it ISO-8859-1, not all browsers will handle it properly, since cp1252 is not ISO-8859-1. The official name for cp1252 is windows-1252, and you should identify it as that.
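A rough sketch of what that looks like in a plain CGI-style script, assuming the page is printed by hand (the markup and variable names are made up for illustration; a framework or an Apache `AddDefaultCharset` setting may add or override the header in a real setup):

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical page content containing curly quotes.
my $html = "<p>\x{201C}Hello\x{201D}</p>";

# Encode the body as cp1252 and label it with the matching charset name.
my $body     = encode('cp1252', $html);
my $response = "Content-Type: text/html; charset=windows-1252\r\n\r\n" . $body;

# Write raw bytes so no I/O layer re-encodes the output.
binmode STDOUT;
print $response;
```

The point is only that the declared charset and the bytes actually sent agree with each other.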
cjm
2010-06-04 09:40:37
I am targeting ISO-8859-1 in browsers. I understand your point, but my issue is not with the browser now: even a test program that uses the Encode module with Windows-1252 fails to give correct results on some machines. I am trying to find any other factor that is machine-dependent or module-dependent.
ppant
2010-06-04 11:00:05
I don't know. You should probably ask another question. Include the versions of Perl, HTML::Entities, and Encode, along with the input string and output string and the exact code you're using.
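One way to collect those version numbers on each machine is a short script like this sketch (the report layout is arbitrary; HTML::Entities is loaded inside `eval` because it is not a core module and may be missing):

```perl
use strict;
use warnings;
use Encode ();

# HTML::Entities may not be installed everywhere, so load it defensively.
my $he_version = eval { require HTML::Entities; HTML::Entities->VERSION };

my $report = join '',
    sprintf("Perl:           %vd\n", $^V),
    sprintf("Encode:         %s\n", Encode->VERSION),
    sprintf("HTML::Entities: %s\n",
        defined $he_version ? $he_version : 'not installed'),
    sprintf("cp1252 known:   %s\n",
        Encode::find_encoding('cp1252') ? 'yes' : 'no');

print $report;
```

Running it on a machine where the conversion works and on one where it fails makes any version difference easy to spot.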
cjm
2010-06-04 18:29:17