Hi. I am using `$encoding = 'utf-8';` in gettext, and in my HTML code I have set `<meta charset="utf-8">`. I have also set UTF-8 in my .po files, but I still get � when I write æøå! What can be wrong?
views: 47
answers: 2
+4
A:
Let's look at the values you mention at the byte level. I copied the æøå from your question and the � from your title. The reason for the � is that I had to use a Windows console application to fetch the title of your question, and its codepage was Windows-1252 (copying from the browser gave me Unicode Character 'REPLACEMENT CHARACTER' (U+FFFD)).
In a script encoded in UTF-8, this gives:
<?php
$s = 'æøå';
$s2 = '�';
echo "s iso-8859-1 ", @reset(unpack("H*", mb_convert_encoding($s, "ISO-8859-1", "UTF-8"))), "\n";
echo "s2 win-1252 ", @reset(unpack("H*", mb_convert_encoding($s, "WINDOWS-1252", "UTF-8"))), "\n";
s iso-8859-1 e6f8e5
s2 win-1252 e6f8e5
So the byte representations match. The problem here is that when you write æøå, either:
- You're writing it in ISO-8859-1, instead of UTF-8. Check your text editor.
- The value is being converted from UTF-8 to ISO-8859-1 (unlikely)
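One way to tell which case applies is to look at the raw bytes PHP actually receives. The following is only a minimal sketch: `describeBytes` is a hypothetical helper name, and it assumes the mbstring extension is loaded (the same extension `mb_convert_encoding` comes from).

```php
<?php
// Hypothetical helper: report how a string is stored at the byte level.
function describeBytes(string $s): array
{
    return [
        'hex'        => bin2hex($s),                    // raw bytes as hex
        'valid_utf8' => mb_check_encoding($s, 'UTF-8'), // true if well-formed UTF-8
    ];
}

$utf8   = "\xC3\xA6\xC3\xB8\xC3\xA5"; // æøå encoded as UTF-8 (two bytes per letter)
$latin1 = "\xE6\xF8\xE5";             // æøå encoded as ISO-8859-1 (one byte per letter)

print_r(describeBytes($utf8));   // hex c3a6c3b8c3a5, valid_utf8 true
print_r(describeBytes($latin1)); // hex e6f8e5, valid_utf8 false
```

If a string literal typed in your own script reports `e6f8e5`, the file was most likely saved as ISO-8859-1 rather than UTF-8.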
Artefacto
2010-08-22 01:50:11
Notepad++ -> Encoding -> Encode in UTF-8. Thanks :)
ganjan
2010-08-22 07:46:05
Artefacto: Why do you see a `�` in the title? This page is declared to be encoded with UTF-8, you need to change the encoding manually to ISO 8859-1 to have this output.
Gumbo
2010-08-27 15:38:57
@Gumbo You're right; it didn't occur to me, but it works (at least in Opera 10.61/Windows). By the way, I find it odd that changing the encoding manually to ISO-8859-1 makes the browser interpret it as win-1252, but I guess it's just a compatibility quirk (I think the HTML 5 spec specifically says to interpret pages that declare being ISO-8859-1 as win-1252).
Artefacto
2010-08-27 15:44:00
@Gumbo My mistake, those are all valid iso-8859-1 characters.
Artefacto
2010-08-27 15:48:06
A:
You need to set this:
bind_textdomain_codeset($domain, "UTF-8");
Otherwise you will get the � character.
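For context, here is a rough sketch of where that call could sit in a typical gettext setup. The wrapper name `initGettext`, the domain `messages`, the `locale` directory and the `nb_NO.UTF-8` locale are all assumptions for illustration, not values from the question; adjust them to your project.

```php
<?php
// Hypothetical wrapper around the usual gettext initialisation calls.
function initGettext(string $domain, string $localeDir, string $locale): bool
{
    if (!function_exists('bind_textdomain_codeset')) {
        return false; // gettext extension is not loaded
    }
    setlocale(LC_ALL, $locale);                // select the locale (must be installed on the system)
    bindtextdomain($domain, $localeDir);       // where the compiled .mo files live
    bind_textdomain_codeset($domain, 'UTF-8'); // have gettext return translations as UTF-8
    textdomain($domain);                       // make this the default domain
    return true;
}

// Assumed layout: ./locale/nb_NO/LC_MESSAGES/messages.mo
if (initGettext('messages', __DIR__ . '/locale', 'nb_NO.UTF-8')) {
    echo gettext('Hello'), "\n";
}
```

Without the `bind_textdomain_codeset` call, gettext may return strings in the locale's own character set, which is one way to end up with � on a UTF-8 page.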
Julio Montoya
2010-08-27 15:32:51