Hi. I am using $encoding = 'utf-8'; with gettext, and in my HTML I have set <meta charset="utf-8">. I have also set UTF-8 as the charset in my .po files, but I still get � when I write æøå. What could be wrong?

+4  A: 

Let's look at the values you mention at the byte level.

I copied the æøå from your question and the � from your title. The reason for the � is that I had to use a Windows console application to fetch the title of your question, and its codepage was Windows-1252 (copying from the browser gave me the Unicode character 'REPLACEMENT CHARACTER' (U+FFFD)).

In a script encoded in UTF-8, this gives:

<?php
$s = 'æøå';
$s2 = '�';

echo "s iso-8859-1 ", bin2hex(mb_convert_encoding($s, "ISO-8859-1", "UTF-8")), "\n";
echo "s2 win-1252  ", bin2hex(mb_convert_encoding($s, "WINDOWS-1252", "UTF-8")), "\n";

Output:

s iso-8859-1 e6f8e5
s2 win-1252  e6f8e5

So the byte representations match. The problem is that when you write æøå, one of two things is happening:

  • You're writing it in ISO-8859-1 instead of UTF-8. Check the encoding your text editor saves the file in.
  • The value is being converted from UTF-8 to ISO-8859-1 somewhere along the way (unlikely).
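
To check which case applies, it can help to test the bytes directly. This is only a minimal sketch, assuming the mbstring extension is available; the variable names ($utf8, $latin1, $fixed) are illustrative and not taken from the question.

```php
<?php
// The same visible text, æøå, stored in two different encodings.
$utf8   = "\xC3\xA6\xC3\xB8\xC3\xA5"; // UTF-8: two bytes per character
$latin1 = "\xE6\xF8\xE5";             // ISO-8859-1: one byte per character

// Only the first string is a valid UTF-8 byte sequence.
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)

// Re-encode the ISO-8859-1 bytes as UTF-8.
$fixed = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
echo bin2hex($fixed), "\n"; // c3a6c3b8c3a5
```

If a string read from your source file or .po file fails the UTF-8 check, the file was most likely saved in a single-byte encoding, and a page declared as UTF-8 will show � for those bytes.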
Artefacto
Notepad++ -> Encoding -> Encode in UTF-8. Thanks :)
ganjan
Artefacto: Why do you see a `�` in the title? This page is declared to be encoded with UTF-8, you need to change the encoding manually to ISO 8859-1 to have this output.
Gumbo
@Gumbo You're right; it didn't occur to me, but it works (at least in Opera 10.61/Windows). By the way, I find it odd that changing the encoding manually to ISO-8859-1 makes the browser interpret it as win-1252, but I guess it's just a compatibility quirk (I think the HTML 5 spec specifically says to interpret pages that declare themselves as ISO-8859-1 as win-1252).
Artefacto
@Gumbo My mistake, those are all valid iso-8859-1 characters.
Artefacto
A: 

You need to set the output codeset for your text domain:

bind_textdomain_codeset($domain, "UTF-8");

Otherwise you will get the � character.
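
For context, here is a rough sketch of where that call might sit in a typical gettext setup. The domain name, locale, and directory below are hypothetical placeholders, not values from the question, and the sketch assumes the gettext extension is loaded.

```php
<?php
// Hypothetical values: adjust the domain, locale, and path to your project.
$domain    = 'messages';
$locale    = 'nb_NO.UTF-8';
$localeDir = __DIR__ . '/locale';

setlocale(LC_ALL, $locale);           // select the language to translate into
bindtextdomain($domain, $localeDir);  // where the compiled .mo files live
$codeset = bind_textdomain_codeset($domain, 'UTF-8'); // return translations as UTF-8
textdomain($domain);                  // make this the default domain

// Falls back to the original string if no translation is found.
$greeting = _('Hello');
echo $greeting, "\n";
```

Without the bind_textdomain_codeset() call, gettext may convert translations to the codeset of the current locale, which can differ from the UTF-8 declared in the page.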

Julio Montoya