Hi guys, the characters I am getting from the URL, for example www.mydomain.com/?name=john, were fine as long as they were not in Russian.

If they were in Russian, I was getting '����'.

So I added $name= iconv("cp1251","utf-8" ,$name); and now it works fine for Russian and English characters, but screws up other languages. :)))

For example 'Jānis' ( Latvian ) that worked fine before iconv, now turns into 'jДЃnis'.

Any idea if there's some universal encoder that would work with both the Cyrillic languages and not screw up other languages?

+2  A: 

Why don't you just use UTF-8 with all files and processes?

TiuTalk
aaa... how do I do it? * embarrassed *
Is this a question or an answer?
Gumbo
it's an answer. obvious enough to be phrased in the form of question.
Col. Shrapnel
+1  A: 

Seems like the issue is the file encoding. You should always use UTF-8 without a BOM as the preferred encoding for your .php files; code editors such as Intype let you easily specify this (UTF-8 Plain).

Also, add the following code to your files before any output:

header('Content-Type: text/html; charset=utf-8');

You should also read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky.

Alix Axel
+2  A: 

Actually this comes down to how the URL is encoded. If you click a link on a page, the browser uses that page's encoding to send the request, but if you type the URL directly into the address bar, the behavior is effectively undefined, since there is no standardized encoding to use (Firefox provides an about:config switch to use UTF-8 encoded URLs).

Short of some form of encoding detection, there is no way to know which encoding was used for the URL in a given request.
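
As a rough illustration of what such detection could look like, here is a minimal sketch. It assumes the only non-UTF-8 input you expect is cp1251 (the encoding from the question); to_utf8() is a made-up helper name, not something from PHP or from the answers here. Input that is already valid UTF-8 is left alone, which is what keeps 'Jānis' from being mangled.

```php
<?php
// Hypothetical helper (not part of PHP): return $raw as UTF-8.
// Assumption: anything that is not valid UTF-8 is cp1251.
function to_utf8(string $raw, string $fallback = 'CP1251'): string
{
    // preg_match() with the /u modifier returns 1 only for valid UTF-8 subjects.
    if (preg_match('//u', $raw) === 1) {
        return $raw; // already UTF-8 (English, Latvian, UTF-8 Russian, ...)
    }
    // Not valid UTF-8: guess the fallback encoding and convert.
    $converted = iconv($fallback, 'UTF-8', $raw);
    // iconv() returns false on failure; hand back the raw bytes in that case.
    return $converted === false ? $raw : $converted;
}
```

You would call it as $name = to_utf8($_GET['name']); instead of running iconv unconditionally. It is still a guess: bytes sent as ISO-8859-1 would be misread as cp1251, and a few cp1251 byte sequences happen to be valid UTF-8 as well.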

EDIT:

Just to back up what I said above, I wrote a small test script that shows the default behavior of the five major browsers (running on Mac OS X in my case; IE on Windows Vista via Parallels):

$p = $_GET['p'];
for ($i = 0; $i < strlen($p); $i++) {
    // display each byte received via the URL as two hex digits
    // (%02x pads bytes below 0x10, which plain dechex() would not)
    printf('%02x ', ord($p[$i]));
}

Calling http://path/to/script.php?p=äöü leads to

  • Safari (4.0.5): c3 a4 c3 b6 c3 bc
  • Firefox (3.6.3): c3 a4 c3 b6 c3 bc
  • Google Chrome (5.0.375.38): c3 a4 c3 b6 c3 bc
  • Opera (10.10): e4 f6 fc
  • Internet Explorer (8.0.6001.18904): e4 f6 fc

So obviously the first three use UTF-8 encoded URLs, while Opera and IE use ISO-8859-1 or one of its variants. Conclusion: you cannot be sure which encoding was used for textual data sent via a URL.
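
To see where the two byte patterns come from, the following sketch encodes the same three characters both ways and prints them in the same hex format. hex_dump() is a made-up helper name used only for this illustration; the input is written as byte escapes so the result does not depend on how the file itself is saved.

```php
<?php
// Hypothetical helper: space-separated hex dump of a byte string.
function hex_dump(string $bytes): string
{
    return implode(' ', str_split(bin2hex($bytes), 2));
}

// "äöü" written out as its UTF-8 bytes.
$utf8 = "\xC3\xA4\xC3\xB6\xC3\xBC";

// The same text re-encoded as ISO-8859-1 (one byte per character).
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);

echo hex_dump($utf8), "\n";   // c3 a4 c3 b6 c3 bc
echo hex_dump($latin1), "\n"; // e4 f6 fc
```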

Stefan Gehrig
I highly doubt this is related to browser encoding; my guess is that the Content-Type header is not using the utf-8 charset.
Alix Axel
That depends on how exactly he's calling his PHP script. If he enters `http://www.mydomain.com/?name=Делать` into the browser's address bar, it will be a browser encoding problem as he will not know which encoding is used to send the request. If he clicks a link with the mentioned `href` for example, I'm with you - then it's the page's encoding that seems to be the problem.
Stefan Gehrig
@Stefan: Tried it, and in FF 3.6 it outputs gibberish either way; if I set up Content-Type correctly, both scenarios work.
Alix Axel
Thanks for the answers guys. It appears fine if it is a clicked link, versus a URL that is just typed in. Thanks again!
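
For the clicked-link case, one way to take the browser's guesswork out of it is to percent-encode the value when you generate the link. This is only a sketch: name_link() is a made-up helper name, and it assumes the .php file is saved as UTF-8 and the value goes into the name parameter from the question.

```php
<?php
// Hypothetical helper: build a link whose query value is percent-encoded.
function name_link(string $name): string
{
    // rawurlencode() turns every non-ASCII byte into %XX, so the server
    // receives exactly the UTF-8 bytes of $name.
    $href = '/?name=' . rawurlencode($name);

    // Escape both the href and the visible label for HTML output.
    return '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">'
        . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '</a>';
}

// Assumes this file is saved as UTF-8.
echo name_link('Делать'), "\n";
```

PHP decodes the percent-escapes before filling $_GET, so the script sees the UTF-8 bytes regardless of the encoding of the page that contained the link.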