ansaurus

Question

Answer 1

+1 A:

What do you mean by having problems reloading it? Do you output it to an HTML page? In that case, you might have set the wrong charset. As for using entities, check this out: htmlentities

phant0m 2010-07-05 08:40:45

No, that's not the problem.

helle 2010-07-05 08:48:20

Why the downvote? It's a perfectly reasonable answer given the question.

Álvaro G. Vicario 2010-07-05 08:55:09

Then what is the problem? As Álvaro already said, $_SESSION does not care about encodings. Concerning your edit: if stripslashes removes the slash, it means that the string \00e4 was interpreted literally. You seem to be using the octal notation. See Álvaro's post on how to do it with hex.

phant0m 2010-07-05 08:57:19

Downvote, because it does not solve the problem (the question has nothing to do with HTML output), but it had an upvote. If somebody else has the same problem, he would think this is a good answer, and it's not! Sorry, don't take it personally ;-)

helle 2010-07-05 09:27:24

For your original question, it was a perfectly reasonable answer. Excuse me that I do not possess the power of foresight to discover your actual problem :)

phant0m 2010-07-05 09:36:32

... what ever ...

helle 2010-07-05 09:51:03

Answer 2

+2 A:

With htmlentities():

<?php

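// "\xE4" is ä in ISO-8859-1. Name the charset explicitly: PHP 5.4+ assumes
// UTF-8 by default, where a lone \xE4 byte is invalid and yields an empty string.
echo htmlentities("\xE4", ENT_COMPAT, 'ISO-8859-1'); // prints &auml;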

?>

However, it's worth noting that:

Sessions do not care about character encoding.
HTML entities are not required in HTML Unicode documents (except for chars with a special meaning in HTML such as < and >).

So this won't fix your problem, it will just hide it ;-)

Update

I had overlooked the reference to \00e4 in the original question. The ä character corresponds to the U+00E4 Unicode code point. However, PHP string literals do not support Unicode code point escapes. If you need to type it in your PHP code and your keyboard does not have such a symbol, you can save the document as UTF-8 and then provide the UTF-8 bytes (c3 a4) with the double quote syntax:

<?php
// \[0-7]{1,3} or \x[0-9A-Fa-f]{1,2}
echo "\xc3\xa4";
?>

Still, this has no relation to sessions or HTML. I can't understand what your exact problem is.
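
Since this answer was written, PHP 7.0 added a `\u{...}` escape for double-quoted strings. A minimal sketch, assuming PHP 7.0 or later and a source file saved as UTF-8 (the variable names are only illustrative), showing that all three spellings produce the same two bytes:

```php
<?php
// Assumption: PHP 7.0+ (for the \u{...} escape) and a UTF-8 encoded source file.
$from_bytes  = "\xc3\xa4";  // the two UTF-8 bytes of U+00E4, spelled out in hex
$from_escape = "\u{00e4}";  // code point escape, which PHP encodes as UTF-8
$typed       = 'ä';         // the literal character typed into a UTF-8 file

var_dump($from_bytes === $from_escape); // bool(true)
var_dump($from_bytes === $typed);       // bool(true)
echo bin2hex($typed), PHP_EOL;          // c3a4
```

None of this changes how sessions or JSON behave; it only affects how the character is written in source code.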

Second update

So serialize() cannot handle associative arrays and json_decode() cannot be fed with json_encode()'s output...

<?php

$associative_array = array(
    'foo' => 'ä',
    'bar' => 33,
    'gee' => array(10, 20, 30),
);

var_dump($associative_array);
echo PHP_EOL;
var_dump(serialize($associative_array));
echo PHP_EOL;
var_dump(unserialize(serialize($associative_array)));
echo PHP_EOL;

var_dump(json_encode($associative_array));
echo PHP_EOL;
var_dump(json_decode(json_encode($associative_array)));
echo PHP_EOL;

?>

...

array(3) {
  ["foo"]=>
  string(2) "ä"
  ["bar"]=>
  int(33)
  ["gee"]=>
  array(3) {
    [0]=>
    int(10)
    [1]=>
    int(20)
    [2]=>
    int(30)
  }
}

string(83) "a:3:{s:3:"foo";s:2:"ä";s:3:"bar";i:33;s:3:"gee";a:3:{i:0;i:10;i:1;i:20;i:2;i:30;}}"

array(3) {
  ["foo"]=>
  string(2) "ä"
  ["bar"]=>
  int(33)
  ["gee"]=>
  array(3) {
    [0]=>
    int(10)
    [1]=>
    int(20)
    [2]=>
    int(30)
  }
}

string(42) "{"foo":"\u00e4","bar":33,"gee":[10,20,30]}"

object(stdClass)#1 (3) {
  ["foo"]=>
  string(2) "ä"
  ["bar"]=>
  int(33)
  ["gee"]=>
  array(3) {
    [0]=>
    int(10)
    [1]=>
    int(20)
    [2]=>
    int(30)
  }
}

It appears to me that you are adding several layers of complexity to a simple script because you are making assumptions about how some PHP functions work instead of checking the manual or testing yourself. At this point, the information provided hardly resembles the original question and we still haven't seen a single line of code.

My advice so far is that you stop trying to debug your app as a whole, divide it into smaller pieces and use var_dump() to find out what each of these parts actually generates. Don't assume things: test stuff yourself. Also, take into account that PHP doesn't support Unicode natively as other languages do. Every single task that involves multi-byte string handling must be carefully implemented with the appropriate multi-byte functions, which often require hard-coding the character encoding.
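
To illustrate the point about multi-byte functions, here is a small sketch. It assumes the mbstring extension is loaded and that the string is UTF-8; the sample word is made up for the example:

```php
<?php
// Assumption: the mbstring extension is available and the data is UTF-8.
$word = "\xc3\xa4rger"; // "ärger", with ä given as its two UTF-8 bytes

echo strlen($word), PHP_EOL;                   // 6: strlen() counts bytes
echo mb_strlen($word, 'UTF-8'), PHP_EOL;       // 5: mb_strlen() counts characters
echo mb_strtoupper($word, 'UTF-8'), PHP_EOL;   // ÄRGER
echo mb_substr($word, 0, 1, 'UTF-8'), PHP_EOL; // ä
```

Passing the encoding explicitly, as above, avoids depending on whatever `mb_internal_encoding()` happens to be set to on a given server.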

Álvaro G. Vicario 2010-07-05 08:42:42

So where do you get an entity from \00e4? I don't want to parse too much; if I have to parse \00e4 into \xE4, I can just as well parse it into ä ...

helle 2010-07-05 08:51:18

Actually, `\00e4` means nothing in PHP. It's not a valid octal number because it contains an `e`. How do you get such a string in the first place?

Álvaro G. Vicario 2010-07-05 09:21:56

It comes from the json_encode function. I thought it was Unicode. So what is this? Any ideas? Thanks.

helle 2010-07-05 09:23:59

It's JSON's way of denoting Unicode characters. In that case, you should be able to use json_decode.

phant0m 2010-07-05 09:28:40

If you use json_encode() you should be getting `\u00e4`, not `\00e4`. Unlike PHP, JavaScript does support Unicode code points. If you want the original string back, just use json_decode()!

Álvaro G. Vicario 2010-07-05 09:29:49

json_decode will work for decoding a string, but it doesn't work in my case, because I json_encode an associative array to be able to store it in the session (FYI, serialize didn't work the way I needed it to in this special case!). I don't want to walk through every value, because the array is huge and nested.

helle 2010-07-05 09:37:18
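
For what it's worth, json_decode() decodes the whole structure in one call, nested values included, and its second argument makes it return associative arrays instead of stdClass objects. A minimal sketch with made-up sample data, assuming a UTF-8 source file:

```php
<?php
// Hypothetical nested data; the keys and values are only for illustration.
$data = array(
    'foo' => 'ä',
    'bar' => 33,
    'gee' => array(10, 20, array('deep' => 'ö')),
);

$json = json_encode($data);       // non-ASCII characters become \u00e4, \u00f6
$back = json_decode($json, true); // true: return arrays rather than stdClass

var_dump($back === $data);         // bool(true)
var_dump($back['gee'][2]['deep']); // string(2) "ö"
```

No per-value walk is needed: every `\uXXXX` escape inside the JSON text is turned back into the original UTF-8 character by the single json_decode() call.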

Answer 3

A:

I'm not sure whether this helps you, but take a look at WordPress' sanitize_title function, where you can find some huge character tables.

fabrik 2010-07-05 08:43:50

I am not using WordPress.

helle 2010-07-05 08:45:12

You don't need to use WordPress. Use some code from the linked source if it helps.

fabrik 2010-07-05 08:56:48

Answer 4

A:

As you can see in the discussions and answers, this is a problem that PHP can't handle natively (or at least nobody here knows how so far).

I suggest using this very heavy function ... I mean, this is my solution so far, which I do not like very much.

function parse_umlaut($string){

        $string = str_replace('u00c4', 'Ä', $string);
        $string = str_replace('u00e4', 'ä', $string);
        $string = str_replace('u00d6', 'Ö', $string);
        $string = str_replace('u00f6', 'ö', $string);
        $string = str_replace('u00dc', 'Ü', $string);
        $string = str_replace('u00fc', 'ü', $string);
        $string = str_replace('u00df', 'ß', $string);

        return $string;
 }

helle 2010-07-05 09:47:47

why downvote this?

helle 2010-07-05 10:07:43

I thought it was a humorous answer... There are more than 100,000 Unicode code points--are you planning to replace them all one by one or just add new lines as your code crashes with new user input? Not to mention that it has a typo (it omits the backslash) and it fixes the wrong problem.

Álvaro G. Vicario 2010-07-05 10:20:48
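
If stray `\uXXXX` escapes really had to be decoded outside of json_decode(), a generic replacement scales better than one str_replace() per character. The following is only a sketch under stated assumptions: it needs PHP 7.0+ and the mbstring extension, and `decode_unicode_escapes` is a hypothetical helper name, not a PHP built-in. Consecutive escapes are converted together as UTF-16BE so that surrogate pairs stay intact.

```php
<?php
// Hypothetical helper (not part of PHP): decodes JSON-style \uXXXX escapes
// found in a plain string. Assumes PHP 7.0+ and the mbstring extension.
function decode_unicode_escapes(string $text): string
{
    return preg_replace_callback(
        '/(?:\\\\u[0-9a-fA-F]{4})+/',   // one or more consecutive \uXXXX escapes
        function (array $m): string {
            // Keep only the hex digits, e.g. "\u00e4\u00f6" becomes "00e400f6".
            $hex = str_replace('\\u', '', $m[0]);
            // The digits are UTF-16BE code units; convert the raw bytes to UTF-8.
            return mb_convert_encoding(pack('H*', $hex), 'UTF-8', 'UTF-16BE');
        },
        $text
    );
}

echo decode_unicode_escapes('M\u00e4dchen, Stra\u00dfe'), PHP_EOL; // Mädchen, Straße
```

Even so, when the escaped text came from json_encode(), decoding it with json_decode() remains the simpler route.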

... what ever ...

helle 2010-07-05 11:37:45

Why can't you just post the code that produces these umlaut strings? If we could look at that, maybe we could help you fix your problem.

phant0m 2010-07-06 12:54:42


php encoding problem with unicode
