czech char 'ě' on php page script

+1 Q:

Hi guys,

I can't get this character to display correctly on my web pages. The page uses the UTF-8 charset; do I have to use ISO-8859-2 instead? I'm getting a string containing this character from a DB, where it's saved as &#283;. My browser shows only the HTML entity code.

It's the only character (so far) that I can't show on my web page. I've taken a look at http://www.czech.cz and they use UTF-8.

Any suggestions?

take care! Andrea

+1 A:

Are you seeing the &#283; in the browser, or when you view source? If you're seeing it in the browser, then it's probably being double-encoded somewhere -- whatever outputs it to the page is probably detecting it as unencoded HTML and is trying to protect you from some kind of HTML-injection. You'll want to make it not do that. But you have an even deeper problem. If your page is served up in UTF-8, and your data is in UTF-8, there isn't any reason to turn it into an HTML entity in the first place. You should be passing through the UTF-8 data. You do not need to switch to a different character encoding.

rmeador 2010-04-26 15:26:38

It's my browser that isn't able to translate the '&#283;' code. There is another problem too: I have an admin page that uploads the string to the DB, and before the update I call `$text=htmlentities($text,ENT_QUOTES);`. For all other languages everything is correct, but not for this character.

Andrea Girardi 2010-04-26 15:38:45

Use `htmlspecialchars` **not** `htmlentities`. `htmlentities` tries to encode all non-ASCII characters, which is needless and will corrupt them if you don't tell it the right character set. It defaults to nasty old ISO-8859-1.

bobince 2010-04-26 15:49:15
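
To illustrate the comment above, here is a minimal sketch of how the two functions treat the same UTF-8 input. The variable names are made up for the example, and the charset is passed explicitly as an assumption about the page; the wrong charset is given to `htmlentities` on purpose to show the corruption.

```php
<?php
// "ě" (U+011B) written as explicit UTF-8 bytes so the example does not
// depend on the encoding of this source file.
$text = "\xC4\x9B";

// htmlspecialchars() only touches & < > " ' and passes the UTF-8 bytes through.
$safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// htmlentities() told the wrong charset reads the two bytes as two separate
// ISO-8859-1 characters; the first one (0xC4) comes out as &Auml;.
$mangled = htmlentities($text, ENT_QUOTES, 'ISO-8859-1');

var_dump($safe === $text);                   // bool(true)
var_dump(strpos($mangled, '&Auml;') === 0);  // bool(true)
```

Markup characters are still escaped by `htmlspecialchars`, so `<b>` comes out as `&lt;b&gt;`.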

+1 A:

First of all, yes, you really should be using UTF-8. But that doesn't mean the data you have is already UTF-8 encoded.

Secondly, it sounds like that character is HTML encoded in the database already. This is a problem, because it seems that whatever page is displaying this character also tries to HTML-encode the content as well. Here's an example of what I'm talking about.

Data from user: ě
Data HTML encoded (via htmlentities()) prior to going into DB: &#283;
Data stored in DB: &#283;
Data retrieved from DB: &#283;
Data HTML encoded before being printed to the page: &amp;#283;
Data as seen in the browser: &#283;

Do you see that? The character becomes double encoded, so that on the 2nd encoding step the ampersand character is converted into an entity itself.

This is the problem with HTML-encoding data before storing it in the database. That should only be done prior to displaying the content, not prior to storage.

Peter Bailey 2010-04-26 15:36:33
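
The double-encoding step described in this answer can be reproduced in a few lines. This is only a sketch: `$fromDb` and `$raw` are hypothetical values standing in for what the database returns, and `htmlspecialchars` stands in for whatever escaping the display page applies.

```php
<?php
// Hypothetical stored value: the text was already turned into a numeric
// character reference before it reached the database.
$fromDb = '&#283;';

// Escaping on output converts the leading & into &amp;, so the browser
// shows the literal text "&#283;" instead of the character.
$doubleEncoded = htmlspecialchars($fromDb, ENT_QUOTES, 'UTF-8');
echo $doubleEncoded, "\n"; // &amp;#283;

// Storing the raw UTF-8 bytes instead ("ě" is 0xC4 0x9B) and escaping
// once on output leaves the character intact.
$raw = "\xC4\x9B";
$encodedOnce = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
var_dump($encodedOnce === $raw); // bool(true)
```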

You are the man! That's exactly the problem. So I have to remove the htmlentities() call prior to going into the DB, right?

Andrea Girardi 2010-04-26 15:44:28

Andrea Girardi 2010-04-26 15:46:43

You should use `htmlspecialchars` when outputting text into the HTML page, and `htmlentities` never. Don't HTML-escape content going into the database.

bobince 2010-04-26 15:50:34

Ok, I've found the problem. The char is coded in the DB as &#283;. How can I prevent this?

Andrea Girardi 2010-04-26 15:53:46

I've removed the htmlentities call and changed the charset to ISO-8859-1, and it works fine.

Andrea Girardi 2010-04-26 16:03:49

It works only because your content is still HTML-encoded in the database. If you ever repair that data (or remove the step that encodes it) then ISO-8859-1 won't be sufficient.

Peter Bailey 2010-04-26 16:05:21
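
If the stored data is to be repaired, one possible approach is to decode the character references back into UTF-8 before saving the rows again. This is a sketch on a hypothetical sample string, not a full migration. It assumes that the stored rows contain no entities meant to stay encoded, since `html_entity_decode` would decode those as well.

```php
<?php
// Hypothetical stored value containing a numeric character reference.
$stored = 'd&#283;kuji';

// Decode the reference back into UTF-8 bytes ("ě" is 0xC4 0x9B).
$repaired = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');
var_dump($repaired === "d\xC4\x9Bkuji"); // bool(true)

// A value that was encoded twice needs two passes: the first pass
// only turns &amp; back into &.
$firstPass  = html_entity_decode('&amp;#283;', ENT_QUOTES, 'UTF-8');
$secondPass = html_entity_decode($firstPass, ENT_QUOTES, 'UTF-8');
var_dump($firstPass === '&#283;');     // bool(true)
var_dump($secondPass === "\xC4\x9B");  // bool(true)
```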

I've suffered from this issue myself. I'm not sure yet who's to blame for it, but it happens when you need to store a character that is not allowed in the database character set. The only reasonable fix is to change the DB charset to one with a wider character set, such as UTF-8, or to reject the input data when it contains invalid chars.

Álvaro G. Vicario 2010-04-26 16:35:54
