Hi guys,

I'm not able to display the character ě correctly on my web pages. I'm using the UTF-8 charset for the page; do I have to use ISO-8859-2? I get a string containing this character from a DB, where it is saved as &#283;. My browser shows only the HTML entity code.

It's the only character (so far) that I can't display on my web page. I took a look at http://www.czech.cz and they use UTF-8.

Any suggestions?

take care! Andrea

+1  A: 

Are you seeing the &#283; in the browser, or when you view source? If you're seeing it in the browser, then it's probably being double-encoded somewhere: whatever outputs it to the page is probably treating it as unencoded HTML and trying to protect you from some kind of HTML injection. You'll want to make it not do that.

But you have an even deeper problem. If your page is served in UTF-8, and your data is in UTF-8, there isn't any reason to turn it into an HTML entity in the first place. You should be passing the UTF-8 data through unchanged. You do not need to switch to a different character encoding.

rmeador
It's my browser that isn't able to translate the '&#283;' code. There is another problem too: I have an admin page for uploading the string to the DB, and before the update I call `$text=htmlentities($text,ENT_QUOTES);`. For all other languages everything is correct, but not for this char...
Andrea Girardi
Use `htmlspecialchars` **not** `htmlentities`. `htmlentities` tries to encode all non-ASCII characters, which is needless and will corrupt them if you don't tell it the right character set. It defaults to nasty old ISO-8859-1.
bobince
+1  A: 

First of all, yes, you really should be using UTF-8. But that doesn't mean the data you have is already UTF-8 encoded.
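One way to check whether a given string really is UTF-8 is sketched below. It is only an illustration: `isValidUtf8` is a hypothetical helper name, and it relies on the PCRE `u` modifier rejecting subjects that are not well-formed UTF-8.

```php
<?php
// Hypothetical helper: true when $bytes is a well-formed UTF-8 sequence.
// With the "u" modifier, preg_match() fails (returns false) on invalid UTF-8.
function isValidUtf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

var_dump(isValidUtf8("\xC4\x9B")); // "ě" encoded as UTF-8 (two bytes)
var_dump(isValidUtf8("\xEC"));     // "ě" as the single ISO-8859-2 byte
```

A check like this only says the bytes are valid UTF-8; it cannot tell whether the text was entity-encoded before it was stored.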

Secondly, it sounds like that character is HTML encoded in the database already. This is a problem, because it seems that whatever page is displaying this character also tries to HTML-encode the content as well. Here's an example of what I'm talking about.

Data from user: ě
Data HTML encoded (via htmlentities()) prior to going into DB: &#283;
Data stored in DB: &#283;
Data retrieved from DB: &#283;
Data HTML encoded before being printed to the page: &amp;#283;
Data as seen in the browser: &#283;

Do you see that? The character becomes double encoded, so that on the 2nd encoding step the ampersand character is converted into an entity itself.
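The second encoding step can be reproduced in a few lines of PHP. This is a minimal sketch, assuming the database already holds the numeric entity `&#283;` and that the display code calls `htmlspecialchars()`; the variable names are illustrative only.

```php
<?php
// Assumption: the stored value is already the entity form of "ě".
$stored = '&#283;';

// The display step escapes again, so the leading "&" becomes "&amp;".
$printed = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

// A browser decodes entities only once; html_entity_decode() mimics that here.
$seen = html_entity_decode($printed, ENT_QUOTES, 'UTF-8');

echo $printed, "\n"; // &amp;#283;
echo $seen, "\n";    // &#283;
```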

This is the problem with HTML-encoding data before storing it in the database. That should only be done prior to displaying the content, not prior to storage.
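A rough sketch of that approach follows: keep raw UTF-8 text in storage and escape only when writing into HTML. The helper names `h` and `repairStored` are hypothetical, and the repair step assumes the existing rows were entity-encoded exactly once.

```php
<?php
// Hypothetical helper: escape only at the moment text is written into HTML.
function h(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Hypothetical one-time repair for rows that were entity-encoded before storage.
// Assumes each value was encoded exactly once.
function repairStored(string $stored): string
{
    return html_entity_decode($stored, ENT_QUOTES, 'UTF-8');
}

$raw = repairStored('&#283;'); // back to the raw character "ě"
echo h($raw . ' <b>'), "\n";   // markup is escaped, "ě" passes through as UTF-8
```

With this split, the page must actually be served as UTF-8 for the raw character to display correctly.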

Peter Bailey
You are the man! That's exactly the problem... So I have to remove the htmlentities() call before the data goes into the DB, right?
Andrea Girardi
You should use `htmlspecialchars` when outputting text into the HTML page, and `htmlentities` never. Don't HTML-escape content going into the database.
bobince
OK, I've found the problem. The char is encoded in the DB as &#283;. How can I prevent this?
Andrea Girardi
I've removed the htmlentities call and changed the charset to ISO-8859-1, and it works fine.
Andrea Girardi
It works only because your content is still HTML-encoded in the database. If you ever repair that data (or remove the step that encodes it) then ISO-8859-1 won't be sufficient.
Peter Bailey
I've suffered from this issue myself. I'm not sure yet what is to blame, but it happens when you need to store a character that is not allowed in the database character set. The only reasonable fix is to change the DB charset to one with a wider repertoire, such as UTF-8, or to reject the input data when it contains invalid chars.
Álvaro G. Vicario