When pulling data from a MySQL database onto a web page, all ellipsis's(...) in the data are displayed with a � in firefox or a square box in IE7.

Has anyone ever encountered this problem before?

Thanks.

update 1: I just changed the original ellipsis '…' with '...' (three dots) and now it works? Any idea what this could be?

+3  A: 

You are probably pulling UTF-8 data from your database to a website with ISO (or other) encoding.

What is the encoding in your database and what is the header encoding for your html?

Ólafur Waage
I guess the Database is encoded in windows-1252 (ISO-8859-1 doesn't even have …) and the website is in UTF-8.
d0k
+2  A: 

This is really an encoding issue, but instead of trying to get around that, I suggest you use the more correct approach of encoding ellipses as the &hellip; HTML entity.

Alternatively, you could test it by picking a different encoding under View > Character Encoding in Firefox, or the similar menu in IE. Most likely you'll end up having to add:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>

or

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
aleemb
The entity is actually &hellip;
mipadi
Personally, I don't consider HTML entities "more correct" at all. They're an incomplete workaround to the encoding problem, HTML-specific and just plain ugly.
Michael Borgwardt
Agreed with Michael. It would be better to fix the encoding problem here, instead of trying to work around it with entities. It will rear its ugly head sooner or later anyway.
Joey
Maybe not for storage but for the presentation I don't see why it's not a good solution. In PHP for example, calling htmlentities() would fix this issue and add another layer of safety.
aleemb
Also, the problem is very HTML specific. If the output is to a desktop client, then it's simply not possible to add a content type meta tag (and in fact, the problem may be absent altogether).
aleemb
+1  A: 

It depends on the character set of your database.

You could always replace them with &hellip;
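
A rough sketch of that replacement, in Python purely for illustration (the question doesn't say which language is in use). The helper name `to_entities` is made up for this example; it uses a named entity where the standard table has one and a numeric reference otherwise.

```python
from html.entities import codepoint2name


def to_entities(text):
    """Replace every non-ASCII character with an HTML entity (hypothetical helper)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            # Plain ASCII passes through untouched.
            out.append(ch)
        elif cp in codepoint2name:
            # Named entity, e.g. U+2026 -> &hellip;
            out.append("&" + codepoint2name[cp] + ";")
        else:
            # No named entity: fall back to a numeric reference.
            out.append("&#%d;" % cp)
    return "".join(out)


print(to_entities("Wait\u2026 what?"))  # Wait&hellip; what?
```

Note that this only sidesteps the encoding of the output page; it does not escape `&`, `<` or `>`, and it only helps if the text was decoded correctly when it was read from the database.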

Colin Pickard
+1  A: 

I just changed the original ellipsis '…' with '...' (three dots) and now it works?

That's probably what you want to do anyway. The character U+2026 HORIZONTAL ELLIPSIS is a ‘compatibility’ character, included to aid round-tripping between Unicode and old character sets such as Windows cp1252 (Western European code page) where the ellipsis exists as a character in its own right.

(The idea is that on modern systems, you just use three dots; if the font wants to make the spacing different in an ellipsis — most don't — they can provide an auto-ligature for when three dots are typed.)
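
If you want to see the ‘compatibility’ relationship for yourself, here is a small Python sketch (Python is just my choice for illustration) using the standard `unicodedata` module. Compatibility normalisation (NFKC) turns the single ellipsis character into three ordinary full stops.

```python
import unicodedata

ellipsis = "\u2026"

# The character's official name and its compatibility decomposition.
print(unicodedata.name(ellipsis))           # HORIZONTAL ELLIPSIS
print(unicodedata.decomposition(ellipsis))  # <compat> 002E 002E 002E

# NFKC normalisation applies that decomposition.
three_dots = unicodedata.normalize("NFKC", ellipsis)
print(three_dots == "...")                  # True
```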

all ellipsis's

ellipses :-)

in the data are displayed with a � in firefox or a square box in IE7.

Probably all your other non-ASCII characters are similarly affected; you may see similar results when ‘smart quotes’ or díäçritical marks are used.

Most likely your database has characters stored as Windows cp1252 bytes, but the final web page you're spitting them out into is UTF-8 (either by default or due to it being deliberately set that way).
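
A minimal Python sketch of that mismatch, assuming the guess above is right (cp1252 bytes in the database, page served as UTF-8). The sample string is made up.

```python
text = "Wait\u2026"

# The same ellipsis is one byte in cp1252 but three bytes in UTF-8.
cp1252_bytes = text.encode("cp1252")
utf8_bytes = text.encode("utf-8")
print(cp1252_bytes)  # b'Wait\x85'
print(utf8_bytes)    # b'Wait\xe2\x80\xa6'

# Roughly what the browser does when told the page is UTF-8:
# the lone 0x85 byte is not valid UTF-8, so it becomes U+FFFD.
shown = cp1252_bytes.decode("utf-8", errors="replace")
print(shown == "Wait\ufffd")  # True
```

U+FFFD is the REPLACEMENT CHARACTER, which is the � Firefox shows; IE7's square box is presumably its way of drawing the same undecodable byte.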

You can check this by going to the browser's View->Encoding menu and picking out ‘Western European’ (1252) instead of ‘UTF-8’. Whilst you could fix this by changing the encoding of the web page being produced to cp1252, it would be better to change the contents of the database so that everything was UTF-8; then all Unicode characters would be usable in your application.

Quite how you do this would depend on what language/platform you're using.
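
As one hedged illustration, in Python (an arbitrary choice, since the platform isn't stated), re-encoding a value that is known to be cp1252 looks like this. The helper name `cp1252_to_utf8` and the sample bytes are invented for the example; in practice the database's own conversion facilities are probably the better route for whole tables.

```python
def cp1252_to_utf8(raw):
    """Re-encode bytes known to be cp1252 as UTF-8 (hypothetical helper)."""
    return raw.decode("cp1252").encode("utf-8")


# cp1252 bytes: 0x85 is the ellipsis, 0x93 and 0x94 are the smart quotes.
stored = b"Wait\x85 \x93quoted\x94"
converted = cp1252_to_utf8(stored)
print(converted.decode("utf-8"))  # Wait… “quoted”
```

This only works if the bytes really are cp1252. Running it over data that is already UTF-8 would produce double-encoded garbage, and Python raises `UnicodeDecodeError` on the handful of byte values that cp1252 leaves undefined (such as 0x81).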

bobince