ansaurus

Question

JSP displaying single and double quotes as symbol

Answer 1

+1 A:

These are probably non-standard characters in your database...perhaps directional quotes instead of the straight up-and-down ones?

A straightforward way to handle this, since you can't change the data in the database, would be to use a replace or regex to swap out the "bad" characters for ones that will display correctly.
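A minimal sketch of that replace approach, in Java since the page is a JSP. The class and method names are made up for illustration, and the list of characters is an assumption: it covers the directional quotes plus the four C1 control codes (U+0091 to U+0094) that show up when CP1252 quote bytes are decoded as ISO-8859-1. Adjust it to whatever actually appears in your data.

```java
final class QuoteNormalizer {

    // Hypothetical helper: swaps typographic and mis-decoded quotes
    // for plain ASCII quotes. The character list is an assumption.
    static String normalize(String input) {
        if (input == null) {
            return null;
        }
        return input
            .replace('\u2018', '\'')  // left single quotation mark
            .replace('\u2019', '\'')  // right single quotation mark
            .replace('\u201C', '"')   // left double quotation mark
            .replace('\u201D', '"')   // right double quotation mark
            .replace('\u0091', '\'')  // CP1252 byte 0x91 decoded as ISO-8859-1
            .replace('\u0092', '\'')  // CP1252 byte 0x92 decoded as ISO-8859-1
            .replace('\u0093', '"')   // CP1252 byte 0x93 decoded as ISO-8859-1
            .replace('\u0094', '"');  // CP1252 byte 0x94 decoded as ISO-8859-1
    }

    public static void main(String[] args) {
        System.out.println(normalize("12\u0094 monitor"));
    }
}
```

This only hides the symptom at display time; the stored data and the decoding step are left as they are.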

Beska 2009-09-04 13:12:54

This is not the exact answer, but it led to my solution.

Berek Bryan 2009-09-04 17:51:20

Answer 2

A:

Ned Batchelder 2009-09-04 13:15:01

Answer 3

+6 A:

That's character U+0094, which is a largely-unused control code. You will usually get characters in this range by accident if you use ISO-8859-1 to decode bytes that are actually in Windows codepage 1252 (Western European). They are similar encodings and often confused with each other, but the symbols in the range 0x80-0x9F are different. Windows cp1252 uses some of those for things like smart quotes, which is what you probably expected here: a double-close-quote (”, U+201D RIGHT DOUBLE QUOTATION MARK).
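To see the difference for yourself, here is a small Java sketch that decodes the single byte 0x94 with each encoding. The class and method names are invented for the example, and it assumes your JVM ships the "windows-1252" charset (common, but not one of the charsets every Java platform is required to support).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DecodeDemo {

    // Decodes one byte with the given charset and returns the code point
    // of the resulting character.
    static int decodeSingleByte(byte b, Charset charset) {
        return new String(new byte[] { b }, charset).charAt(0);
    }

    public static void main(String[] args) {
        byte b = (byte) 0x94;
        // Prints "ISO-8859-1:   U+0094" (the control code)
        System.out.printf("ISO-8859-1:   U+%04X%n",
            decodeSingleByte(b, StandardCharsets.ISO_8859_1));
        // Prints "windows-1252: U+201D" (the right double quotation mark)
        System.out.printf("windows-1252: U+%04X%n",
            decodeSingleByte(b, Charset.forName("windows-1252")));
    }
}
```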

Such is the confusion that most web browsers, when told that a web page is ISO-8859-1, will actually use cp1252 instead and would render the quote. So this probably isn't a markup-side issue.

What you probably have is a database that contains CP1252, and a data access layer that is converting the bytes out of it to a String using ISO-8859-1, perhaps because this is the server's default encoding. Ideally you'd want to configure the database to store Unicode strings natively, but if you can't do that you'll need a way to configure your database connector to use the CP1252 encoding instead of ISO-8859-1. How you do this depends on what you're connecting with and to; you might have to set a property, or include a parameter in a connection string.

If you can't do that with your data layer, about the only thing left is to manually go over all the string values you get from the database and transcode them back to what they should be, by encoding with ISO-8859-1 and then decoding with CP1252. This would be a real pain to do, but as a last resort would work.
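In Java, that last-resort transcoding could look roughly like the sketch below. The class and method names are hypothetical, and it assumes Java 7 or later (for StandardCharsets) and a JVM that provides the "windows-1252" charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Transcoder {

    // Assumption: this charset name is available on the running JVM.
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Hypothetical helper: undoes an ISO-8859-1 decode of CP1252 bytes.
    static String repair(String misdecoded) {
        if (misdecoded == null) {
            return null;
        }
        // Encoding with ISO-8859-1 recovers the original bytes, because that
        // encoding maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF.
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        // Decoding those bytes as CP1252 yields the intended characters.
        return new String(originalBytes, CP1252);
    }

    public static void main(String[] args) {
        String fixed = repair("12\u0094 monitor");
        System.out.printf("U+%04X%n", (int) fixed.charAt(2)); // prints U+201D
    }
}
```

Only apply this to strings that really were mis-decoded this way. Any character above U+00FF that is already correct cannot be encoded in ISO-8859-1 and would be turned into a '?' by the first step.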

[Side-issue: close-double-quote is the incorrect character for denoting inches. ″ (Unicode U+2033 DOUBLE PRIME) would be best, but if you're limited to legacy encodings, a straight " double-quote will do.]

bobince 2009-09-04 13:49:38

I think your diagnosis is slightly off. Looking at the result, he's got the right Unicode data in his string, but it gets encoded to Cp1252 and then decoded using UTF-8 as per the metadata. See my answer for more.

McDowell 2009-09-04 14:31:23

That was my immediate reaction but I don't think it actually is what's happening. If you include an invalid sequence such as a lone 0x94 byte in a UTF-8 page, most browsers will give you a replacement character, such as ‘?’ or ‘�’, not the actual control character ‘’ as posted in the question. Of course it's always a bit tricky with questions like these as these kinds of characters can easily get mangled again before being pasted here...

bobince 2009-09-04 14:40:30

Ah, yes, you are correct; I recant.

McDowell 2009-09-04 14:58:19

Your answer does address a very common case, which might be useful to keep for stumbling googlers unrecanted. Derecanted? Decanted?

bobince 2009-09-04 15:11:41

Great write-up, very helpful. I was not able to get the CP1252 to work.

Berek Bryan 2009-09-04 17:52:30

hmm... try "windows-1252", I think that may be its name under Java.

bobince 2009-09-04 21:50:27

Answer 4

A:

U+0094, as pointed out, is not the straight double quote. Not that there is a problem with using a different quote, but U+0094 is not available in most fonts; only some East Asian fonts seem to have this character. In fact, it is the CANCEL character, which falls in the control character category and not in the initial quote or final quote character categories.

It is also a relatively unused character, although it is present in the Latin-1 supplement Unicode block. So you could impose a filter (input or output) to handle this character.

The input filter would simply impose a whitelist of characters that your application will store, and obviously support in display.

The output filter would basically replace Unicode characters that give you problems, with better variants.
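A rough Java sketch of both filters follows. Everything in it is an assumption made for illustration: the class and method names, the whitelist itself (tab, newline, carriage return, printable ASCII, and U+00A0 to U+00FF), and the choice of '?' as the replacement. A real application would pick a whitelist that matches the characters it can actually store and display.

```java
final class CharacterFilters {

    // Assumed whitelist: common whitespace, printable ASCII, and the
    // printable part of Latin-1. The C1 controls (U+0080..U+009F),
    // which include U+0094, are deliberately excluded.
    static boolean isAllowed(char c) {
        return c == '\t' || c == '\n' || c == '\r'
            || (c >= 0x20 && c <= 0x7E)
            || (c >= 0xA0 && c <= 0xFF);
    }

    // Input filter: accept a value only if every character is whitelisted.
    static boolean isAcceptable(String input) {
        for (int i = 0; i < input.length(); i++) {
            if (!isAllowed(input.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Output filter: replace anything outside the whitelist with '?'.
    static String sanitize(String output) {
        StringBuilder sb = new StringBuilder(output.length());
        for (int i = 0; i < output.length(); i++) {
            char c = output.charAt(i);
            sb.append(isAllowed(c) ? c : '?');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("12\" monitor"));
        System.out.println(sanitize("12\u0094 monitor"));
    }
}
```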

Vineet Reynolds 2009-09-04 13:58:24
