ansaurus

Question

How To HTML Escape Curly Quotes in a Java String

Answer 1

+2 A:

The compiler problem is because you've got '/u8221' instead of '\u8221' - a forward slash instead of a backslash.

I'm not entirely convinced that using the entities will help, but you can try... I suppose it depends on how broken the downstream code is.

EDIT: Doh, I hadn't spotted that your Unicode values were in decimal. Yes, they need to be in hex :) I'll leave this answer here as it explains why the compiler was complaining - '\u8221' is a perfectly valid character escape sequence, just not the one you wanted :)
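
To make the decimal/hex mix-up concrete, here is a minimal sketch. The class name UnicodeEscapeCheck and its helper method are made up for illustration and are not from the question. It shows that the escape '\u8221' is read as hexadecimal 8221, while the right double quote (decimal 8221, as used in the HTML entity) has to be written '\u201D'.

```java
final class UnicodeEscapeCheck {

    // Formats a char in the U+XXXX notation used by the Unicode charts.
    static String toCodePointLabel(char c) {
        return String.format("U+%04X", (int) c);
    }

    public static void main(String[] args) {
        char wrong = '\u8221'; // hex 8221: a CJK ideograph, not a quote
        char right = '\u201D'; // hex 201D = decimal 8221: right double quote

        System.out.println((int) wrong);                    // 33313
        System.out.println((int) right);                    // 8221
        System.out.println(toCodePointLabel((char) 8221));  // U+201D
    }
}
```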

Jon Skeet 2009-11-13 21:07:29

That would have been another case of “Why do Chinese characters show up in the middle of my English text?” ;-)

Arthur Reutenauer 2009-11-13 21:16:55

Hah! Well, that was definitely my first problem. Kicking myself for that one -- thanks, Jon.

Sean McMains 2009-11-13 21:21:35

Answer 2

+4 A:

Unicode literals are in hexadecimal:

case '\u201c':
    sb.append("&#8220;");
    break;
....

And, as mentioned in the other answers, you've got a / instead of a \ in one of your literals.
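
For reference, a complete version of such a switch might look like the sketch below. The class and method names (CurlyQuoteEscaper, escapeCurlyQuotes) are assumptions, since the question's full code is not shown here; only the hex escapes and the decimal entities are taken from the discussion above.

```java
final class CurlyQuoteEscaper {

    // Replaces the four curly quote characters with numeric HTML entities
    // and leaves every other character untouched.
    static String escapeCurlyQuotes(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\u2018': // left single quote, decimal 8216
                    sb.append("&#8216;");
                    break;
                case '\u2019': // right single quote, decimal 8217
                    sb.append("&#8217;");
                    break;
                case '\u201c': // left double quote, decimal 8220
                    sb.append("&#8220;");
                    break;
                case '\u201d': // right double quote, decimal 8221
                    sb.append("&#8221;");
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: &#8220;Hi&#8221;
        System.out.println(escapeCurlyQuotes("\u201cHi\u201d"));
    }
}
```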

Adam Goode 2009-11-13 21:08:11

And this was my second issue. Appreciate it, Adam.

Sean McMains 2009-11-13 21:22:31

Answer 3

+4 A:

You can use the literal character (i.e., '‘'), but your build process needs to specify the correct source encoding during compilation. The javac command option is -encoding. (The attribute on Ant's javac task is the same.) This should match whatever encoding is used by your IDE when saving the files.

If your IDE is using UTF-8, for example, but the build machine is using its platform default encoding of US-ASCII, the special characters will be decoded as ?. Since multiple cases now have the same label, you get the original error message.
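
The effect can be imitated outside the compiler. The sketch below is not javac itself and makes no claim about its exact diagnostics; it only shows the decode step, under the assumption that the file was saved as UTF-8 and read as US-ASCII. The class name EncodingMismatchDemo is made up. Two different curly quotes become indistinguishable once the bytes are decoded with the wrong charset.

```java
import java.nio.charset.StandardCharsets;

final class EncodingMismatchDemo {

    // Encodes text as UTF-8, then decodes the bytes as US-ASCII,
    // mimicking a source file that is read with the wrong charset.
    static String readAsAscii(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String left = readAsAscii("\u2018");  // left single quote
        String right = readAsAscii("\u2019"); // right single quote

        // Both decode to the same run of replacement characters.
        System.out.println(left.equals(right)); // true
    }
}
```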

erickson 2009-11-13 21:14:26

This is very good to know. I think I'm going to keep going with the escaped version, however, so that we don't have to fight with encoding issues across various machines when we check out our code. Thank you for the info!

Sean McMains 2009-11-13 21:23:34

Answer 4

A:

The default encoding varies from platform to platform - Windows uses its own ISO-Latin-1 dialect (at least on the systems I've worked on), Linux frequently uses UTF-8 (which is most likely your problem), and Mac uses MacRoman. You can circumvent most of your problems by keeping to plain 7-bit ASCII, and using \u escapes for anything above that if you need it in your source code.
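
A quick way to see what a given machine uses is to print the JVM's default charset. This is only a sketch with a made-up class name (DefaultCharsetCheck). Note that on JDK 18 and later the default is UTF-8 unless it is overridden, so the platform differences described above mostly apply to older JDKs.

```java
import java.nio.charset.Charset;

final class DefaultCharsetCheck {

    // Returns the canonical name of the charset the JVM uses by default.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + defaultCharsetName());
    }
}
```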

Personally I would keep anything "national" outside the Java source, and use the Localization features to look up translated strings by simple keys, so that only the keys are placed in your Java code.
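
As a rough sketch of that idea, the example below looks up the quote characters through a ResourceBundle. The keys (quote.open, quote.close), the class name QuoteMessages, and the file name mentioned in the comment are all hypothetical. In a real project the text would live in a separate .properties file; it is inlined through a StringReader here only to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

final class QuoteMessages {

    // Stand-in for the contents of a hypothetical messages.properties file.
    // Non-ASCII characters are written as escapes that the properties
    // loader expands when the bundle is read.
    private static final String PROPERTIES_TEXT =
            "quote.open=\\u201C\n" +
            "quote.close=\\u201D\n";

    // Builds a bundle from the inlined text; a real application would
    // normally call ResourceBundle.getBundle with a base name instead.
    static ResourceBundle load() {
        try {
            return new PropertyResourceBundle(new StringReader(PROPERTIES_TEXT));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ResourceBundle messages = load();
        String quoted = messages.getString("quote.open")
                + "Hello"
                + messages.getString("quote.close");
        System.out.println(quoted.length()); // 7
    }
}
```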

Thorbjørn Ravn Andersen 2009-11-13 21:52:23

Answer 5

A:

A better approach would be to use Apache Commons Lang http://commons.apache.org/lang/api/org/apache/commons/lang/StringEscapeUtils.html.

Kennet 2009-11-14 10:03:59

I actually like this library a lot, but it wasn't escaping exactly what we needed, so we had to do our custom version. (Old versions of IE gave us trouble with apostrophes encoded their way, as I recall.)

Sean McMains 2009-11-17 14:40:18
