
In my Java application, I am passed a string that looks like this:

"\u00a5123"

When I print that string to the console, I get the same string back (as expected).

However, I want the Unicode escape converted into the actual yen symbol when it is printed (\u00a5 -> ¥). How would I go about doing this?

i.e. so it looks like this: "¥123"

+2  A: 

I wrote a little program:

public class Main {
    public static void main(String[] args) {
        // the compiler turns this escape into the yen character
        System.out.println("\u00a5123");
    }
}

Its output:

¥123

i.e. it outputs exactly what you described in your post. I wonder whether something else is going on. What version of Java are you using?

edit:

In response to your clarification, there are a couple of different techniques. The most straightforward is to look for a "\u" followed by four hex digits, extract those digits, and replace the whole sequence with the character they encode (using the Character class). This of course assumes that every "\u" in the string is meant as an escape, i.e. it is never itself preceded by a backslash.
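A minimal sketch of that technique, assuming Java 8 or later: a regular expression finds each backslash-u followed by four hex digits, and the digits are decoded with `Integer.parseInt(..., 16)` and cast to `char` (used here in place of a `Character` helper). The names `UnicodeUnescape` and `unescape` are hypothetical, and like the answer it assumes no escape is preceded by another backslash.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeUnescape {

    // A literal backslash, then 'u', then exactly four hex digits (captured).
    private static final Pattern UNICODE_ESCAPE =
            Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces each backslash-u-XXXX sequence with the char whose code is XXXX.
    // Assumption: no escape is itself preceded by an escaping backslash.
    public static String unescape(String input) {
        Matcher m = UNICODE_ESCAPE.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement stops a decoded backslash or '$' being read as replacement syntax
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String escaped = "\\u00a5123"; // 9 chars: backslash, 'u', "00a5", "123"
        System.out.println(unescape(escaped)); // the yen sign followed by 123, if the console can show it
    }
}
```

Using `StringBuffer` rather than `StringBuilder` keeps this compatible with Java 8, where `Matcher.appendReplacement` only accepts a `StringBuffer`.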

I am not aware of any built-in facility for parsing a String as though it were an escaped Java string literal.

aperkins
You are correct that printing the string literal directly gives the right output. However, the string I am given is already escaped. Suppose your main method instead called a method named foo like this: foo("\\u00a5123"); <-- note the escaped backslash. The parameter foo receives is the kind of string I am dealing with.
digiarnie
Backslash escaping is something that only the Java compiler needs to deal with, not the JVM or the API. So it's not surprising to find that there isn't an easy way to parse such strings at runtime.
Todd Owen
@Todd agreed - about the only other thing I have been able to think of is attempting to use the compiler in some way - but that just sounds like trouble to me.
aperkins
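To make the distinction in digiarnie's comment concrete, here is a small sketch (the class name `EscapedVsLiteral` is hypothetical) comparing the source literal from the first answer with the escaped string that `foo` receives:

```java
public class EscapedVsLiteral {
    public static void main(String[] args) {
        // The compiler decodes this escape, so the string holds the yen sign + "123".
        String literal = "\u00a5123";
        // Escaping the backslash keeps it: backslash, 'u', "00a5", "123".
        String escaped = "\\u00a5123";
        System.out.println(literal.length());        // 4
        System.out.println(escaped.length());        // 9
        System.out.println(literal.equals(escaped)); // false
    }
}
```

At runtime the second string really does contain a backslash and the letter u, which is why it has to be decoded explicitly.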
+1  A: 

You're probably going to have to write a parser for these, unless you can find one in a third-party library. There is nothing in the JDK to parse these for you; I know because I recently had the idea of using these kinds of escapes to smuggle Unicode through a Latin-1-only database. (I ended up doing something else, btw.)

I will tell you that java.util.Properties escapes and unescapes Unicode characters in this manner when reading and writing files (since the files have to be ASCII). The methods it uses for this are private, so you can't call them, but you could use the JDK source code to inspire your solution.
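A minimal sketch of such a hand-written parser, loosely in the spirit of what `Properties` does but not copied from the JDK: it scans the string once, decodes backslash-u followed by exactly four hex digits, treats a doubled backslash as one literal backslash, and copies everything else. The names `EscapeScanner`, `unescape` and `hexValue` are hypothetical; other escapes such as \n are deliberately left alone, and a malformed escape raises `IllegalArgumentException`.

```java
public class EscapeScanner {

    // Value of one hex digit, or -1; deliberately limited to ASCII 0-9, a-f, A-F.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // Single left-to-right pass: decodes backslash-u plus 4 hex digits,
    // turns a doubled backslash into one literal backslash, copies all else.
    public static String unescape(String in) {
        StringBuilder out = new StringBuilder(in.length());
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (c == '\\' && i + 1 < in.length()) {
                char next = in.charAt(i + 1);
                if (next == '\\') {
                    out.append('\\'); // escaped backslash: keep one, skip both
                    i += 2;
                    continue;
                }
                if (next == 'u' && i + 6 <= in.length()) {
                    int value = 0;
                    for (int k = i + 2; k < i + 6; k++) {
                        int d = hexValue(in.charAt(k));
                        if (d < 0) {
                            throw new IllegalArgumentException("Malformed escape at index " + i);
                        }
                        value = (value << 4) | d;
                    }
                    out.append((char) value);
                    i += 6;
                    continue;
                }
            }
            out.append(c); // ordinary character, or a backslash this sketch does not handle
            i++;
        }
        return out.toString();
    }
}
```

Handling the doubled backslash is what removes the "not preceded by a backslash" assumption of the regex-style approach.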

Licky Lindsay
A bit convoluted, but you could probably emit the string as a value to an in-memory properties file and then read it using the `Properties` class.
McDowell
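A sketch of McDowell's suggestion, assuming Java 6 or later for `Properties.load(Reader)`; the class name `PropertiesUnescape` and the key `k` are arbitrary, hypothetical choices. Keep in mind that `load` applies the full properties-file syntax, not just Unicode escapes: it also decodes \t, \n and similar escapes, silently drops a backslash in front of other characters, strips leading whitespace from the value, and stops at a line break.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class PropertiesUnescape {

    // Wraps the text in a one-line "k=value" properties document and lets
    // Properties.load(Reader) decode the escapes. "k" is an arbitrary key.
    public static String unescape(String escaped) {
        Properties props = new Properties();
        try {
            props.load(new StringReader("k=" + escaped));
        } catch (IOException e) {
            // load(Reader) declares IOException; a StringReader should not raise it here
            throw new IllegalStateException(e);
        }
        return props.getProperty("k");
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\u00a5123")); // prints the yen sign followed by 123
    }
}
```

A malformed escape makes `load` throw `IllegalArgumentException`, so callers passing untrusted input may want to catch it.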
A: 

As has been mentioned before, these strings will have to be parsed to get the desired result.

  1. Tokenize the string by using \u as separator. For example: \u63A5\u53D7 => { "63A5", "53D7" }

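  2. Process each token as follows. Only its first four characters are the hex code; anything after them (e.g. the "123" in "00a5123") is plain text to append unchanged: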

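    String hex = "63A5";                       // one token from step 1
    int intValue = Integer.parseInt(hex, 16);  // read the token as a base-16 number
    System.out.println((char)intValue);        // cast the code to a char and print it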
    
Abhinav Maheshwari
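The two steps can be combined into one method. A minimal sketch, with hypothetical names (`SplitUnescape`, `unescape`), that decodes only the first four characters of each token and appends the rest unchanged, so "\u00a5123" yields the yen sign followed by 123. It assumes every backslash-u in the input starts a valid four-digit escape; a shorter token would throw `StringIndexOutOfBoundsException`.

```java
public class SplitUnescape {

    // Split-on-backslash-u approach that keeps any text following the hex digits.
    public static String unescape(String input) {
        // split takes a regex, so the backslash is escaped once for Java and once for the regex
        String[] tokens = input.split("\\\\u", -1);
        StringBuilder out = new StringBuilder(tokens[0]); // text before the first escape
        for (int t = 1; t < tokens.length; t++) {
            String token = tokens[t];
            out.append((char) Integer.parseInt(token.substring(0, 4), 16)); // the 4 hex digits
            out.append(token.substring(4)); // any plain text after them, e.g. "123"
        }
        return out.toString();
    }
}
```

For input like the question's, "\\u00a5123" splits into "" and "00a5123"; the empty first token contributes nothing, and the second becomes the yen sign plus "123".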