
I just came across something like this:

String sample = "somejunk+%3cfoobar%3e+morestuff";

Printed out, sample looks like this:

somejunk+<foobar>+morestuff

How does that work? U+003C and U+003E are the Unicode code points for the less-than and greater-than signs, respectively, which seems like more than a coincidence, but I've never heard of Java doing something like this automatically. I figured it'd be easy to look up on Google, but it turns out Google doesn't like the percent sign.
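Here is a minimal sketch of where the "coincidence" comes from. In URL (percent) encoding, each %XX is one byte written as two hexadecimal digits, and 0x3C and 0x3E are the ASCII (and UTF-8) bytes for < and >. The class and method names below are hypothetical, and only single-byte ASCII escapes are handled:

```java
class HexEscapeDemo {
    // Hypothetical helper: turns one "%XX" escape into the ASCII character
    // whose code is the hex number XX. Multi-byte (non-ASCII) sequences are
    // deliberately rejected rather than decoded.
    static char decodeEscape(String escape) {
        if (escape.length() != 3 || escape.charAt(0) != '%') {
            throw new IllegalArgumentException("expected %XX but got " + escape);
        }
        int code = Integer.parseInt(escape.substring(1), 16); // "3c" -> 60
        if (code > 0x7F) {
            throw new IllegalArgumentException("non-ASCII byte: " + escape);
        }
        return (char) code;
    }

    public static void main(String[] args) {
        System.out.println(decodeEscape("%3c")); // <
        System.out.println(decodeEscape("%3e")); // >
    }
}
```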

+1  A: 

You can do something like this:

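 String sample = "somejunk+%3cfoobar%3e+morestuff";
 // Escape literal '+' as %2B first; URLDecoder would otherwise turn each '+' into a space
 String result = java.net.URLDecoder.decode(sample.replaceAll("\\+", "%2B"), "UTF-8");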
ZZ Coder
Turns out that's close; in my case it was actually being used as a Wicket ExternalLink (http://wicket.apache.org/docs/1.4/org/apache/wicket/markup/html/link/ExternalLink.html).
Lord Torgamus
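This is a self-contained version of ZZ Coder's snippet. It is a sketch that assumes Java 10 or later, for the URLDecoder.decode overload that takes a Charset and throws no checked exception. URLDecoder treats '+' as an encoded space, so the literal plus signs are first rewritten to %2B, which decodes back to '+'. The class and method names are made up for illustration:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class DecodeKeepingPlus {
    // Percent-decodes s but keeps literal '+' characters. URLDecoder would
    // otherwise turn each '+' into a space, so they are escaped as %2B first.
    static String decodeKeepingPlus(String s) {
        return URLDecoder.decode(s.replace("+", "%2B"), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sample = "somejunk+%3cfoobar%3e+morestuff";
        System.out.println(decodeKeepingPlus(sample)); // somejunk+<foobar>+morestuff
    }
}
```

String.replace is used here instead of replaceAll because it treats "+" literally, so no regex escaping is needed. The result is the same as replaceAll("\\+", "%2B").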
+1  A: 

That string is probably URL-encoded. You'd decode it in Java using java.net.URLDecoder:

// Note: URLDecoder also decodes each '+' to a space
String res = java.net.URLDecoder.decode(sample, "UTF-8");
nos
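Below is a runnable sketch of nos's plain URLDecoder call, again assuming Java 10+ for the Charset overloads (the helper names are hypothetical). It also runs URLEncoder in the other direction to show where strings like this come from. URLEncoder writes the hex digits in uppercase (%3C), while URLDecoder accepts either case. Without any special handling, each '+' decodes to a space:

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class PlainUrlDecode {
    // Hypothetical wrappers around the JDK calls, so the results are easy to check.
    static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sample = "somejunk+%3cfoobar%3e+morestuff";

        // '+' is the form encoding of a space, so it does not survive as '+'
        System.out.println(decode(sample)); // somejunk <foobar> morestuff

        // Encoding goes the other way; '<' and '>' become %3C and %3E
        System.out.println(encode("<foobar>")); // %3Cfoobar%3E
    }
}
```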
+1  A: 

Java does support Unicode escapes in char and String literals (in fact, anywhere in the source code), but it does nothing with URL encoding.

A Unicode escape has the form '\uXXXX', where XXXX is a UTF-16 code unit in hexadecimal (for most characters, simply the code point).

Curious tidbit: The grammar allows 'u' to occur multiple times, so that '\uuuuuuuu0041' is a valid Unicode escape (for 'A').

erickson
+1 for the curious tidbit.
Lord Torgamus
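Here is a small sketch contrasting the two mechanisms from erickson's answer (the class and field names are made up). Unicode escapes are translated by the compiler before the source is tokenized, so \u003c in a literal really is '<', and the repeated-u form from the tidbit compiles as well. A %3c inside a literal, by contrast, is just three ordinary characters:

```java
class UnicodeEscapeDemo {
    // The compiler replaces each Unicode escape with the character it names
    // before lexing, so these compare equal to the plain characters.
    static final String ESCAPED_LT = "\u003c";
    static final char MANY_US = '\uuuuuuuu0041';

    // No translation happens for percent signs: this stays three characters.
    static final String PERCENT_LT = "%3c";

    public static void main(String[] args) {
        System.out.println(ESCAPED_LT.equals("<")); // true
        System.out.println(MANY_US == 'A');         // true
        System.out.println(PERCENT_LT.length());    // 3
    }
}
```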