I am getting the following encoded HTML in a JSON response and have no idea how to decode it into a normal HTML string (it's an anchor tag, by the way).

x3ca hrefx3dx22http:\/\/wordnetweb.princeton.edu\/perl\/webwn?sx3dstrandx22x3ehttp:\/\/wordnetweb.princeton.edu\/perl\/webwn?sx3dstrandx3c\/ax3e

I have tried `java.net.URLDecoder.decode` without any luck.

A: 

That's not an encoding I've seen before, but it looks like xYZ (where Y and Z are hex digits [0-9a-f]) means "the character whose ASCII code is 0xYZ". I'm not sure how a literal letter x would be encoded, so I would recommend finding that out. Once you know, you can do a find-and-replace on the regex `x([0-9a-f]{2})`: parse the two hex digits as an integer and cast it to a `char` (or something similar).
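A minimal Java sketch of that find-and-replace, assuming every `x` followed by two hex digits is an escape (a literal `x` followed by two hex digits in the real text would be mis-decoded). The class name `HexEscapeDecoder` is hypothetical, and the pattern also accepts uppercase hex digits A-F, which the regex above does not.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: decode "xHH" sequences (HH = two hex digits) into characters.
// Assumption: every "x" followed by two hex digits is an escape sequence.
class HexEscapeDecoder {
    private static final Pattern HEX_ESCAPE = Pattern.compile("x([0-9a-fA-F]{2})");

    static String decode(String input) {
        Matcher m = HEX_ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Parse the two hex digits and cast the resulting code to a char
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement stops a decoded '$' or '\' from being treated specially
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        // Copy whatever follows the last escape
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String encoded = "x3ca hrefx3dx22http:\\/\\/wordnetweb.princeton.edu\\/perl\\/webwn?sx3dstrandx22x3e"
                + "http:\\/\\/wordnetweb.princeton.edu\\/perl\\/webwn?sx3dstrandx3c\\/ax3e";
        System.out.println(decode(encoded));
    }
}
```

Note that this step only handles the hex codes; the backslashes in front of the slashes are still present in the result.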

Also, it looks like slashes (and possibly other characters; see if you can find out) always have a backslash in front of them, so do a second find-and-replace to remove it.
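A sketch of that second find-and-replace, under the assumption (not confirmed in this answer) that a backslash always escapes the next character literally. If only slashes are ever escaped, `input.replace("\\/", "/")` is the narrower and safer choice. `BackslashUnescaper` is a hypothetical name.

```java
// Sketch: drop the backslash in front of escaped characters such as "\/".
// Assumption: every backslash simply escapes the next character literally.
// JSON-style sequences like "\n" or "\u0041" are NOT handled; they would
// come out as plain "n" and "u0041".
class BackslashUnescaper {
    static String unescape(String input) {
        // The regex \\(.) matches a backslash plus the following character;
        // the replacement "$1" keeps only that character
        return input.replaceAll("\\\\(.)", "$1");
    }

    public static void main(String[] args) {
        System.out.println(unescape("http:\\/\\/wordnetweb.princeton.edu\\/perl\\/webwn"));
    }
}
```

Running this step before the hex decoding avoids stripping a backslash that was itself produced by decoding `x5c`.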

MatrixFrog
You should also try to figure out how Unicode characters above `ff` would be represented, and be sure to adjust your approach accordingly.
MatrixFrog
It works! Thanks.
Waqas
A: 

Hello Waqas! What you are looking at are hexadecimal escape sequences (`\xHH`): a backslash, followed by an "x" and a two-digit hex ASCII code. I wrote a little converter method for you:

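public static String convertUTF8Units(String input) {
    String output = input;
    // Examine every 4-character window of the input, including the last one
    for (int i = 0; i <= input.length() - 4; i++) {
        String part = input.substring(i, i + 4);
        // A window like "\x3c": backslash, 'x', then two hex digits
        if (part.startsWith("\\x") && part.substring(2).matches("[0-9a-fA-F]{2}")) {
            // Turn the hex code into the corresponding character
            char raw = (char) Integer.parseInt(part.substring(2), 16);
            // Replace every occurrence of this escape in the output
            output = output.replace(part, String.valueOf(raw));
        }
    }

    return output;
}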

I know, it's a bit frowzy, but it works :)

Keenora Fluffball
Thanks Keenora, but I already did it using a regular expression.
Waqas