I'm parsing an input stream coming from Facebook. I'm using something like

BufferedReader in =
    new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));

I then call in.readLine() to actually read from the stream.

The stream seems to have Unicode characters already encoded in ASCII, so I see things like \u00e4 (with \u actually being two discrete ASCII characters). Right now, I'm fishing for "\u", decoding the four hex digits that follow, turning them into a char, and replacing the escape sequence in the string with that char, which is obviously the worst way to do it.

I'm sure there's a cool way to use a native function to decode the special characters as the stream is being read (I was hoping it could be done on the InputStreamReader layer). But how?

+1  A: 

If you see '\u00e4' with the '\' and the 'u' being separate characters, then the '0', '0', 'e' and '4' most likely make up the 4 hex digits of a 2-byte (16-bit) Unicode code unit. C99 uses the same notation; its alternative is '\U00XXYYZZ', where 8 hex digits represent a 32-bit UTF-32 value (but, because Unicode is a 21-bit code set, the first 2 of the 8 digits are always 0, and the third is usually 0 too).
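To make the digit counts concrete, here is a minimal Java sketch of how the two forms map onto Java's types. The class and method names (EscapeDigits, fromFourDigits, fromEightDigits) are made up for illustration and do not come from any library.

```java
final class EscapeDigits {
    // The four hex digits after backslash-u give one 16-bit UTF-16 code unit.
    static char fromFourDigits(String hex) {
        return (char) Integer.parseInt(hex, 16);
    }

    // The eight hex digits after backslash-U give a full code point, which
    // Java stores as one or two UTF-16 code units.
    static String fromEightDigits(String hex) {
        return new String(Character.toChars(Integer.parseInt(hex, 16)));
    }

    public static void main(String[] args) {
        // 0x00E4 is LATIN SMALL LETTER A WITH DIAERESIS.
        System.out.println((int) fromFourDigits("00e4"));          // 228
        // U+1F600 lies outside the BMP, so it becomes a surrogate pair.
        System.out.println(fromEightDigits("0001F600").length());  // 2
    }
}
```

Since a Java char is a single UTF-16 code unit, the 4-digit form fits in one char, while the 8-digit form can need two.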

However, that doesn't answer your question about what's the right Android way to read the data, and you are right that there probably is one.

Jonathan Leffler
Yeah, it's essentially 6 bytes (well, physically 12 bytes, since it's inside a String and each character takes 2 bytes). And my approach works fine: I read the digits as a 16-bit value and use it as a char. But since I'm doing this in Java and replacing the string as I go, there is a lot of JVM and memory-management overhead. Doing this natively while parsing would be much faster.
EboMike
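For reference, most of the overhead of repeated string replacement can probably be avoided by decoding in a single pass into a StringBuilder. The sketch below is only an illustration of that idea: the class name UnicodeEscapeDecoder and its methods are hypothetical, and it handles only the backslash-u form with exactly four hex digits. It is not a JSON parser, so it ignores other escapes and would wrongly decode an escaped backslash followed by "u00e4".

```java
final class UnicodeEscapeDecoder {
    // Replaces each backslash-u escape followed by four hex digits with the
    // UTF-16 code unit it encodes, copying everything else unchanged.
    static String decode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int unit = (s.charAt(i) == '\\') ? escapeAt(s, i) : -1;
            if (unit >= 0) {
                out.append((char) unit);
                i += 6; // skip the backslash, the 'u' and the four digits
            } else {
                out.append(s.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    // Returns the code unit encoded at position i, or -1 if no well-formed
    // escape starts there (too short, no 'u', or a non-hex digit).
    private static int escapeAt(String s, int i) {
        if (i + 6 > s.length() || s.charAt(i + 1) != 'u') {
            return -1;
        }
        int value = 0;
        for (int k = i + 2; k < i + 6; k++) {
            int digit = Character.digit(s.charAt(k), 16);
            if (digit < 0) {
                return -1;
            }
            value = (value << 4) | digit;
        }
        return value;
    }
}
```

Calling UnicodeEscapeDecoder.decode on each line returned by readLine() would allocate one builder per line instead of one new String per escape. Malformed or truncated escapes are passed through untouched.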
+1  A: 

The data format is JSON, which I didn't mention (and which Thanatos already assumed). Using Android's JSON parser will automatically decode the characters properly. Parsing JSON yourself is obviously a dumb idea on several levels.

EboMike