I have the following string "\u3048\u3075\u3057\u3093". I got the string from a web page as part of returned data in JSONP.

What is that? It looks like UTF-8, but in that case, shouldn't it look like "U+3048U+3075U+3057U+3093"?

What's the meaning of the backslashes (\)?

How can I convert it to a human-readable form?

I'm looking for a solution in Ruby, but any explanation of what's going on here is appreciated.

A: 

Unicode characters in JSON can be escaped as a backslash and a lowercase u followed by four hexadecimal digits (\uXXXX). See the string production on json.org.

Any JSON parser will convert it to the correct representation for your platform (if it doesn't, then by definition it is not a JSON parser).
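A minimal Ruby sketch of that, assuming the json library that ships with Ruby: because the data arrived as JSONP, the JSON is wrapped in a JavaScript function call, so the padding has to be stripped before parsing. The callback name `handleData`, the `word` key and the `unwrap_jsonp` helper are hypothetical, and the regular expression only handles the simple `callback(...);` shape.

```ruby
require 'json'

# Hypothetical helper: strip the "callback( ... );" padding of a JSONP body.
# Only handles a single call whose whole argument is the JSON payload.
def unwrap_jsonp(body)
  match = body.match(/\A\s*[\w$.]+\s*\((.*)\)\s*;?\s*\z/m)
  raise ArgumentError, "not a JSONP response" unless match
  match[1]
end

# Hypothetical JSONP response. Single quotes keep the \u escapes as literal
# text, exactly as they arrive over the wire.
jsonp = 'handleData({"word": "\u3048\u3075\u3057\u3093"});'

# JSON.parse turns each \uXXXX escape into the corresponding character.
data = JSON.parse(unwrap_jsonp(jsonp))
puts data["word"] # prints えふしん
```

Stripping the padding with a regular expression is the simplest option here; a more robust approach would depend on knowing the exact callback format the site uses.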

Pete Kirkham
A:

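It is Unicode, but not UTF-8: each \uXXXX escape is a UTF-16 code unit written as four hexadecimal digits. If you ignore surrogate pairs (which encode characters above U+FFFF as two consecutive escapes), you can read each escape as the hexadecimal code point of a Unicode character.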

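With Ruby 1.9:

require 'json'

# Single quotes keep the backslashes, so JSON.parse (not Ruby) decodes the \u escapes
puts JSON.parse('["\u4e00","\u4e8c"]')

It prints 一 and 二, each on its own line.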
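To illustrate the UTF-16 point in this answer, here is a hand-rolled decoder, `decode_unicode_escapes`, which is hypothetical and not part of any library. It treats each run of \uXXXX escapes as UTF-16 code units, so a surrogate pair combines into a single character. It ignores JSON's other escapes (including an escaped backslash), so it is only a sketch; for real data, JSON.parse is the right tool. The emoji in the second example is an added illustration, not from the question.

```ruby
# Hypothetical helper: decode runs of \uXXXX escapes as UTF-16 code units.
# Does NOT handle other JSON escapes such as \\ or \n. A lone surrogate
# raises Encoding::InvalidByteSequenceError.
def decode_unicode_escapes(text)
  text.gsub(/(?:\\u[0-9a-fA-F]{4})+/) do |run|
    units = run.scan(/\\u([0-9a-fA-F]{4})/).flatten.map(&:hex)
    # Pack the 16-bit units big-endian, reinterpret the bytes as UTF-16BE,
    # then transcode to UTF-8 (this is where surrogate pairs get combined).
    units.pack('n*').force_encoding(Encoding::UTF_16BE).encode(Encoding::UTF_8)
  end
end

puts decode_unicode_escapes('\u3048\u3075\u3057\u3093') # one escape per character
puts decode_unicode_escapes('\ud83d\ude00')             # U+1F600 as a surrogate pair
```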

OmniBus
A:

The U+3048 syntax is normally used to represent the Unicode code point of a character. Such a code point is fixed and does not depend on the encoding (UTF-8, UTF-32...).
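A short Ruby sketch of this distinction, using the string from the question: the U+ notation comes from the code points, which stay the same, while the bytes change with the encoding. The helper names `code_point_notation` and `byte_dump` are made up for illustration.

```ruby
# Format each code point in the usual U+XXXX notation.
def code_point_notation(text)
  text.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")
end

# Show the bytes a character occupies in a given encoding, as hex.
def byte_dump(char, encoding)
  char.encode(encoding).bytes.map { |b| format("%02X", b) }.join(" ")
end

word = "\u3048\u3075\u3057\u3093" # the string from the question, already decoded
puts code_point_notation(word)   # U+3048 U+3075 U+3057 U+3093

char = word[0] # U+3048: same code point, different bytes per encoding
[Encoding::UTF_8, Encoding::UTF_16BE, Encoding::UTF_32BE].each do |enc|
  puts "#{enc}: #{byte_dump(char, enc)}"
end
# UTF-8: E3 81 88
# UTF-16BE: 30 48
# UTF-32BE: 00 00 30 48
```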

A JSON string is composed of Unicode characters other than the double quote, the backslash and the control characters in the U+0000 to U+001F range. Any character can also be written as an escape sequence: \u followed by 4 hexadecimal digits giving the character's Unicode code point (characters above U+FFFF are written as a UTF-16 surrogate pair, i.e. two such escapes). This is the JavaScript syntax, from which JSON's syntax is derived. In JavaScript, the backslash is the escape character.
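Ruby uses the same \u escape in double-quoted string literals, which is easy to confuse with the text the web page actually sent. A small sketch, assuming the json library that ships with Ruby: in single quotes the backslash is kept as a literal character, in double quotes Ruby itself performs the translation, and JSON.parse performs the same translation on the escaped text.

```ruby
require 'json'

escaped = '\u3048' # single quotes: backslash kept, 6 characters of text
decoded = "\u3048" # double quotes: Ruby turns the escape into one character

puts escaped.length # 6
puts decoded.length # 1

# %() interpolates like a double-quoted string, but the interpolated value is
# not re-escaped, so the parser receives the literal text ["\u3048"].
puts JSON.parse(%(["#{escaped}"])).first == decoded # true
```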

Álvaro G. Vicario