ansaurus

Question

Answer 1

+3 A:

JSON is a serialization format that can include Unicode characters. What is usually sent over the wire is the byte representation of that Unicode string, normally over HTTP, which uses headers to tell the client the encoding, typically UTF-8.
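To make the two steps concrete, here is a minimal Python sketch (the payload and variable names are made up for illustration). Serializing produces text; encoding that text to bytes for the wire is a separate step, assumed here to be UTF-8:

```python
import json

# Hypothetical payload containing non-ASCII characters.
payload = {"name": "Zoë", "city": "München"}

# Step 1: serialization produces text (a Python str), not bytes.
text = json.dumps(payload, ensure_ascii=False)

# Step 2: the text is encoded to bytes for transmission (UTF-8 assumed).
body = text.encode("utf-8")

# The receiver reverses both steps: decode the bytes, then parse the text.
decoded = json.loads(body.decode("utf-8"))
```

Note that `body` is longer than `text`, because each accented character takes two bytes in UTF-8.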

Darin Dimitrov 2010-05-03 16:21:46

Answer 2

+3 A:

From the RFC (RFC 4627):

3.  Encoding

   JSON text SHALL be encoded in Unicode.  The default encoding is
   UTF-8.

   Since the first two characters of a JSON text will always be ASCII
   characters [RFC0020], it is possible to determine whether an octet
   stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
   at the pattern of nulls in the first four octets.

           00 00 00 xx  UTF-32BE
           00 xx 00 xx  UTF-16BE
           xx 00 00 00  UTF-32LE
           xx 00 xx 00  UTF-16LE
           xx xx xx xx  UTF-8
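
The table above could be sketched in Python roughly as follows. The function name `detect_json_encoding` is hypothetical, and the sketch assumes the stream has no byte order mark and really does start with two ASCII characters, as the RFC text states. Inputs shorter than four octets fall back to UTF-8, which is my assumption rather than something the RFC spells out:

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the UTF of a JSON octet stream from the nulls in its first four octets."""
    if len(data) < 4:
        # Assumption: too short to match a pattern, so use the default encoding.
        return "utf-8"
    b0, b1, b2, b3 = data[:4]
    if b0 == 0 and b1 == 0 and b2 == 0 and b3 != 0:
        return "utf-32-be"  # 00 00 00 xx
    if b0 == 0 and b1 != 0 and b2 == 0 and b3 != 0:
        return "utf-16-be"  # 00 xx 00 xx
    if b0 != 0 and b1 == 0 and b2 == 0 and b3 == 0:
        return "utf-32-le"  # xx 00 00 00
    if b0 != 0 and b1 == 0 and b2 != 0 and b3 == 0:
        return "utf-16-le"  # xx 00 xx 00
    return "utf-8"          # xx xx xx xx
```

For example, `detect_json_encoding('{"a": 1}'.encode("utf-16-be"))` returns `"utf-16-be"`, and the returned names can be passed straight to `bytes.decode`.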

cobbal 2010-05-03 16:25:31

Answer 3

+2 A:

Reasonable question. JSON is oriented towards serialization/communication but, at its core, it is a text format. Hence it is correctly specified in terms of characters (units of text), not bytes.

The conversion of that text to/from bytes, that is, the charset encoding, is outside JSON itself. However, since it must support any Unicode text, a Unicode encoding should be used (normally UTF-8).
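
A small Python sketch of the distinction between characters and bytes (the sample value is made up): the JSON text has one fixed length in characters, while its size in bytes depends on which Unicode encoding is applied afterwards.

```python
import json

# Hypothetical sample; the JSON text is the same in every case below.
text = json.dumps({"greeting": "héllo"}, ensure_ascii=False)

# Byte length of the same text under three Unicode encodings.
sizes = {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

# Whatever the encoding, decoding gives back the identical text.
round_trips = all(
    text.encode(enc).decode(enc) == text for enc in sizes
)
```

Here `sizes` holds three different byte counts for the same number of characters, and `round_trips` is `True`.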

leonbloy 2010-05-03 16:26:48

Answer 4

A:

You're correct that everything must translate into bytes, and that usually occurs through a UTF (Unicode Transformation Format). The JSON RFC explains in section 3 how to tell which UTF is being used.

Matthew Flaschen 2010-05-03 16:27:30

JSON specifies "any UNICODE character"?