tags:

views:

2432

answers:

8

The JSON format natively doesn't support binary data. The binary data has to be escaped so that it can be places into a string element (i.e. zero or more Unicode chars in double quotes using backslash escapes) in JSON.

An obvious method to escape binary data is to use Base64. However, Base64 has a high processing overhead. Also it expands 3 bytes into 4 characters with leads to an increased data size by around 33%.

One use case for this is the v0.8 draft of the CDMI cloud storage API specification. You create data objects via a REST-Webservice using JSON, e.g.

PUT /MyContainer/BinaryObject HTTP/1.1
Host: cloud.example.com
Accept: application/vnd.org.snia.cdmi.dataobject+json
Content-Type: application/vnd.org.snia.cdmi.dataobject+json
X-CDMI-Specification-Version: 1.0
{"
    "mimetype" : "application/octet-stream”,
    "metadata" : [ ],
    "value" :   "TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlz
    IHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2Yg
    dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu
    dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRo
    ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=",
}

Are there better ways and standard methods to encode binary data into JSON strings?

+1  A: 

yEnc might work for you:

http://en.wikipedia.org/wiki/Yenc

richardtallent
Great quote i wasn't aware of yEnc existence
wezzy
+1  A: 

Can you compress the binary data first, before encoding?

Alex Reynolds
A: 

Since you're looking for the ability to shoehorn binary data into a strictly text-based and very limited format, I think Base64's overhead is minimal compared to the convenience you're expecting to maintain with JSON. If processing power and throughput is a concern, then you'd probably need to reconsider your file formats.

jsoverson
A: 

The Google Gears team ran into the lack-of-binary-data-types problem and has attempted to address it:

Blob API

JavaScript has a built-in data type for text strings, but nothing for binary data. The Blob object attempts to address this limitation.

Maybe you can weave that in somehow.

a paid nerd
+9  A: 

There are 94 Unicode characters which can be represented as one byte according to the JSON spec (if your JSON is transmitted as UTF-8). With that in mind, I think the best you can do space-wise is base85 which represents four bytes as five characters. However, this is only a 7% improvement over base64, it's more expensive to compute, and implementations are less common than for base64 so it's probably not a win.

You could also simply map every input byte to the corresponding character in U+0000-U+00FF, then do the minimum encoding required by the JSON standard to pass those characters; the advantage here is that the required decoding is nil beyond builtin functions, but the space efficiency is bad -- a 105% expansion (if all input bytes are equally likely) vs. 25% for base85 or 33% for base64.

Final verdict: base64 wins, in my opinion, on the grounds that it's common, easy, and not bad enough to warrant replacement.

hobbs
+2  A: 

While it is true that base64 has ~33% expansion rate, it is not necessarily true that processing overhead is significantly more than this: it really depends on JSON library/toolkit you are using. Encoding and decoding are simple straight-forward operations, and they can even be optimized wrt character encoding (as JSON only supports UTF-8/16/32) -- base64 characters are always single-byte for JSON String entries. For example on Java platform there are libraries that can do the job rather efficiently, so that overhead is mostly due to expanded size.

I agree with two earlier answers:

  • base64 is simple, commonly used standard, so it is unlikely to find something better specifically to use with JSON (base-85 is used by postscript etc; but benefits are at best marginal when you think about it)
  • compression before encoding (and after decoding) may make lots of sense, depending on data you use
StaxMan
+1  A: 

http://en.wikipedia.org/wiki/BSON may help you

iman
A: 

There's a method which is compatible with any restricted alphabet, and in fact any output syntax.

http://stackoverflow.com/questions/3330838/coroutine-demo-source http://encode.ru/threads/1118-Rangecoding-with-restricted-output-alphabet

Shelwien