I have code that relies heavily on yaml for cross-language serialization and while working on speeding some stuff up I noticed that yaml was insanely slow compared to other serialization methods (e.g., pickle, json).

So what really blows my mind is that json is so much faster than yaml when the output is nearly identical.

>>> import yaml, cjson; d={'foo': {'bar': 1}}
>>> yaml.dump(d, Dumper=yaml.SafeDumper)
'foo: {bar: 1}\n'
>>> cjson.encode(d)
'{"foo": {"bar": 1}}'
>>> from timeit import timeit
>>> timeit("yaml.dump(d, Dumper=yaml.SafeDumper)", setup="import yaml; d={'foo': {'bar': 1}}", number=10000)
44.506911039352417
>>> timeit("yaml.dump(d, Dumper=yaml.CSafeDumper)", setup="import yaml; d={'foo': {'bar': 1}}", number=10000)
16.852826118469238
>>> timeit("cjson.encode(d)", setup="import cjson; d={'foo': {'bar': 1}}", number=10000)
0.073784112930297852

PyYAML's CSafeDumper and cjson are both written in C, so this isn't simply a C-versus-Python speed issue. I've even added some random data to the input to check whether cjson is doing any caching, but it's still way faster than PyYAML. I realize that yaml is a superset of json, but how can the yaml serializer be two orders of magnitude slower with such simple input?

+2  A: 

In general, it's not the complexity of the output that determines the speed of parsing, but the complexity of the accepted input. The JSON grammar is very concise; the YAML grammar is comparatively complex, which leads to increased overhead in the parser.

JSON’s foremost design goal is simplicity and universality. Thus, JSON is trivial to generate and parse, at the cost of reduced human readability. It also uses a lowest common denominator information model, ensuring any JSON data can be easily processed by every modern programming environment.

In contrast, YAML’s foremost design goals are human readability and support for serializing arbitrary native data structures. Thus, YAML allows for extremely readable files, but is more complex to generate and parse. In addition, YAML ventures beyond the lowest common denominator data types, requiring more complex processing when crossing between different programming environments.

I'm not a YAML parser implementor, so I can't speak specifically to the orders of magnitude without some profiling data and a big corpus of examples. In any case, be sure to test over a large body of inputs before feeling confident in benchmark numbers.
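As a rough sketch of what such a test might look like (this uses the standard-library json and timeit modules instead of cjson, assumes PyYAML was built with libyaml so that CSafeDumper exists, and the corpus shape and sizes are arbitrary assumptions, not anything from the question):

import json
import random
import string
import timeit
import yaml

def random_key():
    return ''.join(random.choice(string.ascii_lowercase) for _ in range(8))

def random_doc(width=20):
    # A nested dict of random keys and floats; vary width to vary document size.
    return {random_key(): {random_key(): random.random() for _ in range(width)}
            for _ in range(width)}

# Time both serializers over a corpus of varied documents instead of one tiny dict.
corpus = [random_doc() for _ in range(50)]

yaml_time = timeit.timeit(
    lambda: [yaml.dump(d, Dumper=yaml.CSafeDumper) for d in corpus], number=10)
json_time = timeit.timeit(
    lambda: [json.dumps(d) for d in corpus], number=10)
print("CSafeDumper: %.3fs   json: %.3fs" % (yaml_time, json_time))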

Update: Whoops, misread the question. :-( Serialization can still be blazingly fast despite the large input grammar; however, browsing the source, it looks like PyYAML's Python-level serialization constructs a representation graph whereas simplejson encodes builtin Python datatypes directly into text chunks.
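A minimal sketch of that difference, using PyYAML's ordinary SafeDumper entry points and the standard-library json module (standing in here for cjson/simplejson, which is my substitution, not the question's):

import io
import json
import yaml

d = {'foo': {'bar': 1}}

# PyYAML's dump pipeline is represent -> serialize -> emit; even for a tiny
# dict it first builds an intermediate node graph.
dumper = yaml.SafeDumper(io.StringIO())
node = dumper.represent_data(d)
print(type(node).__name__)   # MappingNode, whose .value is a list of node pairs
print(node.value)

# json (like cjson) walks the builtin types and writes text directly,
# with no intermediate representation graph.
print(json.dumps(d))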

cdleary
(Just an example of how concise the JSON grammar is: you can't have an extraneous trailing comma at the end of an array. Overkill IMO.)
cdleary
He's asking about encoding, not parsing.
Glenn Maynard
+2  A: 

A cursory look at python-yaml suggests its design is much more complex than cjson's:

>>> dir(cjson)
['DecodeError', 'EncodeError', 'Error', '__doc__', '__file__', '__name__', '__package__', 
'__version__', 'decode', 'encode']

>>> dir(yaml)
['AliasEvent', 'AliasToken', 'AnchorToken', 'BaseDumper', 'BaseLoader', 'BlockEndToken',
 'BlockEntryToken', 'BlockMappingStartToken', 'BlockSequenceStartToken', 'CBaseDumper',
'CBaseLoader', 'CDumper', 'CLoader', 'CSafeDumper', 'CSafeLoader', 'CollectionEndEvent', 
'CollectionNode', 'CollectionStartEvent', 'DirectiveToken', 'DocumentEndEvent', 'DocumentEndToken', 
'DocumentStartEvent', 'DocumentStartToken', 'Dumper', 'Event', 'FlowEntryToken', 
'FlowMappingEndToken', 'FlowMappingStartToken', 'FlowSequenceEndToken', 'FlowSequenceStartToken', 
'KeyToken', 'Loader', 'MappingEndEvent', 'MappingNode', 'MappingStartEvent', 'Mark', 
'MarkedYAMLError', 'Node', 'NodeEvent', 'SafeDumper', 'SafeLoader', 'ScalarEvent', 
'ScalarNode', 'ScalarToken', 'SequenceEndEvent', 'SequenceNode', 'SequenceStartEvent', 
'StreamEndEvent', 'StreamEndToken', 'StreamStartEvent', 'StreamStartToken', 'TagToken', 
'Token', 'ValueToken', 'YAMLError', 'YAMLObject', 'YAMLObjectMetaclass', '__builtins__', 
'__doc__', '__file__', '__name__', '__package__', '__path__', '__version__', '__with_libyaml__', 
'add_constructor', 'add_implicit_resolver', 'add_multi_constructor', 'add_multi_representer', 
'add_path_resolver', 'add_representer', 'compose', 'compose_all', 'composer', 'constructor', 
'cyaml', 'dump', 'dump_all', 'dumper', 'emit', 'emitter', 'error', 'events', 'load', 
'load_all', 'loader', 'nodes', 'parse', 'parser', 'reader', 'representer', 'resolver', 
'safe_dump', 'safe_dump_all', 'safe_load', 'safe_load_all', 'scan', 'scanner', 'serialize', 
'serialize_all', 'serializer', 'tokens']

More complex designs almost invariably mean slower designs, and this is far more complex than most people will ever need.
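As a side note, here is a sketch of how little of that surface you need for plain data, and of falling back to the pure-Python dumper when the libyaml bindings aren't built (the fallback logic is my own assumption, not part of this answer):

import yaml

# safe_dump/safe_load cover the same plain-data subset that cjson handles;
# most of the names in the dir() listing are for customizing the pipeline.
doc = yaml.safe_dump({'foo': {'bar': 1}})
data = yaml.safe_load(doc)

# __with_libyaml__ (visible in the listing above) reports whether the
# C-based CSafeDumper/CSafeLoader bindings are available.
Dumper = yaml.CSafeDumper if yaml.__with_libyaml__ else yaml.SafeDumper
print(yaml.dump({'foo': {'bar': 1}}, Dumper=Dumper))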

Glenn Maynard
+1  A: 

speaking about efficiency, i used YAML for a time and felt attracted by the simplicity that some name/value assignments take on in this language. however, in the process i tripped time and again over one of YAML's finesses: subtle variations in the grammar that let you write special cases in a more concise style. in the end, although YAML's grammar is almost certainly formally consistent, it left me with a lingering feeling of 'vagueness'. i then restricted myself to not touching existing, working YAML code and writing everything new in a more roundabout, failsafe syntax, which eventually led me to abandon YAML altogether. the upshot is that YAML tries to look like a W3C standard, and has produced a small library of hard-to-read literature on its concepts and rules.

this, i feel, is far more intellectual overhead than needed. look at SGML/XML: developed by IBM in the roaring 60s, standardized by the ISO, known (in a dumbed-down and modified form) as HTML to uncounted millions of people, documented and documented and documented again the world over. then along comes li'l JSON and slays that dragon. how could JSON become so widely used in so short a time, with just one meager website (and a JavaScript luminary to back it)? it is its simplicity, the sheer absence of doubt in its grammar, the ease of learning and using it.

XML and YAML are hard for humans, and they are hard for computers. JSON is quite friendly and easy for both humans and computers.

flow