I have an ArrayList of objects being dumped to a YAML string and have been comparing the performance of JYaml and SnakeYaml in handling this.

    ArrayList<HashMap<String, String>> testList = new ArrayList<HashMap<String, String>>();
    HashMap<String, String> testMap1 = new HashMap<String, String>();
    HashMap<String, String> testMap2 = new HashMap<String, String>();

    testMap1.put("1_1", "One");
    testMap1.put("1_2", "Two");
    testMap1.put("1_3", "Three");

    testMap2.put("2_1", "One");
    testMap2.put("2_2", "Two");
    testMap2.put("2_3", "Three");

    testList.add(testMap1);
    testList.add(testMap2);

    System.out.println(jYaml.dump(testList));
    System.out.println(snakeYaml.dump(testList));


The output from JYaml includes the serialised object's class name whereas the output from SnakeYaml does not:

JYaml output:

- !java.util.HashMap
  1_1: One
  1_3: Three
  1_2: Two
- !java.util.HashMap
  2_1: One
  2_2: Two
  2_3: Three

SnakeYaml output:

- {'1_1': One, '1_3': Three, '1_2': Two}
- {'2_1': One, '2_2': Two, '2_3': Three}


I prefer the more 'clean' class name-less output of SnakeYaml as this would be more suitable for a language-neutral environment.

However, I prefer the speed of JYaml: its serialisation/deserialisation times increase linearly with the amount of data being processed, whereas SnakeYaml's increase exponentially.

I'd like to coerce JYaml into giving me class name-less output but am quite lost as to how this can be achieved.

A: 

How do you measure the speed? What do you mean by 'amount of data'? Is it the size of a single YAML document or the number of documents?

The JYaml output is incorrect. According to the specification, underscores in numbers are ignored, so 1_1 = 11 (at least for YAML 1.1). Because the key is in fact a String and not an Integer, the representation should be:

    '1_1': One

or canonically

    !!str "1_1": !!str "One"

Otherwise, when the document is parsed, it will create a Map<Integer, String> instead of a Map<String, String>.

JYaml has many open issues and does not implement the complete YAML 1.1 specification.

JYaml may indeed be faster, but that is due to its simplified parsing and emitting.
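To illustrate why the unquoted key is a problem, here is a rough sketch of how a YAML 1.1 loader treats a plain scalar. It only covers the decimal integer form from the YAML 1.1 type repository and is not how SnakeYAML or JYaml are actually implemented; the class name `Yaml11KeySketch` and the helper `resolvePlainScalar` are made up for this example.

```java
import java.util.regex.Pattern;

class Yaml11KeySketch {
    // Decimal form of the YAML 1.1 integer type: optional sign, then either 0
    // or a non-zero digit followed by any mix of digits and underscores.
    private static final Pattern DECIMAL_INT =
            Pattern.compile("[-+]?(0|[1-9][0-9_]*)");

    // Hypothetical helper: resolves an unquoted (plain) scalar.
    // Assumption: values are small enough to fit in an int.
    static Object resolvePlainScalar(String text) {
        if (DECIMAL_INT.matcher(text).matches()) {
            // Underscores are only visual separators, so they are dropped.
            return Integer.valueOf(text.replace("_", ""));
        }
        // Anything that does not look like a decimal integer stays a String here.
        return text;
    }

    public static void main(String[] args) {
        Object key = resolvePlainScalar("1_1");
        Object value = resolvePlainScalar("One");
        System.out.println(key + " (" + key.getClass().getSimpleName() + ")");     // 11 (Integer)
        System.out.println(value + " (" + value.getClass().getSimpleName() + ")"); // One (String)
    }
}
```

Quoted scalars never go through this kind of resolution, which is why SnakeYaml emits '1_1' in quotes: the key then round-trips as a String.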

A:

Check the latest SnakeYAML source. It is now possible (as in JYaml) to ignore implicit typing and always parse scalars as Strings. This is a few times faster. Look here and here to see how to use the new feature.

(With the regular expressions switched off, serialisation/deserialisation times increase linearly with the amount of data being processed.)