What are the differences between a table of type set and a table of type ordered_set? I'm interested in the differences in read/write performance, what the ordering is based on, the effects across distributed nodes, and such.
As far as the ordering goes, this is from the source (ordsets:add_element/2 in the standard library's ordsets module):
add_element(E, [H|Es]) when E > H -> [H|add_element(E, Es)];
add_element(E, [H|_]=Set) when E < H -> [E|Set];
add_element(_E, [_H|_]=Set) -> Set; % E == H
add_element(E, []) -> [E].
So the ordering looks like a straight < or > comparison on the element.
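To make the comparison concrete, here is a minimal sketch that puts the same values into an ordsets list and into ETS tables. The module name order_demo, the function run/0, and the table names are made up for illustration, and the example assumes plain ETS tables rather than Mnesia ones. It also shows one consequence of the == comparison in the third clause above: an ordered_set treats 1 and 1.0 as the same key, while a set keeps them apart.

```erlang
-module(order_demo).
-export([run/0]).

%% Hypothetical demo module; order_demo and run/0 are illustrative names.
run() ->
    %% ordsets keeps a sorted list of unique elements (standard term order).
    OrdSet = lists:foldl(fun ordsets:add_element/2, ordsets:new(), [3, 1, 2, 1]),

    %% An ETS ordered_set table is traversed in key order.
    Ordered = ets:new(ordered_demo, [ordered_set]),
    true = ets:insert(Ordered, [{3, c}, {1, a}, {2, b}]),
    %% foldr visits an ordered_set from last to first,
    %% so consing each key yields the keys in ascending order.
    Keys = ets:foldr(fun({K, _V}, Acc) -> [K | Acc] end, [], Ordered),

    %% ordered_set compares keys with ==, so 1.0 replaces the entry for 1.
    true = ets:insert(Ordered, {1.0, float_key}),
    OrderedSize = ets:info(Ordered, size),

    %% A plain set matches keys exactly, so 1 and 1.0 are distinct keys.
    Plain = ets:new(plain_demo, [set]),
    true = ets:insert(Plain, [{1, a}, {1.0, float_key}]),
    PlainSize = ets:info(Plain, size),

    true = ets:delete(Ordered),
    true = ets:delete(Plain),
    {OrdSet, Keys, OrderedSize, PlainSize}.
```

Calling order_demo:run() should return {[1,2,3],[1,2,3],3,2}: both structures come back sorted, the ordered_set still holds three rows after the 1.0 insert, and the set holds two.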
Other than the ordering, it's exactly the same as a set. So I'd hazard a guess that for elements of lower "value", lookups would be faster on average than with a set. But other than that I'm not sure.
Since Erlang is process-agnostic and doesn't allow variable modification, the effects across distributed nodes should be identical to those on local nodes.
Caveat:
I haven't run any benchmarking on the two types so this is speculation on my part regarding performance.
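For anyone who wants to replace the speculation with numbers, a rough timing harness could look like the sketch below. It assumes plain ETS tables with integer keys; the module name table_bench and the function run/1 are invented for this example, and a single timer:tc run like this is only a crude measurement, not a proper benchmark. For reference, the ets documentation describes access time as constant for set tables and proportional to the logarithm of the table size for ordered_set tables.

```erlang
-module(table_bench).
-export([run/1]).

%% Hypothetical harness: time N inserts and N lookups for each table type.
run(N) ->
    [{Type, bench(Type, N)} || Type <- [set, ordered_set]].

bench(Type, N) ->
    Tab = ets:new(bench_tab, [Type]),
    Keys = lists:seq(1, N),
    %% timer:tc/1 returns {ElapsedMicroseconds, Result}.
    {InsertUs, _} = timer:tc(fun() -> [ets:insert(Tab, {K, K}) || K <- Keys] end),
    {LookupUs, _} = timer:tc(fun() -> [ets:lookup(Tab, K) || K <- Keys] end),
    true = ets:delete(Tab),
    #{insert_us => InsertUs, lookup_us => LookupUs}.
```

Running table_bench:run(100000) a few times and comparing the two maps gives a first impression. Results will vary with key type, table size, and machine.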
The ordering is based on the primary key, which means ordered_set tables are much faster at match/select iteration using complex primary keys. For example, if your record looks like {{Key, Val1}, Val2}, you can match or select on Key to very quickly get Val1 and Val2 for every occurrence of Key. Other than that, I'm not aware of a significant difference in read/write speed.
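As an illustration of the {{Key, Val1}, Val2} pattern, here is a small sketch using a plain ETS ordered_set (an assumption, since the same idea applies to Mnesia tables but the calls differ). The module name prefix_select_demo, the function run/0, and the sample data are invented for the example.

```erlang
-module(prefix_select_demo).
-export([run/0]).

%% Hypothetical demo: select every row whose composite key starts with apple.
run() ->
    Tab = ets:new(prefix_demo, [ordered_set]),
    true = ets:insert(Tab, [{{apple, 1}, red},
                            {{apple, 2}, green},
                            {{banana, 1}, yellow}]),
    %% The first key component is bound to apple;
    %% '$1' captures Val1 and '$2' captures Val2.
    %% The double braces in the body build the result tuple {Val1, Val2}.
    MatchSpec = [{{{apple, '$1'}, '$2'}, [], [{{'$1', '$2'}}]}],
    Result = ets:select(Tab, MatchSpec),
    true = ets:delete(Tab),
    Result.
```

prefix_select_demo:run() should return the two apple rows as {Val1, Val2} pairs and leave out the banana row. Because the first part of the key is bound, an ordered_set can restrict the traversal to the matching key range rather than scanning the whole table.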
Fragmenting ordered_set tables is also possible, though iteration is then only partially ordered: iterating over a single fragment is ordered, but the order from fragment to fragment is undefined.