What are the biggest pros and cons of Apache Thrift vs Google's Protocol Buffers?
Protocol Buffers seems to have a more compact representation, but that's only an impression I get from reading the Thrift whitepaper. In their own words:
We decided against some extreme storage optimizations (i.e. packing small integers into ASCII or using a 7-bit continuation format) for the sake of simplicity and clarity in the code. These alterations can easily be made if and when we encounter a performance-critical use case that demands them.
Also, it may just be my impression, but Protocol Buffers seems to have some thicker abstractions around struct versioning. Thrift does have some versioning support, but it takes a bit of effort to make it happen.
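To make the "7-bit continuation format" from the quote concrete, here is a minimal sketch of that kind of variable-length integer encoding (Protocol Buffers calls these base-128 varints). The function names `encode_varint` and `decode_varint` are made up for illustration; this is not code from either library, and it only handles non-negative integers.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative int using 7 data bits per byte.

    The high bit of each byte is a continuation flag: 1 means
    "more bytes follow", 0 means "this is the last byte".
    """
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low_bits = n & 0x7F          # take the lowest 7 bits
        n >>= 7
        if n:
            out.append(low_bits | 0x80)  # more bytes follow
        else:
            out.append(low_bits)         # final byte
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a single varint from the start of data."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:          # continuation flag cleared: done
            return result
        shift += 7
    raise ValueError("truncated varint")
```

Small values such as 1 take a single byte here, whereas a fixed-width 32-bit integer always takes four. That is the kind of saving the whitepaper says Thrift's original binary protocol chose to skip for simplicity.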
They both offer many of the same features; however, there are some differences:
- Thrift supports 'exceptions'
- Protocol Buffers have much better documentation/examples
- Thrift has a builtin Map and Set type
- Protocol Buffers allow "extensions" - you can extend an external proto to add extra fields, while still allowing external code to operate on the values. There is no way to do this in Thrift
- I find Protocol Buffers much easier to read
Basically, they are fairly equivalent (with Protocol Buffers slightly more efficient from what I have read).
- Protobuf serialized objects are about 30% smaller than Thrift's.
- Most actions you may want to do with protobuf objects (create, serialize, deserialize) are much slower than with Thrift.
- Thrift has richer data structures (Map, Set)
- The Protobuf API looks cleaner, though the generated classes are all packed as inner classes, which is not so nice.
- Thrift enums are not real Java enums, i.e. they are just ints. Protobuf has real Java enums.
For a closer look at the differences, check out the source code diffs at this open source project.
One obvious thing not yet mentioned, which can be either a pro or a con (and is the same for both), is that they are binary protocols. This allows for a more compact representation and possibly better performance (pros), but at the cost of reduced readability (or rather, debuggability), a con.
Also, both have a bit less tool support than standard formats like XML (and maybe even JSON).
(EDIT) Here's an interesting comparison that tackles both size and performance differences, and includes numbers for some other formats (XML, JSON) as well.
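As a rough illustration of the size versus readability trade-off described above, the sketch below encodes the same record as JSON text and as a fixed binary layout using only Python's standard library. The record, its fields, and the layout are invented for the example; neither Thrift nor Protocol Buffers uses this exact wire format.

```python
import json
import struct

# Hypothetical record used only for this comparison.
record = {"id": 1234, "score": 98.5, "active": True}

# "<" = little-endian with no padding, I = uint32, d = float64, ? = bool
RECORD_LAYOUT = struct.Struct("<Id?")


def to_text(rec: dict) -> bytes:
    """Human-readable encoding: field names travel with every message."""
    return json.dumps(rec).encode("utf-8")


def to_binary(rec: dict) -> bytes:
    """Compact encoding: only values are written, in a fixed order."""
    return RECORD_LAYOUT.pack(rec["id"], rec["score"], rec["active"])


def from_binary(data: bytes) -> dict:
    """Decoding needs the layout (the 'schema') to make sense of the bytes."""
    rec_id, score, active = RECORD_LAYOUT.unpack(data)
    return {"id": rec_id, "score": score, "active": active}


if __name__ == "__main__":
    print(len(to_text(record)), "bytes as JSON:", to_text(record))
    print(len(to_binary(record)), "bytes as binary:", to_binary(record))
```

The JSON form can be read straight off the wire in a debugger or log file, while the binary form is smaller but meaningless without the layout that produced it.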
Another important difference is the set of languages supported by default.
- protobuf: Java, C++, Python
- Thrift: Java, C++, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, OCaml
Both could be extended to other platforms, but these are the language bindings available out of the box.
RPC is another key difference. Thrift generates code to implement RPC clients and servers, whereas Protocol Buffers seems mostly designed as a data-interchange format alone.
I was able to get better performance with a text-based protocol than with protobuf in Python. However, it has no type checking or other fancy UTF-8 conversion, etc., which protobuf offers.
So, if serialization/deserialization is all you need, then you can probably use something else.
http://dhruvbird.blogspot.com/2010/05/protocol-buffers-vs-http.html