I want to understand why the protocol buffers solution for .NET developed by Marc Gravell is as fast as it is.

I can understand how the original Google solution achieved its performance: it pre-generates optimized code for object serialization. I've written some serialization by hand and know that it is possible to write pretty fast code this way if you avoid reflection. But Marc's library is a runtime solution that uses attributes and doesn't produce any generated code. So how does it work?

+17  A: 

protobuf-net uses a strategy pattern; as needed (once only per type) it uses reflection to look at the types, and builds a set of serializers (based on a common interface) that it can use to serialize and deserialize - so when in use it is just stepping through the known set of serializers.
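To make the "reflect once per type, then step through known serializers" idea concrete, here is a minimal sketch. It is not protobuf-net's actual code: `IMemberSerializer`, `PropertySerializer`, `SerializerCache` and `Person` are hypothetical names, and the output is a simple `name=value;` string rather than the protobuf wire format.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Text;

// Hypothetical common interface: one strategy object per member.
interface IMemberSerializer
{
    void Write(object instance, StringBuilder output);
}

// Hypothetical strategy that writes a single property as "Name=value;".
sealed class PropertySerializer : IMemberSerializer
{
    private readonly PropertyInfo property;

    public PropertySerializer(PropertyInfo property)
    {
        this.property = property;
    }

    public void Write(object instance, StringBuilder output)
    {
        // Simplification: this still reads the value via PropertyInfo.GetValue.
        output.Append(property.Name).Append('=')
              .Append(property.GetValue(instance, null)).Append(';');
    }
}

static class SerializerCache
{
    private static readonly Dictionary<Type, IMemberSerializer[]> cache =
        new Dictionary<Type, IMemberSerializer[]>();

    // Counts how often the reflection step ran (for demonstration only).
    public static int BuildCount;

    public static IMemberSerializer[] For(Type type)
    {
        lock (cache)
        {
            IMemberSerializer[] serializers;
            if (!cache.TryGetValue(type, out serializers))
            {
                // Reflection happens here, once per type.
                BuildCount++;
                PropertyInfo[] properties =
                    type.GetProperties(BindingFlags.Public | BindingFlags.Instance);
                // Sort by name so the output order is deterministic.
                Array.Sort(properties, (a, b) => string.CompareOrdinal(a.Name, b.Name));
                serializers = new IMemberSerializer[properties.Length];
                for (int i = 0; i < properties.Length; i++)
                {
                    serializers[i] = new PropertySerializer(properties[i]);
                }
                cache.Add(type, serializers);
            }
            return serializers;
        }
    }

    public static string Serialize(object instance)
    {
        StringBuilder output = new StringBuilder();
        // Steady state: just step through the known set of serializers.
        foreach (IMemberSerializer serializer in For(instance.GetType()))
        {
            serializer.Write(instance, output);
        }
        return output.ToString();
    }
}

// Hypothetical type used to exercise the sketch.
sealed class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}
```

Serializing two `Person` instances in a row runs the reflection step only for the first; the second call finds the cached array and goes straight to the loop.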

Inside that, it tries to make sensible use of reflection when talking to members; it uses Delegate.CreateDelegate to talk to properties, and DynamicMethod (and custom IL) to talk to fields (when possible; it depends on the target framework). This means that it is always talking to known delegate types, rather than just DynamicInvoke (which is very slow).
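As a rough illustration of those two techniques (a sketch under assumptions, not the library's internals): `AccessorFactory` and `Customer` are hypothetical names, and `Func<,>` is used as the known delegate type purely for brevity, which assumes .NET 3.5 or later. The `Delegate.CreateDelegate` and `DynamicMethod` calls are the standard framework APIs.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical type with one public field and one property.
sealed class Customer
{
    public int Id;
    public string Name { get; set; }
}

static class AccessorFactory
{
    // Property: bind the get method to a typed open-instance delegate.
    public static Func<TTarget, TValue> CreatePropertyGetter<TTarget, TValue>(string name)
        where TTarget : class
    {
        PropertyInfo property = typeof(TTarget).GetProperty(name);
        if (property == null) throw new ArgumentException("No such property: " + name);
        MethodInfo getMethod = property.GetGetMethod();
        if (getMethod == null) throw new ArgumentException("No public getter: " + name);
        return (Func<TTarget, TValue>)Delegate.CreateDelegate(
            typeof(Func<TTarget, TValue>), getMethod);
    }

    // Field: there is no method to bind to, so emit a tiny one with custom IL.
    public static Func<TTarget, TValue> CreateFieldGetter<TTarget, TValue>(string name)
        where TTarget : class
    {
        FieldInfo field = typeof(TTarget).GetField(
            name, BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
        if (field == null) throw new ArgumentException("No such field: " + name);
        // No conversion is emitted, so the types must match exactly.
        if (field.FieldType != typeof(TValue))
            throw new ArgumentException("Field type mismatch: " + name);

        DynamicMethod method = new DynamicMethod(
            "get_" + name, typeof(TValue), new[] { typeof(TTarget) },
            typeof(TTarget), true);
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);     // load the target instance
        il.Emit(OpCodes.Ldfld, field); // read the field
        il.Emit(OpCodes.Ret);
        return (Func<TTarget, TValue>)method.CreateDelegate(
            typeof(Func<TTarget, TValue>));
    }
}
```

Once created, both delegates are invoked like ordinary typed methods (for example `getName(customer)`), so the per-call cost avoids the argument packing and late binding that `DynamicInvoke` incurs.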

Without going mad, the code does have some optimisations (arguably at the expense of readability) in terms of:

  • local byte[] buffering (of the input/output streams)
  • using fixed-size arrays (rather than lists etc); perhaps too much
  • using generics to avoid boxing
  • numerous tweaks/twiddles/etc around the binary processing loops
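The first two bullets can be sketched as follows. This is only an illustration of local `byte[]` buffering with a fixed-size array, not code from protobuf-net: `BufferedVarintWriter` and its 256-byte buffer size are hypothetical choices. The varint encoding itself (7 bits per byte, high bit set on every byte except the last) is the standard protobuf one.

```csharp
using System;
using System.IO;

// Hypothetical writer: accumulates bytes locally, touches the Stream rarely.
sealed class BufferedVarintWriter
{
    private readonly Stream destination;
    private readonly byte[] buffer = new byte[256]; // fixed-size local buffer
    private int count;

    public BufferedVarintWriter(Stream destination)
    {
        if (destination == null) throw new ArgumentNullException("destination");
        this.destination = destination;
    }

    public void WriteVarint(uint value)
    {
        // A 32-bit varint needs at most 5 bytes; make room up front
        // so the loop below needs no bounds checks of its own.
        if (buffer.Length - count < 5) Flush();

        while (value >= 0x80)
        {
            // Low 7 bits plus a continuation flag.
            buffer[count++] = (byte)((value & 0x7F) | 0x80);
            value >>= 7;
        }
        buffer[count++] = (byte)value; // final byte, no continuation flag
    }

    public void Flush()
    {
        if (count > 0)
        {
            destination.Write(buffer, 0, count);
            count = 0;
        }
    }
}
```

The point of the design is that many small writes turn into array stores, with a single `Stream.Write` call per buffer-full (or per explicit `Flush`).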

In hindsight, I think I made a mistake on the generics point; the complexity meant that forcing generics into the system bent it out of shape in a few places, and it actively causes some major problems (for complex models) on the Compact Framework.

I have some designs (in my head only) to refactor this using non-generic interfaces, and to instead (for suitable frameworks) make more use of ILGenerator (my first choice would have been Expression, but that forces a higher framework version). The problem, however, is that this is going to take a considerable amount of time to get working, and until very recently I've been pretty swamped.

Recently I've managed to start spending some time on protobuf-net again, so hopefully I'll clear my backlog of requests etc and get started on that soon. It is also my intention to get it working with models other than reflection (i.e. describing the wire mapping separately).


> and doesn't produce any generated code

I should also clarify that there are two (optional) codegen routes if you want to use generated code: protogen.exe and the VS add-in both allow code generation from a .proto file. But this is not needed; it is useful mainly if you have an existing .proto file, or intend to interoperate with another language (C++ etc.) for contract-first development.

Marc Gravell
Hi Marc, thank you very much for the detailed answer, and sorry I did not respond sooner. I think you did a great job with this project. We are considering using it as the serialization mechanism for our production systems.
MichaelT