I have created a package using Moose and I would like to store some large instances with Storable's nstore. The resulting binary files are very large (500+ MB), so I would like to compress them.

What is the best way to do that? Should I open a filehandle through bzip2 or a similar compressor and then store using nstore_fd?

+2  A: 

Have a look at Data::Serializer. It optionally uses zlib (via Compress::Zlib) or PPMd (via Compress::PPMd) to compress your serialized data.
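For comparison, the same idea (serialize first, then compress the bytes) can be done by hand with core modules only. This is just a rough sketch, not Data::Serializer's own API: it uses Storable's nfreeze/thaw and IO::Compress::Gzip, and the helper names `store_compressed` and `retrieve_compressed` are made up for the example.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);
use IO::Compress::Gzip     qw(gzip   $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
use File::Temp qw(tempfile);

# Hypothetical helper: serialize in network byte order, then gzip
# the resulting bytes straight into the file at $path.
sub store_compressed {
    my ($data, $path) = @_;
    my $frozen = nfreeze($data);
    gzip(\$frozen => $path) or die "gzip failed: $GzipError";
    return;
}

# Hypothetical helper: gunzip the file into memory, then rebuild
# the original structure with thaw.
sub retrieve_compressed {
    my ($path) = @_;
    my $frozen;
    gunzip($path => \$frozen) or die "gunzip failed: $GunzipError";
    return thaw($frozen);
}

# Usage: round-trip a small structure through a temporary file.
my ($fh, $path) = tempfile(SUFFIX => '.sto.gz', UNLINK => 1);
close $fh;
my $data = { name => 'example', values => [ 1 .. 1000 ] };
store_compressed($data, $path);
my $copy = retrieve_compressed($path);
print scalar @{ $copy->{values} }, " values restored\n";
```

Note that this keeps the whole serialized image in memory before compressing it, which may matter for objects in the 500 MB range.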

Pedro Silva
+5  A: 

With MooseX::Storage, most of this is already done for you -- you just need to specify your serialization and I/O formats.

Ether
The doc is really minimal. How can I combine it with e.g. gzip?
David B
@David: I've only used MX::S myself in very superficial cases, but if you hop on irc.perl.org #moose there is almost always someone around who can help you out.
Ether
+3  A: 

While compression is certainly a viable option, you might also want to consider simply serializing less.

Could it be that your objects contain a lot of data that could easily be rebuilt from other data they also contain? For example, if you have attributes that are lazily built from other attributes (e.g. using Moose's lazy + builder or lazy_build), there is not much point in storing the values of those attributes at all unless the recomputation is incredibly expensive. And even then it might be worth considering, as reading lots of data off disk isn't the fastest thing either.

If you find that you want to serialize only parts of your objects, and still want to use Storable, you can define custom STORABLE_freeze and STORABLE_thaw hooks, as described in the Storable documentation.
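A minimal sketch of such hooks, assuming a made-up `Ranges` class written as a plain blessed hash (no Moose) to keep it short. The `points` and `sorted` keys are hypothetical: `sorted` stands in for any attribute that is cheap to rebuild and therefore not worth storing.

```perl
package Ranges;    # hypothetical example class
use strict;
use warnings;
use Storable ();

sub new {
    my ($class, %args) = @_;
    return bless { points => $args{points} }, $class;
}

# Derived data: built on first use from 'points' and cached.
sub sorted {
    my ($self) = @_;
    $self->{sorted} //= [ sort { $a <=> $b } @{ $self->{points} } ];
    return $self->{sorted};
}

# Called by Storable when the object is serialized:
# freeze a copy of the object's data without the cached key.
sub STORABLE_freeze {
    my ($self, $cloning) = @_;
    my %keep = %$self;
    delete $keep{sorted};
    return Storable::nfreeze(\%keep);
}

# Called by Storable on an empty blessed hash when deserializing:
# restore the stored keys; 'sorted' is rebuilt lazily on demand.
sub STORABLE_thaw {
    my ($self, $cloning, $serialized) = @_;
    %$self = %{ Storable::thaw($serialized) };
    return;
}

package main;
use Storable qw(nfreeze thaw);

my $r = Ranges->new(points => [ 3, 1, 2 ]);
$r->sorted;                          # populate the cache
my $copy = thaw(nfreeze($r));        # the cache is left out of the image
print exists $copy->{sorted} ? "cache stored\n" : "cache skipped\n";
```

The same hooks are used by nstore and retrieve, so the on-disk file shrinks by whatever the skipped attributes would have taken up.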

However, there are also alternative serializers available. MooseX::Storage is one of them; it supports many serialization backends and formats, and can easily be told which attributes to serialize and which to skip.

rafl
This is related to `FastRanges`, described here: http://stackoverflow.com/questions/3790166 (see the last update in the original post). The size is inherent to getting the needed performance. It is still much faster to load than to recreate the object.
David B