I use KiokuDB to store a couple of Moose objects and a couple of simple data structures (hashes and arrays).

I do not need any fancy searches, transactions, etc., simply the ability to fetch (look up) an object. Also, once I'm done creating the DB, it can be made read-only; no changes will ever be made to it.

The main (only?) reason I use KiokuDB is to preserve the object graph.

The largest object, which dominates the total size of the DB, is a Moose object that holds a relatively large array (let's call this object large_obj). Previously, I stored large_obj (alone) using Storable + PerlIO::gzip, or even JSON + PerlIO::gzip. It worked fine and I was very happy with the results (gzip compressed the store file to about 5% of its original size).
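For comparison, the single-file Storable + gzip approach can be sketched as below. This is a minimal sketch, not the asker's code: it uses the core IO::Compress::Gzip / IO::Uncompress::Gunzip modules in place of PerlIO::gzip, and `$large_obj` / `$index` are hypothetical plain-hash stand-ins for the real Moose objects. It also shows the property that matters for the object graph: Storable keeps shared references shared, provided all related roots are frozen in a single call.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical stand-ins: a "large" object holding a big array, and a
# second structure that points at the very same array and one element.
my $items     = [ map { { id => $_, name => "item$_" } } 1 .. 1000 ];
my $large_obj = { items => $items };
my $index     = { first => $items->[0], all => $items };

# Freeze both roots in ONE call, so the shared array is stored once
# and comes back as a single shared array (the object graph survives).
my $frozen = nfreeze( [ $large_obj, $index ] );

# Write the frozen bytes gzip-compressed to a temporary file.
my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, 'large_obj.sto.gz' );
gzip( \$frozen => $file ) or die "gzip failed: $GzipError";

# Read back: decompress into a scalar, then thaw it.
my $bytes;
gunzip( $file => \$bytes ) or die "gunzip failed: $GunzipError";
my ( $obj2, $index2 ) = @{ thaw($bytes) };

printf "compressed to %d of %d bytes\n", -s $file, length $frozen;
print "shared array preserved\n" if $obj2->{items} == $index2->{all};
```

Freezing `$large_obj` and `$index` in two separate calls would instead produce two independent copies of the array on thaw. The trade-off of this design is that the whole file must be loaded into memory at once, which is exactly what a KiokuDB backend is meant to avoid.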

There is another, smaller Moose object, which is basically an array of some 20-30k small Moose objects.

Now, after migrating to KiokuDB, I first used the simple Hash backend and then dumped it to a file (using Cmd), again with PerlIO::gzip. This worked very well while large_obj was relatively small, but once it grew larger, I started getting out-of-memory errors. I guess the Hash backend is not suited to large objects.

I then tried the recommended BerkeleyDB backend, although it seems like overkill (as mentioned, I don't need any of the fancy database features). It is much slower than the original Storable + PerlIO::gzip solution, takes far more space, and also runs out of memory for larger objects! (I'm on Ubuntu with 3 GB of RAM.)

I also tried the Files backend, but it fails with:

Too many open files at /usr/local/perls/perl-5.12.2/lib/site_perl/5.12.2/Directory/Transactional.pm line 130.
    (in cleanup) Too many open files at /usr/local/perls/perl-5.12.2/lib/site_perl/5.12.2/Directory/Transactional.pm line 130.

Do you have any suggestions on how I can store my objects in a way that is both space-efficient and preserves the object graph?

A:

Implement your own KiokuDB serialization role on top of Data::Serializer, which can compress the serialized data:

package KiokuDB::Backend::Serialize::Data::Serializer;

# This package is a role that backends consume, so it loads
# Moose::Role rather than Moose.
use Moose::Role;

use Data::Serializer;

use namespace::clean -except => 'meta';

with qw(
    KiokuDB::Backend::Serialize
    KiokuDB::Backend::Role::UnicodeSafe
    KiokuDB::Backend::Role::BinarySafe
);

has '_serializer' => (
    is       => 'ro',
    isa      => 'Data::Serializer',
    required => 1,
    lazy     => 1,
    default  => sub {
        Data::Serializer->new(
            serializer => 'FreezeThaw', # Storable, FreezeThaw, Data::Denter, Config::General, YAML, PHP::Serialization, XML::Dumper, and Data::Dumper
            digester   => 'MD5', # See http://search.cpan.org/~gaas/Digest-1.16/Digest.pm#Digest_speed
            compress   => 1,
            compressor => 'Compress::Zlib', # Compress::Zlib or Compress::PPMd
        );
    },
);

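# Turn a KiokuDB::Entry into a (compressed) serialized string.
sub serialize {
    my ( $self, $entry ) = @_;

    return $self->_serializer->serialize($entry);
}

# Turn a string produced by serialize() back into an entry.
sub deserialize {
    my ( $self, $blob ) = @_;

    return $self->_serializer->deserialize($blob);
}

# Write a serialized entry to an open filehandle.
sub serialize_to_stream {
    my ( $self, $fh, $entry ) = @_;

    $self->_serializer->store( $entry, $fh );
}

# Read a serialized entry back from an open filehandle.
sub deserialize_from_stream {
    my ( $self, $fh ) = @_;

    $self->_serializer->retrieve($fh);
}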

__PACKAGE__
Pedro Silva
a) I think `use Moose;` is obsolete. b) How should I create my `$kdb_dir`? Like this: `my $kdb_dir = KiokuDB->new( backend => KiokuDB::Backend::Hash->new( serializer => "Data::Serializer" ) );`?
David B