views: 129
answers: 2

Hey all,

I would like to use db4o as the backend of a custom cache implementation. Normally my program involves loading into memory some 40,000,000 objects and working on them simultaneously. Obviously this requires a lot of memory and I thought of perhaps persisting some of the objects (those not in a cache) to a db4o database. My preliminary tests show db4o to be a bit slower than I would like (about 1,000,000 objects took 17 minutes to persist). However, I was using the most basic setup.

I was doing something like this:

using (var reader = new FileUnitReader(Settings, Dictionary, m_fileNameResolver, ObjectFactory.Resolve<DataValueConverter>(), ObjectFactory.Resolve<UnitFactory>()))
using (var db = Db4oEmbedded.OpenFile(Db4oEmbedded.NewConfiguration(), path))
{
    var timer = new Stopwatch();
    timer.Start();

    // Store each unit as it is read; everything goes into a single transaction.
    IUnit unit = reader.GetNextUnit();
    while (unit != null)
    {
        db.Store(unit);
        unit = reader.GetNextUnit();
    }

    timer.Stop();
    db.Close();

    var elapsed = timer.Elapsed;
}

Can anyone offer advice on how to improve performance in this scenario?

+2  A: 

Hi

Well I think there are a few options to improve the performance in this situation.

I've also discovered that reflection overhead can become quite a large part of the cost in scenarios like this, so you should try the fast reflector for your case. Note that the FastNetReflector consumes more memory, but in your scenario this won't really matter. You can enable the fast reflector like this:

var config = Db4oEmbedded.NewConfiguration();
config.Common.ReflectWith(new FastNetReflector());

using (var container = Db4oEmbedded.OpenFile(config, fileName))
{
}

When I did similar tiny 'benchmarks', I discovered that a larger cache size also improves performance a little, even when you write to the database:

var config = Db4oEmbedded.NewConfiguration();
config.File.Storage = new CachingStorage(new FileStorage(), 128, 1024 * 4);

Other notes: db4o's transaction handling isn't really optimized for giant transactions. When you store 1,000,000 objects in a single transaction, the commit may take ages or you may run out of memory. Therefore you may want to commit more often, for example after every 100,000 stored objects. Of course you need to check whether it actually makes a difference in your scenario.
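
As a minimal sketch of batched commits, reusing the reader loop from the question (the 100,000 interval is just a starting point to experiment with):

var stored = 0;
IUnit unit = reader.GetNextUnit();
while (unit != null)
{
    db.Store(unit);

    // Commit periodically so the open transaction stays small.
    if (++stored % 100000 == 0)
    {
        db.Commit();
    }
    unit = reader.GetNextUnit();
}
db.Commit(); // commit whatever remains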

Gamlor
Interesting. I tried the FastNetReflector and it halved the amount of time required. However, I'm still not quite at my goal of 2 minutes per 1,000,000 records. The FastNetReflector took me down to about 8-9 minutes per 1,000,000. Any other suggestions?
Jeffrey Cameron
Hmm, I don't know of anything that would make it 4 times faster to reach your goal; I would need to investigate what the bottleneck is. Do you really need a complex object database? Since you use it for caching, there are probably more 'lightweight' solutions out there which are way faster.
Gamlor
There are some others (see the comments above) but I thought I would try the db4o solution since it offered simplicity and robustness. Thanks though!
Jeffrey Cameron
+1  A: 

Another small improvement that you could try:

Get the extended interface by adding .Ext() to the OpenFile() call.

Purge every object after you store it.

using (var db = Db4oEmbedded.OpenFile(Db4oEmbedded.NewConfiguration(), path).Ext())
// ....
db.Store(unit);
db.Purge(unit);
// ....

That way you will reduce the number of references that db4o has to maintain in the current transaction.
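
Put together with the loop from the question, a rough sketch (assuming the same reader and IUnit types) could look like this:

using (var db = Db4oEmbedded.OpenFile(Db4oEmbedded.NewConfiguration(), path).Ext())
{
    IUnit unit = reader.GetNextUnit();
    while (unit != null)
    {
        db.Store(unit);
        db.Purge(unit); // let db4o drop its reference to the stored object
        unit = reader.GetNextUnit();
    }
}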

You probably have the most potential for another big improvement if you play with the Storage configuration (that's the pluggable file system below db4o). The latest 8.0 builds have a better cache implementation that doesn't degrade performance for cache maintenance when you work with larger numbers of cache pages.

I suggest you try the latest 8.0 build with the cache setup that Gamlor has suggested to see if that makes a difference:

config.File.Storage = new CachingStorage(new FileStorage(), 128, 1024 * 4);

If it does, you could also try much higher numbers:

config.File.Storage = new CachingStorage(new FileStorage(), 1280, 1024 * 40);
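
For reference, a rough sketch that combines the suggestions from this thread (fast reflector, larger cache, extended interface); treat the cache numbers as starting points to tune:

var config = Db4oEmbedded.NewConfiguration();
config.Common.ReflectWith(new FastNetReflector());
config.File.Storage = new CachingStorage(new FileStorage(), 1280, 1024 * 40);

using (var db = Db4oEmbedded.OpenFile(config, path).Ext())
{
    // store / purge / commit loop as sketched above
}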
Carl Rosenberger
Thank you Carl, I'll give that a try
Jeffrey Cameron
Oddly enough, with the upgrade from 7.12 to 8.0 and the use of db.Purge (along with FastNetReflector and CachingStorage, as previously suggested), the program actually took longer ... :-/ I'm going to try it again without the Purge to see if that helps
Jeffrey Cameron
OK, I tried it without the db.Purge() call. It looks like it is much better at managing memory, but it is still slower than 7.12
Jeffrey Cameron
db4o 8.0 has a new IdSystem by default, which is considerably faster for fragmented databases, but it may be slightly slower for raw store operations where there are no fragmentation effects. The old PointerBasedIdSystem can still be used as follows: config.IdSystem.UsePointerBasedSystem();
Carl Rosenberger
Thanks Carl, I'll try that as well and let you know.
Jeffrey Cameron