We have a GDBM key-value database as the backend to a load-balanced, web-facing application implemented in C++. The data served by the application has grown very large, so our admins have moved the GDBM files from "local" storage (on the webservers, or very close by) to a large, shared, remote, NFS-mounted filesystem.

This has affected performance. Our performance tests (in a test environment) show page load times jumping from hundreds of milliseconds (for local disk) to several seconds (over NFS, local network), and sometimes getting as high as 30 seconds. I believe a large part of the problem is that the application makes lots of random reads from the GDBM files, and that these are slow over NFS, and this will be even worse in production (where the front-end and back-end have even more network hardware between them) and as our database gets even bigger.

While this is not a critical application, I would like to improve performance, and I have some resources available, including application-developer time and Unix admins. My main constraint is time: I only have the resources for a few weeks.

As I see it, my options are:

  1. Improve NFS performance by tuning parameters. My instinct is that we won't get much out of this, but I have been wrong before, and I don't really know very much about NFS tuning.

  2. Move to a different key-value database, such as memcachedb or Tokyo Cabinet.

  3. Replace NFS with some other protocol (iSCSI has been mentioned, but I am not familiar with it).

How should I approach this problem?
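For context on option 1, "tuning parameters" mostly means NFS mount options. A sketch of the kind of fstab entry our admins would be adjusting (server path and all values here are illustrative assumptions, not tested recommendations):

```
# Hypothetical /etc/fstab entry -- values are starting points to benchmark, not answers.
# rsize/wsize: larger transfer sizes per NFS read/write request
# actimeo:     cache file attributes longer (helps if the DB files rarely change)
# hard,intr:   retry indefinitely on server trouble, but allow interruption
fileserver:/export/gdbm  /mnt/gdbm  nfs  rsize=32768,wsize=32768,actimeo=60,hard,intr  0  0
```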

+2  A: 

This appears not to be what you want to hear, but honestly, if I were you I'd throw it in a MySQL table. It's not as if it's meaningfully harder to work with, and you get a lot of benefits, not least a remote-access protocol that's actually intended for your situation, unlike GDBM-over-NFS.

chaos
+5  A: 

Don't get too hung up on the “relational versus non-relational” comparison. It appears to be irrelevant for this issue.

The line your application has crossed is a different one: from a small database on local, fast file storage to a large database accessed over the network. Crossing that line means you are now better served by a dedicated, network-serviced database management system. Whether that server manages relational databases is irrelevant to this aspect.

For getting it up and running quickly, MySQL is probably your best bet. If you foresee it growing much beyond where it is now, you might as well put it in PostgreSQL since that's where it will need to go eventually anyway :-)

bignose
+1  A: 

If you want to stick with non-relational databases, you could try BDB or DJB's CDB. I have used both, and when it comes down to performance I think they outperform GDBM.

But keep bignose's answer in mind, as I, too, think that your bottleneck might not be the data structure (GDBM) you are using but your infrastructure.

tr9sh