I'm considering using Berkeley DB to cache some data on an application cluster. What's a reasonable upper limit on the number of nodes I can plan on Berkeley DB handling? Writing to the database will be from a single node.
Mark,
Most of our customers are using replication groups of 5-20 nodes, although we have some large customers running with much larger replication groups. There is no inherent limit built into Berkeley DB.
The real world limit will depend on your read/write workload mix, how you configure your replication system and the amount of CPU cycles available on the master system. Basically, the master needs to communicate with each replica (send log records, process acknowledgements, respond to requests, etc.). Each replica that communicates with the master adds a small amount of overhead. For a mostly read/occasionally write workload, the master doesn't have to communicate that often and communicating with the replicas requires minimal processing. On a predominantly write workload, the master is actively communicating with the replicas and incurs a more significant workload per replica. You can reduce the workload on the master by funneling read operations to the replicas and by utilizing the Berkeley DB HA client-to-client synchronization feature.
Your mileage will vary, so the best approach is to test a prototype of your application and evaluate the balance of throughput, application requirements and available CPU cycles. Do you have a sense of how many nodes you expect to need in your replication groups?
Regards,
Dave
PS: The Getting Started with Replication Guide is a good place to start.