Does anyone out there have any great ideas for building a massively scalable hierarchical datastore? It needs rapid inserts, and many users of the site must be able to request reports on the number of nodes below a given node in the hierarchy.
This is the scenario:
I will have a very large number of nodes being added per hour. Let's say I want to add 1 million nodes per hour. They will likely appear all over the hierarchy. Ideally the scale will reach billions of nodes, but 50 million is a target to aim for. I need to be able to calculate, at any time, the number of nodes below any given point, and there will likely be many people doing this at the same time. Think of it as a report that many users (perhaps 100,000 concurrently) will be calling for at any one time. They might also request all nodes below a certain node.
The database could be built in one of two ways. Either a single process reads from a flat table formatted as an adjacency list (rapid inserts, slow reporting) and builds the hierarchy from it, or it follows a standard design where users of the web site update the hierarchy directly, provided a datastore exists that can cope with the massive number of nodes being created.
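To make the first option concrete, here is a minimal sketch of what that single process might do: read adjacency-list rows and compute the number of nodes below every node in one pass. The row shape `(node_id, parent_id)` and the function name `descendant_counts` are assumptions for illustration, not part of my current implementation, and everything is held in memory here.

```python
from collections import defaultdict


def descendant_counts(rows):
    """Return {node_id: number of nodes below it}.

    rows is an iterable of (node_id, parent_id) pairs; parent_id is None
    for a root. This row shape is a hypothetical flat adjacency-list format.
    """
    children = defaultdict(list)
    roots = []
    for node_id, parent_id in rows:
        if parent_id is None:
            roots.append(node_id)
        else:
            children[parent_id].append(node_id)

    counts = {}
    # Iterative post-order walk, so deep trees do not hit the recursion limit.
    stack = [(root, False) for root in roots]
    while stack:
        node, children_done = stack.pop()
        kids = children.get(node, ())
        if children_done:
            # Each child contributes itself plus everything below it.
            counts[node] = sum(counts[kid] + 1 for kid in kids)
        else:
            stack.append((node, True))
            stack.extend((kid, False) for kid in kids)
    return counts
```

With the counts precomputed like this, a report becomes a single lookup, but the counts are only as fresh as the last rebuild.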
I already have this implemented in Django using Treebeard and MySQL. I am using the Materialised Path method and it is fairly good, but I want lightning speed in comparison. With a datastore of 30,000 nodes I am achieving 120 inserts per minute at the bottom of the tree (about 2 per second), running on a two-year-old laptop. My target of 1 million nodes per hour works out to roughly 278 inserts per second, so I obviously need a lot more than this and think that maybe there is a better datastore to use. Maybe PyTables, BigTable, MongoDB or Cassandra?
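For anyone unfamiliar with the approach, the materialised path idea can be sketched roughly as below. This is not Treebeard's actual schema or API. It uses the standard-library `sqlite3` module in place of MySQL, and the table name `node`, the fixed width `STEP`, and the helpers `add_child` and `count_descendants` are all made up for illustration.

```python
import sqlite3

STEP = 4  # hypothetical fixed number of digits per tree level

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (path TEXT PRIMARY KEY)")


def add_child(parent_path):
    """Insert a new last child under parent_path ('' for a root); return its path.

    Sketch only: it does not handle more than 9999 children per parent.
    """
    row = conn.execute(
        "SELECT MAX(path) FROM node WHERE path LIKE ? AND length(path) = ?",
        (parent_path + "%", len(parent_path) + STEP),
    ).fetchone()
    last_sibling = row[0]
    next_index = int(last_sibling[-STEP:]) + 1 if last_sibling else 1
    path = parent_path + str(next_index).zfill(STEP)
    conn.execute("INSERT INTO node (path) VALUES (?)", (path,))
    return path


def count_descendants(path):
    """Count nodes whose path starts with the given path, excluding the node itself."""
    row = conn.execute(
        "SELECT COUNT(*) FROM node WHERE path LIKE ? AND path <> ?",
        (path + "%", path),
    ).fetchone()
    return row[0]
```

For example, adding a root, two children under it, and one grandchild gives paths `0001`, `00010001`, `00010002` and `000100010001`. Counting everything below a node is then a prefix match on the path column.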
Easy integration with Python/Django would be good, but I can always write this part of the system in another language if I have to. If we go with the single process that reads from the flat datastore and loads it into a really efficient hierarchical datastore tuned for reporting, I guess there will be only one writer and so no concurrency issues, which would remove the need for transactions.
Anyway, that's enough info to get us started. Is this easy using the right technology?