I am considering Neo4j for some new projects I am working on. The data is inherently graph-based, so Neo4j fits well, and a quick prototype is giving me good response times. What I want to understand is how to scale a Neo4j deployment. Specifically:

  • How do I shard my data across Neo4j deployments? Since Neo4j is deployed on a single machine, there is a limit to how much data I can store on that machine, so I would like to know how to distribute it. Clearly, if I split the data by user, then relationships between users on different shards cannot be maintained.
  • How do I replicate the Neo4j data? I am thinking of a SQL-like setup, with a master used for writes and slaves used for reads, so that we can scale up our readers (and potentially our writers) and also have a real-time backup of our data. I understand that all the Neo4j data is stored in the filesystem, which is not inherently replicable. Is there a way to do it here? Perhaps something akin to the MySQL binlog?
+1  A: 

Hi there, sharding is currently not handled by Neo4j itself but at the domain level, much as you describe. Neo4j 2.0 is going to target that problem.
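To make "handled by the domain" a bit more concrete, here is a minimal sketch of what application-level sharding could look like. Everything in it is an assumption for illustration: `Shard` is an in-memory stand-in for one Neo4j instance, and `NodeRef`, `ShardedGraph`, and the hash-based placement are hypothetical names and choices, not anything Neo4j provides.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NodeRef:
    """Application-level pointer to a node that may live on another shard (hypothetical)."""
    shard: int
    key: str


@dataclass
class Shard:
    """In-memory stand-in for one Neo4j instance; not a Neo4j API."""
    nodes: dict = field(default_factory=dict)  # key -> properties
    edges: dict = field(default_factory=dict)  # key -> list of (rel_type, NodeRef)


class ShardedGraph:
    """Domain code that decides placement and follows cross-shard references."""

    def __init__(self, shard_count):
        self.shards = [Shard() for _ in range(shard_count)]

    def shard_for(self, key):
        # Stable hash, so the same key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def add_node(self, key, **props):
        ref = NodeRef(self.shard_for(key), key)
        self.shards[ref.shard].nodes[key] = props
        return ref

    def relate(self, source, rel_type, target):
        # The edge lives on the source's shard; the target is stored only as a reference.
        edges = self.shards[source.shard].edges
        edges.setdefault(source.key, []).append((rel_type, target))

    def neighbours(self, ref, rel_type):
        # Resolving a reference may mean a hop to a different shard.
        found = []
        for kind, target in self.shards[ref.shard].edges.get(ref.key, []):
            if kind == rel_type:
                found.append((target, self.shards[target.shard].nodes[target.key]))
        return found


graph = ShardedGraph(shard_count=3)
alice = graph.add_node("alice", name="Alice")
bob = graph.add_node("bob", name="Bob")
graph.relate(alice, "KNOWS", bob)
print(graph.neighbours(alice, "KNOWS"))
```

The cost of this approach is that a traversal crossing shards turns into several lookups driven by your own code, so it tends to work best when most relationships stay within a shard.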

For replication, Online Backup is working and real High Availability with Master failover is in the works, using ZooKeeper to track the cluster nodes and elect new masters, etc.

Any more details on your app sharding requirements? What domain etc?

Shreeni
I looked at the "Online Backup" documentation, and though it shows various scenarios, I am still unclear whether I can set up a standard master-slave arrangement. Real HA with master failover would be good, but can the current system support a non-failover setup, with one master for writes and multiple slaves for reads?
Shreeni
Shreeni, yes, sharding is done at the domain level: you hold the references to the different shards in your domain and manage the references between them. With Online Backup, you will have a setup with a master and a "hot spare" that you can switch in as the master if the primary instance goes down. Maybe you could ask on the list for more details?
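As a rough illustration of the "hot spare" switch, the sketch below shows one way an application might fall back to the spare when the primary stops responding. It assumes nothing about Neo4j's actual backup tooling: `FakeInstance`, `InstanceDown`, and `HotSpareClient` are hypothetical names, and `backup()` merely copies a dictionary to stand in for an online backup run.

```python
class InstanceDown(Exception):
    """Raised by a stand-in instance that is unreachable."""


class FakeInstance:
    """Hypothetical stand-in for a database instance; not a Neo4j API."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def write(self, key, value):
        if not self.up:
            raise InstanceDown(self.name)
        self.data[key] = value

    def read(self, key):
        if not self.up:
            raise InstanceDown(self.name)
        return self.data.get(key)


class HotSpareClient:
    """Sends all traffic to the master and promotes the spare if the master fails."""

    def __init__(self, master, spare):
        self.master = master
        self.spare = spare

    def backup(self):
        # Stands in for an online backup run: copy the master's state to the spare.
        self.spare.data = dict(self.master.data)

    def _with_failover(self, operation):
        try:
            return operation(self.master)
        except InstanceDown:
            if self.spare is None:
                raise
            # Promote the spare; anything written since the last backup is lost.
            self.master, self.spare = self.spare, None
            return operation(self.master)

    def write(self, key, value):
        return self._with_failover(lambda db: db.write(key, value))

    def read(self, key):
        return self._with_failover(lambda db: db.read(key))


primary = FakeInstance("primary")
spare = FakeInstance("spare")
client = HotSpareClient(primary, spare)
client.write("a", 1)
client.backup()
client.write("b", 2)   # written after the last backup
primary.up = False     # simulate the primary going down
print(client.read("a"), client.read("b"))
```

Note that the spare only knows what the last backup captured, so how often the backup runs determines how much data a switch-over can lose.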
Thanks Peter. That's good enough for me to get started.
Shreeni