I need your help on the design of the restore-procedure in a backup system I'm building. The prerequisites are the following:
- A restore can be made of one or several files.
- A file consists of 1 or more data-blocks (on average 16mb in size, minimum 4mb maximum 64mb).
- A data-block is replicated using erasure coding into replication-blocks (256KB big).
- The replication blocks are spread out across lots of unreliable nodes (i.e. you can never expect a response from any of the nodes).
- Each node will store at most one replication-block for a data-block and all files will have basically the same set of nodes who store their data-blocks.
- It's written in Java using Apache Mina as network library. If you are not familiar with Apache Mina then pseudo-code will do just fine! The only thing you need to know is that Apache Mina uses asynchronous message passing.
My question is: how should I design the restore of the files? I'm looking for a general design suggestions like, what should be done concurrently (files, data-blocks or replication-blocks), what patterns I should use (consumer-producer, just a big thread-pool with tasks, something else?). Any tips and suggestions are appreciated! Potential problems which the design needs to avoid:
- Too many requests for replication-blocks are sent simultaneously so that either the remote nodes become overloaded or you run out of memory because the queue becomes too big.
- It sends too few requests simultaneously and is therefore not using all the available bandwidth.
- You run out of resources because too many threads are running simultaneously.
Other requirements are:
- If not enough nodes provide a response in order for the data-block to become restored then resends of the requests must be made after a certain timeout.
- You must be able to track the progress of a restore so that the user knows what's happening.
- Since all nodes store one replication-block for each data-block but all files (and thus also data-blocks) share pretty much the same nodes, it is desirably to have good concurrency for the data-blocks as well. Otherwise you spend too much time waiting between requests for nodes with high bandwidth (since you can only request one replication-block per data-block and node).
I have a couple of ideas on how to implement it but I rather see your guys ideas' before affecting them with my bias.