views:

71

answers:

0

I need your help on the design of the restore-procedure in a backup system I'm building. The prerequisites are the following:

  • A restore can be made of one or several files.
  • A file consists of 1 or more data-blocks (on average 16mb in size, minimum 4mb maximum 64mb).
  • A data-block is replicated using erasure coding into replication-blocks (256KB big).
  • The replication blocks are spread out across lots of unreliable nodes (i.e. you can never expect a response from any of the nodes).
  • Each node will store at most one replication-block for a data-block and all files will have basically the same set of nodes who store their data-blocks.
  • It's written in Java using Apache Mina as network library. If you are not familiar with Apache Mina then pseudo-code will do just fine! The only thing you need to know is that Apache Mina uses asynchronous message passing.

My question is: how should I design the restore of the files? I'm looking for a general design suggestions like, what should be done concurrently (files, data-blocks or replication-blocks), what patterns I should use (consumer-producer, just a big thread-pool with tasks, something else?). Any tips and suggestions are appreciated! Potential problems which the design needs to avoid:

  • Too many requests for replication-blocks are sent simultaneously so that either the remote nodes become overloaded or you run out of memory because the queue becomes too big.
  • It sends too few requests simultaneously and is therefore not using all the available bandwidth.
  • You run out of resources because too many threads are running simultaneously.

Other requirements are:

  • If not enough nodes provide a response in order for the data-block to become restored then resends of the requests must be made after a certain timeout.
  • You must be able to track the progress of a restore so that the user knows what's happening.
  • Since all nodes store one replication-block for each data-block but all files (and thus also data-blocks) share pretty much the same nodes, it is desirably to have good concurrency for the data-blocks as well. Otherwise you spend too much time waiting between requests for nodes with high bandwidth (since you can only request one replication-block per data-block and node).

I have a couple of ideas on how to implement it but I rather see your guys ideas' before affecting them with my bias.