views:

44

answers:

1

Hello All,

I am currently partitioning the UUID space using modulo, so data lookup doesn't require pinging every single server. However, the main problem with modulo is scaling because adding more nodes to the datastore probably requires some data migration. In your opinion, what is the best approach to add more nodes to the system while keeping the service available?

Thanks in advance!

Chris

A: 

I can see two scenarios for adding more nodes:

  1. Your current set of nodes is insufficient for your performance requirements.
  2. You are adding new data, so over time you need to add new nodes

In the second case, if the UUIDs have some time-based component then partitioning by age may be good enough

But I'd guess that it's the first case that's more interesting. As the usage of your data chnages you find that the partitioning needs to change in order to distribute the work sufficiently. In that case I don't see any alternative to moving some partitions to new nodes, by defintion the current se of nodes are overloaded. I'm not clear as to the extent of the problem - is it the actual migration that's troublesome? Or is it that the clients suddenly need to go to a new place for their data?

Can you implement some kind of lazy lookup approach?

Client initialises with current table of servers, 
    (hence knows the modulus to apply to uuid and can route on that basis)

Servers decide to reorganise themselves

Client gets a request, calclulates a now invalid modulus

Attempts to access the data

response says "Gone Away" **AND** gives the new server info

Client can now recompute the location
djna