views:

88

answers:

2

There are millions of user accounts, and I want to distribute their data into N tables(user_1, user_2,..., user_N) of a database. User accounts are comprised of 3~8 characters. So, I want a function that returns table suffix like

  int getTableSuffix(String userAccount);

The result is a uniform distribution from 1 to N.

Do you know any cheap hash algorithm for this job?

+1  A: 

You could take the ascii value of the first 1-3 characters and find a product of those in order to return your number.

Alternatively, you could actually use the characters as your table prefix, eg. Users_AA, Users_AB, etc.

However, what database are you using for this data? In most modern databases you should have no need to create multiple tables to store the same data. Even with millions of records. Good indexing on your table should be more than enough to solve any performance issues you may have.

Robin Day
A: 

Its not clear whether you are looking for a string hash function, or a method for partitioning based on strings.

A good string hash function uses each character, and accounts for the position of the characters. For example, djb2 uses something like this (pseudo-code):

hash = 5381
foreach (ch in str) 
  hash = hash * 33 + ch

Whatever your hash is, partition by the number of tables using a modulo operation:

table = hash % count

I recommend using the built-in partitioning capability of your database, if it has one.

erickson