I'm about to create a user-based website and will have to store photos, documents, and other data for each user.

If I take a silly number like 1 000 000 000 users, I believe that one folder with 1 000 000 000 subfolders won't be the fastest thing in the world! So I was thinking of creating something like:

1st level : [a-z] 2nd level : [a-z] 3rd level : [a-z]

Therefore bobby will be in /b/o/b/by

But this also means that it won't be spread equally, because there will be very few users starting with a z and many more with an m, s, l, ...

So I was thinking of using a user ID such as "000000000001", "000000000002", etc.

1st level : [000-999] 2nd level : [000-999] 3rd level : [000-999]

Therefore the data of user 000000000001 will be stored in /data/000/000/000/001, and I will be sure to have a maximum of 1000 folders in each level.
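
To make the idea concrete, here is a rough sketch of that ID-to-path mapping in Python. The function name `user_dir`, the 12-digit width, and the `/data` root are just assumptions taken from the example above, not a finished design.

```python
import posixpath


def user_dir(user_id, root="/data", width=12, chunk=3):
    """Map a numeric user id to a nested directory path.

    Hypothetical helper: the id is zero-padded to `width` digits and
    split into groups of `chunk` digits, one group per directory level.
    """
    padded = str(user_id).zfill(width)
    # Reject negative ids and ids that do not fit in `width` digits.
    if len(padded) != width or not padded.isdigit():
        raise ValueError("user id out of range: %r" % (user_id,))
    parts = [padded[i:i + chunk] for i in range(0, width, chunk)]
    return posixpath.join(root, *parts)


print(user_dir(1))  # /data/000/000/000/001
```

With 3 digits per level, no directory ever holds more than 1000 entries, whatever the total number of users.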

What do you guys think about it? What should I do or not do?

The server will be running CentOS 5.4 with ext3 on RAID 1; if the I/O gets too bad I will probably go for RAID 10.

A: 

A hash function provides a way to distribute large amounts of data across an easily searchable structure.

See this related question: http://stackoverflow.com/questions/338880/why-use-hashing-to-create-pathnames-for-large-collections-of-files

And also try looking through Google results for Directory Hashing.
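
As a minimal sketch of what directory hashing could look like in Python: hash the user name and use the first few hex characters of the digest as directory levels. The helper name `hashed_dir`, the choice of SHA-256, the `/data` root, and the two-level, two-character layout are all assumptions for illustration; it also assumes the user name is already safe to use as a directory name.

```python
import hashlib
import posixpath


def hashed_dir(username, root="/data", levels=2, width=2):
    """Build a storage path from a hash of the user name.

    Hypothetical helper: each level is `width` hex characters of the
    digest, so every level has at most 16 ** width subdirectories.
    """
    digest = hashlib.sha256(username.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    # Keep the readable name as the final component.
    return posixpath.join(root, *parts, username)


print(hashed_dir("bobby"))
```

Because the digest is close to uniformly distributed, names starting with common letters no longer pile up in the same directory, and the same name always maps to the same path.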

BenV