
A common technique for storing a large number of files/blobs in a filesystem is to use a hash function to determine the file path, e.g. hash(identifier) -> "o238455789" -> o23/8455/789 (there is often a hash-collision strategy too).
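A minimal Python sketch of the scheme described above. It assumes SHA-1 as the hash and two directory levels of two hex characters each; `hashed_path`, `store_blob` and `load_blob` are hypothetical names. The full digest is kept as the file name, so two identifiers only land on the same file if their digests collide outright. This sketch does not handle that case, which is where a real store would plug in its collision strategy.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Sequence


def hashed_path(root: Path, identifier: str, widths: Sequence[int] = (2, 2)) -> Path:
    """Map an identifier to root/<seg1>/<seg2>/.../<full hex digest>.

    Each directory segment is the next `widths[i]` hex characters of the
    SHA-1 digest (an assumption; any well-distributed hash works).
    """
    digest = hashlib.sha1(identifier.encode("utf-8")).hexdigest()
    parts = []
    start = 0
    for width in widths:
        parts.append(digest[start:start + width])
        start += width
    # The full digest is the file name; identical digests would overwrite
    # each other here, since no collision handling is implemented.
    return root.joinpath(*parts, digest)


def store_blob(root: Path, identifier: str, data: bytes) -> Path:
    """Write data at its hashed location, creating directories as needed."""
    path = hashed_path(root, identifier)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path


def load_blob(root: Path, identifier: str) -> bytes:
    """Read back a blob by recomputing its path from the identifier."""
    return hashed_path(root, identifier).read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = store_blob(Path(tmp), "customer-42/invoice.pdf", b"example bytes")
        # Prints <2 hex chars>/<2 hex chars>/<40-char SHA-1 digest>
        print(path.relative_to(tmp))
        assert load_blob(Path(tmp), "customer-42/invoice.pdf") == b"example bytes"
```

Passing `widths=(3, 4, 3)` reproduces the 3/4/3 split from the question's example. The point of the fan-out is to keep any single directory from holding too many entries.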

Does this technique have a name (is it a 'pattern'?), so that I can find it by searching the ACM Digital Library or a similar online database of computing literature?

Are there any books/papers that explore the problem/solution?

PS: Thanks for the helpful notes, but none of them address the technique described above.

A: 

Hi,

This sort of sounds like sharding, but I am probably missing the subtleties.

Equally, I don't see many articles on it, just a few on highscalability.com.
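For comparison, hash-based sharding in its simplest form routes each key to one of N shards (servers, databases, or top-level directories) by hashing it. This is a sketch assuming SHA-1 and a fixed shard count; `shard_for` is a hypothetical name.

```python
import hashlib


def shard_for(identifier: str, num_shards: int) -> int:
    """Return a shard number in [0, num_shards) derived from a stable hash."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # Use a stable hash rather than Python's built-in hash(), which is
    # randomized per process for str and so would move keys between runs.
    digest = hashlib.sha1(identifier.encode("utf-8")).digest()
    # Take the first 8 bytes as an unsigned integer and reduce modulo N.
    return int.from_bytes(digest[:8], "big") % num_shards
```

With plain modulo placement, changing `num_shards` reassigns most keys; consistent hashing is the usual way to limit that movement.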

Chris Kimpton
A: 

@Chris Kimpton

This would be called indexing. Sharding or partitioning is more about how to split a file.

Loki
A:

I think this is what Microsoft has done in SQL Server 2008 with FILESTREAM storage. It lets you store BLOB data inside SQL Server while still accessing the files directly from disk, which gives you kick-ass performance.

Microsoft released a whitepaper on managing unstructured data that you may be interested in. There's also an MSDN article describing FILESTREAM, as well as material on the pros and cons of file storage and on whether to BLOB or not to BLOB.

Nick Kavadias