My PHP project will use thousands of pictures, and each needs only a single number for its storage name.

My initial idea was to put all of the pictures in a single directory and name the files "0.jpg", "1.jpg", "2.jpg", and so on, all the way up to "4294967295.jpg".

Would it be better performance-wise to create a directory tree structure and name the files something like "429 / 496 / 7295.jpg"?

If the answer is yes, then the follow-up question would be: what is the optimal number of subdirectories or files per level of depth? And what effect does the chosen filesystem have on this?

Each file will have a corresponding MySQL entry with an unsigned integer id number (MySQL has no LONGINT type; INT UNSIGNED covers the full 0–4294967295 range).
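For illustration, the 3/3/4-digit split described above could be sketched like this (a hypothetical helper in Python, assuming ids are zero-padded to 10 digits so every id splits the same way):

```python
def image_path(image_id: int) -> str:
    """Map a 32-bit unsigned id to a three-level path,
    e.g. 4294967295 -> '429/496/7295.jpg'."""
    s = str(image_id).zfill(10)  # pad so every id splits identically
    return f"{s[:3]}/{s[3:6]}/{s[6:]}.jpg"

print(image_path(4294967295))  # 429/496/7295.jpg
print(image_path(42))          # 000/000/0042.jpg
```

Note that this layout gives up to 1000 entries per directory level.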

Thank you.

+1  A: 

Having several thousand files in one directory will slow things down considerably. I'd say a safe number is up to 1024 files per directory; 512 is even better.

Drakosha
+1, but you'd be better off using the scheme proposed by the user: 100 entries per level, from 00/00/00/00/00.jpg through to 42/94/96/72/95.jpg. That will make it a lot easier to locate/place your files.
paxdiablo
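A minimal sketch of the pair-per-level scheme described in the comment above (a hypothetical Python helper, again assuming zero-padding to 10 digits):

```python
def paired_path(image_id: int) -> str:
    """Split a zero-padded 10-digit id into five 2-digit levels,
    e.g. 4294967295 -> '42/94/96/72/95.jpg'."""
    s = str(image_id).zfill(10)
    return "/".join(s[i:i + 2] for i in range(0, 10, 2)) + ".jpg"

print(paired_path(4294967295))  # 42/94/96/72/95.jpg
print(paired_path(0))           # 00/00/00/00/00.jpg
```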
Assuming which file system? And why 1024/512? Magic hat? ext2/3/4 with the dir_index option will use a hashed B-tree implementation for locating files; the fastest solution will be all files in one directory with that option enabled.
Mark
I answered making no assumptions. I think your comment qualifies for a full answer :).
Drakosha
A: 

The answer, of course, is: It depends.

In particular, it depends on which file system you use. For example, the ext2 and ext3 file systems have limits on the number of files per directory. Those file systems would not be able to put all of your pictures in one directory!

You might look into something other than a file system. In the company I work for, because we needed to store lots of material, we moved from file-based storage to a database-based storage run on Apache Jackrabbit.

Chip Uni
That's not correct, @Chip: the number of *subdirectories* is limited to 32,000-ish; the number of files can go much higher.
paxdiablo
I didn't realise there was an option like database-based storage. I will look into this. Thanks.
orz
@pax, files are also limited but not to 32k. http://en.wikipedia.org/wiki/Ext3#cite_note-0
Chuck Vose
Thank you, Chuck and pax.
Chip Uni
+2  A: 

Yes; hard to say; quite a bit; and perhaps you should use a database.

The conventional wisdom is "use a database", but using the filesystem is a reasonable plan for larger objects like images.

Some filesystems have limits on the number of directory entries. Some filesystems do not have any sort of data structure for filename lookups, but just do a linear scan of the directory.

Optimizations like the ones you are discussing are restricted to specific environmental profiles. Do you even know what hardware your application will run on in the future? Might it be a good idea not to stress the filesystem, and instead make a nice, hierarchical directory structure? If you do that, it will run well on any filesystem or storage server.

DigitalRoss
+1  A: 

It depends on which filesystem is being used. ext{2,3,4} have a dir_index option that can be set when they are created, and that makes storing thousands or even millions of files in a single directory reasonably fast.

btrfs is not yet production ready, but it implicitly supports this idea at a very basic level.

But if you're using the ext series without dir_index, or most other Unix filesystems, you will need to go for the more complex scheme of having several levels of directories. I would suggest you avoid that if you can. It just adds a lot of extra complication for something filesystems ought to be handling reasonably for you.

If you do use the more complex scheme, I would suggest encoding the number in hex and having 256 files/directories at each level. Filesystems that aren't designed to handle large numbers of files in each directory typically do linear scans; the goal is to approximate a B-tree type structure on your own. Two hex digits at each level gives you about half of a 4 KiB (a common size) disk block per level with common ways of encoding directories. That's about as good as you're going to get without a really complicated scheme like encoding your numbers in base 23 or base 24.
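The hex encoding described above could be sketched like this (a hypothetical Python helper, assuming 32-bit ids formatted as 8 hex digits, two per level):

```python
def hex_path(image_id: int) -> str:
    """Encode a 32-bit id as 8 hex digits, two per directory level,
    so no directory ever holds more than 256 entries."""
    h = format(image_id, "08x")  # zero-padded lowercase hex
    return f"{h[:2]}/{h[2:4]}/{h[4:6]}/{h[6:]}.jpg"

print(hex_path(4294967295))  # ff/ff/ff/ff.jpg
print(hex_path(255))         # 00/00/00/ff.jpg
```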

Omnifarious
I would argue strongly that you should always use some sort of hierarchical encoding scheme, with a bounded number of entries per directory, regardless of the underlying file system. A directory with thousands or tens of thousands of entries is impossible for *you* to work with, let alone the file system. Simple tools like ls stop working effectively.
Dale Hagglund