I'm looking to build a server with lots of tiny files delivered by an XML API. It won't be doing a whole lot of iterating over directories or blocks of sequential files--we're talking lots and lots of seeks for discontinuous data.

Will seek time on BSD UFS degrade over time for individual file requests? I understand that the filesystem's inode limit is based on the size of the partition/slice, but the drive still has to step through the inode table for every file request before it can discover the location of the data. What filesystem yields the best seek-time performance?

The alternative is to set up 2-4GB "blob" files and handle locating a file within them in the software itself. The software's "inode table" could be optimized for delivery based on the currently logged-in user, etc. These "inode tables" would likely be cached in RAM and would only cover the users currently logged in, so fewer resources are wasted.
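
Roughly, I picture the blob approach working like this minimal Python sketch (BlobStore and its dict-based index are hypothetical names I'm using for illustration, not existing code):

    import os

    class BlobStore:
        # One large blob file plus an in-RAM index: name -> (offset, length).
        # A per-user "inode table" would be a dict like this, built at login
        # and discarded at logout, so idle users cost nothing.

        def __init__(self, path):
            self.fd = os.open(path, os.O_RDONLY)
            self.index = {}  # loaded from whatever metadata store we pick

        def read(self, name):
            offset, length = self.index[name]         # lookup in RAM
            return os.pread(self.fd, length, offset)  # one seek + read

Each request would then cost one in-memory lookup plus one positioned read, with no per-file inode or directory traversal.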

How do these two solutions compare from a scalability and maintenance standpoint? What sort of performance gains, if any, could I expect from the second one?

+1  A: 

I'm not sure I understand your question correctly, but if you want to seek across lots of files, why not use a partitioned MySQL table laid out on a RAID0 or VFS filesystem?

Edit: as far as I know, lots of files in one folder will degrade any filesystem's speed, because it has to maintain ever-bigger lists of file names and permissions; a database is designed to keep such lists in memory and seek through them in a highly optimized way.
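
As a rough illustration, a key-partitioned blob table could look like this (a Python sketch using MySQL Connector/Python; the files schema and connection parameters are invented for the example):

    import mysql.connector

    # Connection details are placeholders.
    conn = mysql.connector.connect(host="localhost", user="app",
                                   password="secret", database="blobs")
    cur = conn.cursor()

    # Hash-partition on the primary key so each lookup touches one partition.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS files (
            name VARCHAR(255) NOT NULL PRIMARY KEY,
            body MEDIUMBLOB NOT NULL
        ) PARTITION BY KEY (name) PARTITIONS 16
    """)

    # Point lookup by primary key, no directory scan involved.
    cur.execute("SELECT body FROM files WHERE name = %s", ("users/42/doc.xml",))
    row = cur.fetchone()

MySQL prunes the query to the single partition the key hashes to and keeps the index in memory, which is the optimized seeking described above.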

Quamis
A: 

More details of your situation would be helpful. Are the files existing, or would they be created by your application? If you need a way to store arbitrary data without the structure of a relational database, have you looked at object databases?

Jared
These will be new files. Two goals of my approach are to minimize file seek time and to make backups as easy and space-efficient as possible.
Nolte Burke
A: 

Another option, if your objects can or should be accessed via HTTP, is to put a Varnish cache in front of a small web server. Initially, objects would be stored on disk, but Varnish would store and serve them from memory after the first access to a given object.

wulong
We're already caching HTTP requests via Squid. Good suggestion though :)
Nolte Burke
Varnish is better at keeping everything in memory, so you rarely hit the filesystem. When it does, it uses its own virtual memory format, so you won't run into directory size limits.
wulong
+3  A: 

The most obvious and time-proven mitigation technique is to use a good hierarchical design for directories (and pathname search strategies), and have more directories with fewer files in each.
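
In practice, the hierarchy is often derived by fanning files out across subdirectories keyed on a hash of the name, something like this Python sketch (shard_path and its parameters are made-up names for illustration):

    import hashlib
    import os

    def shard_path(root, name, levels=2, width=2):
        # Map 'report.xml' to root/<2 hex>/<2 hex>/report.xml using the
        # hex digest of the name. Two levels of 256 buckets each keep
        # every directory small even with millions of files.
        digest = hashlib.md5(name.encode("utf-8")).hexdigest()
        parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
        return os.path.join(root, *parts, name)

    print(shard_path("/data", "report.xml"))  # -> /data/<xx>/<yy>/report.xml

Because the fan-out is computed from the name, the path can be reconstructed on every request without any lookup table.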

le dorfier
I fully agree with that.
ariso
+3  A: 

For recent FreeBSD versions with dirhash and softupdates, I have seen no problems with a few tens of thousands of files per directory. You probably don't want to go north of 500,000 files or so; e.g., deleting a directory with 2,500,000 files took me three days.

mdorseif
Yowch! That's a long delete operation. I bet the machine was unusable the whole time.
Nolte Burke
No, the machine was actually doing fine, serving files via SMB to 40 or so users.
mdorseif