I have to do lots of small random accesses to a whole bunch of files. I have more than enough main memory to hold all of the data.

When I copy the data over to a temporary ramfs filesystem and process it there, this takes only a small fraction of the time that waiting for disk access would take.

Is there a Linux file system which holds all of its data in main memory, writes any changes to a backing disk, but never touches the disk for any reads?

If not, can, say, ext3 caches be tuned so that they are guaranteed to hold 100% of data and metadata?

+4  A: 

If you are only reading data, then you can indeed tune caching so that all of the data stays cached in RAM. See /usr/src/linux/Documentation/sysctl/vm.txt for details of what you can tweak here. The problem arises when you write data, particularly if you use fsync() or similar to ensure the data has been committed to the actual disk.

Because the OS has to update the disk whenever fsync() is called, there's not much you can do if you still want your data to stay consistent and survive a power cut.
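For the read-only case, one way to get the data into the page cache up front is simply to read every file once. The following is a minimal sketch, not a guarantee: the kernel may still evict pages under memory pressure, and the helper names `warm_page_cache` and `read_vfs_cache_pressure` are made up for this illustration. The second helper only reports the current vfs_cache_pressure value from /proc so you can see what you are starting from.

```python
import os


def warm_page_cache(root, chunk_size=1 << 20):
    """Read every regular file under root once and return the bytes read.

    Reading a file pulls its pages into the kernel page cache; whether
    they stay there depends on memory pressure and the vm.* settings.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip sockets, FIFOs, broken links and similar entries.
            if not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
    return total


def read_vfs_cache_pressure(path="/proc/sys/vm/vfs_cache_pressure"):
    """Return the current vfs_cache_pressure setting as an integer."""
    with open(path) as f:
        return int(f.read().strip())
```

Calling `warm_page_cache("/path/to/data")` before the random-access workload starts should make the first pass behave more like the later ones, as long as the data really does fit in free memory.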

One problem you might be running into is the atime, or access time: by default, every time a file is accessed, the access time is updated in its inode. This causes disk writes even when you think you are only performing reads, which can be a particular problem in your scenario, where you are accessing many small files. If you don't care about tracking the access time, you can mount your filesystem with the noatime option to disable this 'feature'.
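To check which access-time behaviour a filesystem is currently mounted with, you can look at its options in /proc/mounts. The sketch below assumes the usual /proc/mounts layout (device, mount point, type, options, ...); the function name `atime_mode` and the `/data` mount point are only examples. Note that many newer kernels mount with relatime by default, which already reduces atime writes but does not remove them entirely.

```python
def atime_mode(mount_point, mounts_text):
    """Classify the atime behaviour of mount_point.

    mounts_text is the content of /proc/mounts. Returns "noatime",
    "relatime", "atime" (neither option listed), or None if the mount
    point does not appear. The last matching entry wins, since later
    mounts shadow earlier ones on the same mount point.
    """
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] != mount_point:
            continue
        options = fields[3].split(",")
        if "noatime" in options:
            result = "noatime"
        elif "relatime" in options:
            result = "relatime"
        else:
            result = "atime"
    return result
```

Typical use would be `atime_mode("/data", open("/proc/mounts").read())`. Anything other than "noatime" means reads can still trigger inode updates.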

Dave Rigby
It doesn't look like fs.txt is relevant to this question, unless I missed what you specifically meant in there?
mikaelhg
@mikaelhg: Sorry, made a mistake there - I meant 'sysctl/vm.txt' which has various knobs you can tweak, such as vfs_cache_pressure. I'll update the answer.
Dave Rigby
A: 

Why don't you try to create a RAID mirror between a ramdisk and a physical disk?

I'm not sure it's efficient, though. If the mirror must always be synchronized, writes will have to wait for the disk anyway, but you should gain something on reads. But yeah, to me it looks very much like a complicated, square-wheeled reinvention of IO caching :)

Would be a nice experiment, though.
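A rough sketch of what such a mirror could look like with mdadm, using its --write-mostly flag so that the md driver avoids reading from the physical disk whenever it can. The code only builds and prints the command line; it does not run it. The device names /dev/md0, /dev/ram0 and /dev/sdb1 are placeholders, and the function name `mirror_create_argv` is invented for this example. Keep in mind that the ramdisk half is empty after every reboot, so the array would need to be rebuilt from the disk each time.

```python
import shlex


def mirror_create_argv(md_device, ram_device, disk_device):
    """Build an mdadm command for a RAID1 of a ramdisk and a disk.

    Devices listed after --write-mostly are flagged write-mostly, so
    reads are served from ram_device when possible. All device paths
    are supplied by the caller; nothing is executed here.
    """
    return [
        "mdadm", "--create", md_device,
        "--level=1", "--raid-devices=2",
        ram_device,
        "--write-mostly", disk_device,
    ]


# Placeholder device names, for illustration only.
argv = mirror_create_argv("/dev/md0", "/dev/ram0", "/dev/sdb1")
print(" ".join(shlex.quote(arg) for arg in argv))
```

Building the argument list separately makes it easy to review the exact command before handing it to something like subprocess.run as root.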

Stefano Borini
This ended up being the proper answer, with mdadm --write-mostly.
mikaelhg
@mikaelhg: do you have benchmarks?
Stefano Borini