I have a program that is very heavily hitting the file system, reading and writing to a set of working files. The files are several gigabytes in size, but not so large as to not fit on a RAM-disk. The machines this program runs on are typically Ubuntu Linux boxes.

Is there a way to configure the file system to have a very, very large cache, and even to cache writes so they hit the disk later?

Or is there a way to create a RAM disk that writes-through to real disk?

+3  A: 

Have you checked whether your disks are using their built-in write cache? It can make a considerable difference. On Linux, you can toggle the behaviour with hdparm:

/sbin/hdparm -W 0 /dev/hda    # 0: disable write caching
/sbin/hdparm -W 1 /dev/hda    # 1: enable write caching

Obviously, if write caching is enabled, then there is the potential for data loss or corruption if your system shuts down uncleanly (e.g. power cut).

In terms of software, the Linux kernel uses two main numbers to parameterize the write behaviour.

/proc/sys/vm/dirty_ratio
/proc/sys/vm/dirty_background_ratio

Modern defaults are to write more frequently, to avoid huge write spikes. You could try tuning these to suit your needs. There is an excellent discussion of the available parameters and how you might try adjusting them at http://www.westnet.com/~gsmith/content/linux-pdflush.htm.
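Before tuning, it helps to see what the two values currently are. A minimal sketch in Python, assuming a Linux system with /proc mounted; the helper names `read_vm_setting` and `read_dirty_settings` are made up for illustration, and a value that cannot be read comes back as None:

```python
from pathlib import Path

VM_DIR = Path("/proc/sys/vm")
SETTINGS = ("dirty_ratio", "dirty_background_ratio")


def read_vm_setting(name, base=VM_DIR):
    """Return the integer value of a vm tunable, or None if it cannot be read."""
    try:
        return int((Path(base) / name).read_text().strip())
    except (OSError, ValueError):
        # File missing (not Linux, no /proc) or content is not an integer.
        return None


def read_dirty_settings(base=VM_DIR):
    """Return a dict mapping each tunable name to its current value."""
    return {name: read_vm_setting(name, base) for name in SETTINGS}


if __name__ == "__main__":
    for name, value in read_dirty_settings().items():
        print(f"{name}: {value}")
```

Changing the values needs root; writing a new number into the same file is the usual way, and the change does not survive a reboot unless it is also put in the sysctl configuration.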

ire_and_curses
+3  A: 

By default, Linux will use free RAM (almost all of it) to cache disk accesses, and will delay writes. The heuristics used by the kernel to decide the caching strategy are not perfect, but beating them in a specific situation is not easy. Also, on journalling filesystems (i.e. all the default filesystems nowadays), actual writes to the disk will be performed in a way which is resilient to crashes; this implies a bit of overhead. You may want to try to fiddle with filesystem options. E.g., for ext3, try mounting with data=writeback or even async (these options may improve filesystem performance, at the expense of reduced resilience towards crashes). Also, use noatime to reduce filesystem activity.

Programmatically, you might also want to perform disk accesses through memory mappings (with mmap). This is a bit hands-on, but it gives more control over data management and optimization.
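To give an idea of what the mmap route looks like, here is a minimal sketch using Python's standard mmap module. The function name `patch_in_place` and the file layout are made up for illustration; a real program would map its own working files and decide for itself when to flush:

```python
import mmap
import os
import tempfile


def patch_in_place(path, offset, data):
    """Overwrite len(data) bytes at `offset` through a shared memory mapping."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0) as m:
            # Slice assignment must not change the size of the mapping.
            m[offset:offset + len(data)] = data
            # Ask the kernel to write the modified pages back now.
            # Without this call they are written back later, on the
            # kernel's own schedule.
            m.flush()


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * 4096)
    patch_in_place(tmp.name, 100, b"hello")
    with open(tmp.name, "rb") as f:
        f.seek(100)
        print(f.read(5))
    os.unlink(tmp.name)
```

Reads and writes through the mapping go straight to the page cache without a system call per access, which is where the extra control comes from.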

Thomas Pornin
Mounting the relevant filesystems with `noatime` is definitely good advice for this case.
caf
+1  A: 

You could create a ramdisk and RAID-1 it with a physical partition. Look at mdadm's --write-mostly and --write-behind options. You can use those to make the physical disk one which is not to be read from (only written to), and to set the number of outstanding write operations, respectively.

Alternatively, look at the documentation for pdflush. There's a good page here: http://www.westnet.com/~gsmith/content/linux-pdflush.htm (also linked by ire_and_curses). Beyond what ire mentioned, you'll probably want to crank swappiness up to 100, so that the kernel prefers swapping out application memory to shrinking the disk cache.

But it'd be worthwhile to learn how it all works, and tune it to your specific app. Linux is already tuned for the general case, and only you know how your specific situation differs. :)

dannysauer
A: 

Maybe you need a RAM-based SSD?

vitaly.v.ch
A: 

The question here really is how much durability do you require?

Normally Linux will happily use as much RAM as there is to cache files for a while, then write the changes back. This is normally what you want: you will lose some, but not too much, data in the event of a crash.

Applications can of course force a write back with (for example) fdatasync() and fsync().

To get better performance, you could, for example, call fdatasync() less often, at the cost of some durability.
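A rough sketch of that trade-off in Python, syncing once per batch rather than once per record. The function name `write_records` and the `sync_every` parameter are hypothetical; anything written since the last sync can be lost in a crash, so the batch size is the durability knob:

```python
import os


def write_records(path, records, sync_every=100):
    """Append records, syncing to disk once per `sync_every` records.

    Returns the number of sync calls made.
    """
    # os.fdatasync is not available on every platform; fall back to fsync.
    sync = getattr(os, "fdatasync", os.fsync)
    syncs = 0
    with open(path, "ab") as f:
        for i, record in enumerate(records, start=1):
            f.write(record)
            if i % sync_every == 0:
                f.flush()          # push Python's buffer to the kernel
                sync(f.fileno())   # force the kernel to write it to disk
                syncs += 1
        # Final sync so the tail of the data is durable too.
        f.flush()
        sync(f.fileno())
        syncs += 1
    return syncs
```

A larger `sync_every` means fewer forced writes and better throughput, but more data at risk if the machine goes down between syncs.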

MarkR