Hi,

I have been googling for a way to do raw (sometimes called direct) I/O under Mac OS. Raw I/O turns off the operating system page cache to give the application more direct access to the disk. This is useful because some of the file structures I am using are not efficient under LRU page replacement. It is fairly straightforward to implement the page replacement algorithms we need, but first we need to turn off OS X's default buffering. We have already done this under Linux using the O_DIRECT flag when opening files. Does anyone know how to turn off page buffering under Mac OS?

Cheers Tim

+1  A: 

You need to use open instead of fopen and I believe to really have raw access you have to read from /dev/rdisk? directly.

Georg
I know when using this approach under linux you have to first mount the device using the "raw" command which turns the block device into a raw character device. Do you have to play similar games under os x?
tim.tadh
Not that I can think of. For each disk there are two device nodes in the system: the block device, e.g. */dev/disk0*, and the raw device, */dev/rdisk0*.
Georg
Using open (instead of fopen) avoids the buffering in the C library, but that is totally separate from how the OS's page cache operates. As noted below, F_NOCACHE is the way to go.
benno
+1  A: 

You may want to use the madvise system call. It lets you give the kernel hints about how pages will be used: MADV_DONTNEED says the pages are not expected to be needed soon, so they can be reclaimed first, and MADV_WILLNEED says they are expected to be needed soon. OS X also supports an mmap flag, MAP_NOCACHE, which instructs the kernel to discard the resulting pages first.

Dietrich Epp
+1. O_DIRECT almost did not make it into Linux because Linus was worried that it would keep the appropriate POSIX advisory hooks from getting any exposure, which is what ended up happening. Most people just go straight for O_DIRECT instead of using madvise / fadvise. The only sane reason for using O_DIRECT is when writing an RDBMS, or something similar, that handles 100% of its own buffering.
Tim Post
I should note I am writing a DBMS. This is an "academic" project, i.e. part of a senior project. As part of the project we are doing empirical work on the recommended buffering schemes for the various file structures we are using, i.e. B+ Trees, ISAM, and Linear Hashing, which are pretty standard, but we are also implementing some more exotic things like B-Tries. Each structure comes with its own advice on which buffering scheme works best, and I aim to test some of these claims along the way.
tim.tadh
+1  A: 

After some more reading through the man pages I finally found the ideal answer. It turns out Mac OS has a mechanism very similar to O_DIRECT; however, it is set through fcntl rather than through the open function. Specifically, there is an option called F_NOCACHE which lets you turn the cache on or off for a particular file descriptor, which is exactly what I wanted. See http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man2/fcntl.2.html for the full rundown of the other things you can do with the Mac version of fcntl, and an explanation of its exact use. I hope this answer will help someone else out.


http://lists.apple.com/archives/filesystem-dev/2007/Sep/msg00010.html is a good thread that explains how the F_NOCACHE flag behaves depending on your Mac OS version number.


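Final code (in Go):

    // F_NOCACHE with argument 1 turns off data caching for this file descriptor.
    r1, r2, err := syscall.Syscall(syscall.SYS_FCNTL, uintptr(self.file.Fd()), syscall.F_NOCACHE, 1)
    if err != 0 { // err is a syscall.Errno; 0 means success
        fmt.Printf("Syscall to SYS_FCNTL failed\n\tr1=%v, r2=%v, err=%v\n", r1, r2, err)
        self.Close()
        return false
    }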
tim.tadh