Prelude

sendfile() is an extremely useful syscall for two reasons:

First, it's less code than a read()/write() (or recv()/send() if you prefer that jive) loop.
Second, it's faster than the aforementioned methods (fewer syscalls, and the implementation may copy between devices without an intermediate user-space buffer, etc.).

Less code. More efficient. Awesome.
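To make the "less code" point concrete, here is a minimal sketch, assuming Linux and its `<sys/sendfile.h>` interface (the BSD and Darwin sendfile() signatures differ). `copy_rw()` and `copy_sendfile()` are hypothetical helper names: the first is the read()/write() loop, the second hands the transfer to the kernel.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Classic approach: shuttle bytes through a user-space buffer.
 * Returns 0 on success, -1 on error (errno set). */
int copy_rw(int out_fd, int in_fd)
{
    char buf[8192];
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return 0;                      /* EOF */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        /* write() may be partial, so loop until this chunk is out. */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
    }
}

/* sendfile(): the kernel moves up to `count` bytes from in_fd to out_fd
 * without copying them through user space. A NULL offset means "read
 * from in_fd's current file offset and advance it". Stops early if
 * in_fd hits EOF. Returns 0 on success, -1 on error (errno set). */
int copy_sendfile(int out_fd, int in_fd, size_t count)
{
    while (count > 0) {
        ssize_t n = sendfile(out_fd, in_fd, NULL, count);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;                         /* in_fd hit EOF early */
        count -= (size_t)n;
    }
    return 0;
}
```

Both helpers still loop, since either call may transfer fewer bytes than requested, but the sendfile() version never touches the data in user space.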

In UNIX, everything is (mostly) a file. This is the ugly territory where platonic theory collides with real-world practice. I understand that sockets are fundamentally different from files residing on some device. I haven't dug through the sources of Linux, *BSD, Darwin, or whatever other OS implements sendfile() to know why this specific syscall is restricted to writing to sockets (specifically, stream sockets).

I just want to know...

Question

What prevents sendfile() from allowing the destination file descriptor to be something other than a socket (like a disk file or a pipe)?

+3  A: 

I seem to remember that it was a limitation introduced in early Linux 2.6 (2.4 didn't have the limitation).
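For what it's worth, the Linux sendfile(2) man page notes that in Linux 2.4 and earlier out_fd could also be a regular file, that this went away in the 2.6 series, and that it was restored in Linux 2.6.33. A minimal file-to-file sketch, assuming a kernel of 2.6.33 or later (`file_to_file()` is a hypothetical name):

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy all of in_fd into out_fd (a regular file) with sendfile().
 * ASSUMPTION: Linux >= 2.6.33; older 2.6 kernels reject a non-socket
 * out_fd. Returns 0 on success, -1 on error (errno set). */
int file_to_file(int out_fd, int in_fd)
{
    struct stat st;
    if (fstat(in_fd, &st) < 0)
        return -1;

    /* Explicit offset: sendfile() advances this variable and leaves
     * in_fd's own file offset untouched. */
    off_t offset = 0;
    while (offset < st.st_size) {
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;                 /* file shrank underneath us */
    }
    return 0;
}
```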

Since 2.6.17, Linux has had the splice() system call, which is similar: more flexible, but slightly less efficient. Linus talked about re-implementing sendfile() in terms of splice(). See http://kerneltrap.org/node/6505
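A rough sketch of the splice() route, assuming Linux and glibc (`splice_copy()` is a hypothetical name). Because splice() needs a pipe on one side of every call, a file-to-file copy goes file -> pipe -> file, with the data staying inside the kernel.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Copy in_fd to out_fd from their current offsets using splice() through
 * an intermediate pipe. Returns 0 on success, -1 on error (errno set). */
int splice_copy(int out_fd, int in_fd)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    int rc = 0;
    for (;;) {
        /* Move up to 64 KiB from the input file into the (empty) pipe. */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, 65536, SPLICE_F_MOVE);
        if (n == 0)
            break;                         /* EOF */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            rc = -1;
            break;
        }
        /* Drain exactly what was put into the pipe to the output. */
        while (n > 0) {
            ssize_t w = splice(p[0], NULL, out_fd, NULL, (size_t)n,
                               SPLICE_F_MOVE);
            if (w < 0 && errno == EINTR)
                continue;
            if (w <= 0) {
                rc = -1;
                goto out;
            }
            n -= w;
        }
    }
out:
    close(p[0]);
    close(p[1]);
    return rc;
}
```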

Dipstick
+2  A: 

Fundamentally, the only thing limiting it is that "no-one's written the code yet".

However, I gather that the reason no-one's written the code for the two cases you mention is that both would require the data to be copied, which removes much of the advantage of using sendfile in the first place.

  • For a file-to-file sendfile, you'd need a copy because otherwise the same page would have to be in the pagecache as both a clean page in the source file and a dirty page in the destination file. I don't think the pagecache is built to handle that case at the moment (though of course, this could be changed if there were sufficient motivation).

  • For a file-to-pipe sendfile, you need a copy regardless, because the destination process needs to get a private, writable copy of the data. Anyway, for most uses of this case we already have mmap.
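One reading of "we already have mmap" is that the consuming process can map the file itself instead of having the bytes pushed to it through a pipe. A sketch under that assumption, where counting newlines (`count_newlines()`, a hypothetical name) stands in for whatever the consumer actually does:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Count '\n' bytes in the file behind fd by mapping it, rather than
 * receiving the data through a pipe. Returns the count, or -1 on error. */
long count_newlines(int fd)
{
    struct stat st;
    if (fstat(fd, &st) < 0)
        return -1;
    if (st.st_size == 0)
        return 0;                  /* mmap() rejects zero-length mappings */

    size_t len = (size_t)st.st_size;
    /* A read-only private mapping refers to page-cache pages directly;
     * nothing is copied unless a page is written, which PROT_READ forbids. */
    const char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED)
        return -1;

    long lines = 0;
    const char *p = data;
    const char *end = data + len;
    while (p < end && (p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        lines++;
        p++;                       /* continue after the newline */
    }
    munmap((void *)data, len);
    return lines;
}
```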

caf