views:

69

answers:

3

What is the relative overhead of calling times() versus file operations like reading a line fread().

I realize this likely differs from OS to OS and depends on how long the line is, where the file is located, if it's really a pipe that's blocked (it's not), etc.

Most likely the file is not local but is on a mounted NFS drive somewhere on the local network. The common case is a line that is 20 characters long. If it helps, assume Linux kernel 2.6.9. The code will not be run on Windows.

I'm just looking for a rough guide. Is it on the same order of magnitude? Faster? Slower?

Ultimate goal: I'm looking at implementing a progress callback routine, but don't want to call too frequently (because the callback is likely very expensive). The majority of work is reading a text file (line by line) and doing something with the line. Unfortunately, some of the lines are very long, so simply calling every N lines isn't effective in the all-too-often seen pathological cases.

I'm avoiding writing a benchmark because I'm afraid of writing it wrong and am hoping the wisdom of the crowd is greater than my half-baked tests.

+3  A: 

fread() is a C library function, not a system call. fread(), fwrite(), fgets() and friends are all buffered I/O by default (see setbuf) which means that the library allocates a buffer which decreases the frequency with which read() and write() system calls need to be made.

This means that if you're reading sequentially from the file, the library will only issue a system call every, say, 100 reads (subject to the buffer size and how much data you read at a time).

When the read() and write() system calls are made, however, they will definitely be slower than calling times(), simply due to the volume of data that needs to be exchanged between your program and the kernel. If the data is cached in the OS's buffers (e.g. it was written by another process on the same machine moments ago) then it will still be pretty fast. If the data is not cached, then you will have to wait for I/O (be it to the disk or over the network), which is very slow in comparison.

If the data is coming fresh over NFS, then I'd be pretty confident that calling times() will be faster than fread() on average.

Artelius
A: 

times() simply reads kernel maintained process-specific data. The data is maintained by the kernel to supply information for the wait() system call when the process exits. So, the data is always maintained, regardless of whether times() ever gets called. The extra overhead of calling times() is really low

fread(), fwrite(), etc call underlying system calls - read() & write(), which invoke drivers. The drivers then place data in a kernel buffer. This is far more costly in terms of resources than invoking times().

Is this what you are asking?

jim mcnamara
+1  A: 

On Linux, you could write a little program that does lots of calls to times() and fread() and measure the syscall times with strace -c

e.g

for (i = 0; i < RUNS; i++) {
        times(&t_buf);
        fread(buf,1,BUF,fh);
}

This is when BUF 4096 (fread will actually call read() every time)

# strace -c ./time_times
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 59.77    0.001988           0    100000           read
 40.23    0.001338           0     99999           times

and this is when BUF 16

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.00    0.001387           0     99999           times
  1.00    0.000014           0       392           read
miedwar