ansaurus

Question

Why is my "cat" function with system calls slower compared to Linux's "cat"?

Answer 1

+1 A:

How much? The canonical cat is something like

char bufr[BUFSIZ];
ssize_t len;

while((len=read(fdin, bufr, BUFSIZ)) >0)
     write(fdout, bufr, len);

which saves a few instructions.

Charlie Martin 2009-04-20 18:41:33

This may be the canonical version, but it is incorrect (e.g. if a signal arrives while you are in write()).

jpalecek 2009-04-20 18:48:26
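
To make the objection above concrete: write() may write fewer bytes than requested, and both read() and write() can fail with EINTR when a signal arrives. The following is a minimal sketch of a copy loop that retries in those cases. It is not the code of GNU cat or of the asker; the helper names write_all and copy_fd are hypothetical, and the 32768-byte buffer is simply the size mentioned in this thread.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes, retrying on short writes
 * and on EINTR. Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before writing: retry */
            return -1;
        }
        buf += n;                  /* skip the part already written */
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: copy everything from fdin to fdout.
 * Returns 0 on end of input, -1 on error. */
static int copy_fd(int fdin, int fdout)
{
    char bufr[32768];              /* buffer size mentioned in the thread */

    for (;;) {
        ssize_t len = read(fdin, bufr, sizeof bufr);
        if (len == 0)
            return 0;              /* end of input */
        if (len < 0) {
            if (errno == EINTR)
                continue;          /* interrupted read: retry */
            return -1;
        }
        if (write_all(fdout, bufr, (size_t)len) < 0)
            return -1;
    }
}
```

A caller would typically use it as copy_fd(STDIN_FILENO, STDOUT_FILENO) and report errno if it returns -1.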

What part of "something like" did you miss?

Charlie Martin 2009-04-20 18:50:34

Like I said, the original cat and my cat both call read() with a buffer size of 32768 and write() with the same buffer size, followed by one last read() at the end (which reads nothing, so the program terminates).

Nazgulled 2009-04-20 18:55:14

Right, but your version has multiple tests == more instructions. However, it sounds like the real issue was contention for CPU cycles.

Charlie Martin 2009-04-20 19:21:56

Answer 2

+3 A:

Perhaps you compiled without optimization (or without as high an optimization setting)?

Also, your code will call sysWriteBuffer once with readBytes equal to zero -- maybe that (partially) explains it?

You might also inline sysWriteBuffer (either via a compiler switch or by hand).

"Inlining" means copying the body of a function to its call site in order to remove the overhead of the function call. Sometimes compilers do this automatically (I think -O3 enables this optimization in gcc). You can also use the inline keyword in gcc to ask the compiler to inline a function. If you do this, your declaration will look like this:

static inline int sysWriteBuffer(int fdout, char *buffer, ssize_t readBytes) {
....

Rick Copeland 2009-04-20 18:44:16

If you use strace on cat you'll see that it also happens there, so I just left it... And I'm using the -O2 flag.

Nazgulled 2009-04-20 18:53:31

You could try "-O3 -funroll_loops" and see how that does. Better yet would be to determine the exact flags with which cat was compiled.

Rick Copeland 2009-04-20 18:57:00

Just a note, the flap is -funroll-loops (second hyphen not an underscore), and I don't think it'll do a whole lot in this case anyway.

Tony k 2009-04-20 19:00:29

Thanks for the correction on the flag -- btw, it's "flag", not "flap" (don't you wish we could edit these comments?) :)

Rick Copeland 2009-04-20 19:04:54

What is "inlining" (does this word even exist?) sysWriteBuffer?

Nazgulled 2009-04-20 19:07:34

I have updated the answer to define inlining

Rick Copeland 2009-04-20 20:01:59

Answer 3

+1 A:

Did you compare straces of both? You might try to use the -tt parameter so you get the timing of the syscalls.

jpalecek 2009-04-20 18:52:03

I don't know much about strace. I tried the -tt parameter and a bunch of numbers appeared, but I can't understand their meaning.

Nazgulled 2009-04-20 19:01:39

Try finding the part reading and writing (the output should have the format "time syscall(parameters) = return value", so find read() or write()) and post it

jpalecek 2009-04-20 19:04:05

Answer 4

+1 A:

Research mmap(2).

You will be throwing away the niceties of ftell/fread, but it will skip a layer of indirection if read throughput is really important.

Pasi Savolainen 2009-04-20 19:10:10
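
As a rough illustration of the mmap(2) suggestion, here is a minimal sketch that maps a file read-only and writes the mapping to an output descriptor. The function name mmap_copy is hypothetical, and the sketch assumes a regular file: pipes and terminals cannot be mapped, so a real cat would still need a read() fallback.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: copy the regular file at path to fdout
 * through a read-only mapping. Returns 0 on success, -1 on error. */
static int mmap_copy(const char *path, int fdout)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {         /* mmap rejects a zero length */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                     /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    size_t off = 0;
    int rc = 0;
    while (off < len) {            /* write() may be short or interrupted */
        ssize_t n = write(fdout, p + off, len - off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            rc = -1;
            break;
        }
        off += (size_t)n;
    }

    munmap(p, len);
    return rc;
}
```

Whether this is actually faster than a read()/write() loop depends on the file size and the system, so it is worth measuring rather than assuming.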

Thanks, I've needed that mmap...

Liran Orevi 2009-04-20 19:31:30

I'm not allowed to use anything else for this exercise.

Nazgulled 2009-04-20 19:58:43

Answer 5

+2 A:

Without comparing the source code of both, it is difficult to say. If you are comparing your cat with GNU cat, remember that you are comparing code that is a few hours or days old with code that has evolved for more than twenty years.

You may want to do a more comprehensive performance analysis, running both programs with different input sizes, from different devices (a RAM disk would be good), and multiple times in a row. You must try to determine WHERE in your program it is slower.

Since cat itself is really trivial (and you said in a comment that you are already optimizing the compilation), I bet the performance impact you are observing is not in the actual algorithm, but in program load times. If the system binary is prelinked (which is common on most distros nowadays), you will see that it loads faster than any program you compile yourself (until you include your own programs in the prelinking).

Juliano 2009-04-20 19:24:08

Answer 6

+10 A:

Ah, based on your edit, you were being bitten by the readahead buffer. You cannot compare two programs that read the same file by running each of them once, one after the other. The first will always be slower, since the file is still on disk; once the file is in memory, the second will run faster. You must either create new data for each program, or run one of them first and then time both, so that they both get the benefit of the readahead buffer.

Chas. Owens 2009-04-20 19:38:33
