I've written this function in C using system calls (open, read and write) to simulate the "cat" command on Linux systems, and it's slower than the real one...

I'm using the same buffer size as the real "cat" and using "strace" I think it's making the same amount of system calls. But the output from my "cat" is a little bit slower than the real "cat".

This is the code I have:

#define BUFSIZ 32768

int sysWriteBuffer(int fdout, char *buffer, ssize_t readBytes) {
    ssize_t writtenBytes = 0;

    while(writtenBytes < readBytes) {
     /* keep the result separate so an error (-1) is not added to the running total */
     ssize_t n = write(fdout,
      buffer + writtenBytes, readBytes - writtenBytes);
     if(n == -1) {
      return -1;
     }
     writtenBytes += n;
    }

    return 0;
}

int catPrint(int fdin, int fdout) {
    char buffer[BUFSIZ];
    ssize_t readBytes;

    do {
     readBytes = read(fdin, buffer, BUFSIZ);

     if(readBytes == -1) {
      return -1;
     }

     if(sysWriteBuffer(fdout, buffer, readBytes) == -1) {
      return -1;
     }
    } while(readBytes > 0);

    return 0;
}

I'm reading from a file (which I pass as an argument to main; I think that code is not needed here), then I call the catPrint() function with that file descriptor and 1 as the output descriptor, so it prints to stdout.

I don't understand why it's slower because I'm using the same file for testing and with both (the real "cat" and mine) there's only one read() and one write() for the whole text. Shouldn't the whole text just appear on screen?

P.S.: I've tagged this as homework although my question here (why it's slower) is not part of the homework. I only needed to use the system calls to create a "cat"-type function, which is done. I'm just intrigued that my code is a bit slower.

PROBLEM SOLVED WITH STUPIDITY FROM ME:
I just decided to call Linux's original cat a few times on the same file, one after the other, and I realized that it was also slow some of the times I called it, just as slow as my own. I guess everything's fine then...

Sorry for wasting your time like this people.

+1  A: 

How much? The canonical cat is something like

char bufr[BUFSIZ];
ssize_t len;

while((len=read(fdin, bufr, BUFSIZ)) >0)
     write(fdout, bufr, len);

which saves a few instructions.
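As a hedged sketch of a more defensive variant of that loop: write() may write fewer bytes than requested, and either call can fail with EINTR if a signal arrives. The helper names write_all and copy_fd are made up for illustration and are not taken from any real cat implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write exactly len bytes, retrying on short
 * writes and on EINTR. Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len) {
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n == -1) {
            if (errno == EINTR)
                continue;   /* interrupted before writing anything: retry */
            return -1;      /* real error, errno was set by write() */
        }
        done += (size_t)n;  /* short write: advance and loop again */
    }
    return 0;
}

/* Hypothetical helper: copy fdin to fdout until end of file.
 * Returns 0 on success, -1 on error. */
static int copy_fd(int fdin, int fdout) {
    char buf[32768];

    for (;;) {
        ssize_t len = read(fdin, buf, sizeof buf);
        if (len == 0)
            return 0;       /* end of file */
        if (len == -1) {
            if (errno == EINTR)
                continue;   /* interrupted read: retry */
            return -1;
        }
        if (write_all(fdout, buf, (size_t)len) == -1)
            return -1;
    }
}
```

The extra tests cost a few instructions per 32 KiB buffer, which should be negligible next to the system calls themselves.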

Charlie Martin
This may be the canonical version, but an incorrect one (eg. if signal comes while you write())
jpalecek
What part of "something like" did you miss?
Charlie Martin
Like I said, original cat and my cat, both call one read() with a buffer size of 32768 and a write() with the same buffer size and a last read() at the end (when it doesn't read anything else and terminates).
Nazgulled
Right, but your version has multiple tests == more instructions. However, it sounds like the real issue was contention for CPU cycles.
Charlie Martin
+3  A: 

Perhaps you compiled without optimization (or without as high an optimization setting)?

Also, your code will call sysWriteBuffer once with readBytes equal to zero -- maybe that (partially) explains it?

You might also inline sysWriteBuffer (either via a compiler switch or by hand).

"Inlining" means copying the body of a function to its call site in order to remove the overhead of calling the function. Sometimes compilers do this automatically (I think -O3 enables this optimization in gcc). You can also use the inline keyword in gcc to tell the compiler to inline a function. If you do this, your declaration will look like this:

static inline int sysWriteBuffer(int fdout, char *buffer, ssize_t readBytes) {
....
Rick Copeland
If you use strace on cat you'll see that it also happens there, so I just left it... And I'm using the -O2 flag.
Nazgulled
You could try "-O3 -funroll_loops" and see how that does. Better yet would be to determine the exact flags with which cat was compiled.
Rick Copeland
Just a note, the flap is -funroll-loops (second hyphen not an underscore), and I don't think it'll do a whole lot in this case anyway.
Tony k
Thanks for the correction on the flag -- btw, it's "flag", not "flap" (don't you wish we could edit these comments?) :)
Rick Copeland
What is "inlining" (does this word even exist?) sysWriteBuffer?
Nazgulled
I have updated the answer to define inlining
Rick Copeland
+1  A: 

Did you compare straces of both? You might try to use the -tt parameter so you get the timing of the syscalls.

jpalecek
I don't know much about strace. I tried the -tt parameter and a bunch of numbers appeared, but I can't understand their meaning.
Nazgulled
Try finding the part that does the reading and writing (the output should have the format "time syscall(parameters) = return value", so look for read() or write()) and post it.
jpalecek
+1  A: 

Research mmap(2).

You will be throwing away the niceties of ftell/fread, but it will skip a layer of indirection if read throughput is really important.

Pasi Savolainen
Thanks, I've needed that mmap...
Liran Orevi
I'm not allowed to use anything else for this exercise.
Nazgulled
+2  A: 

Without comparing the source code, it is difficult to say. If you are comparing your cat with GNU cat, remember that you are comparing code that is a few hours or days old with code that has evolved for more than twenty years.

You may want to do a more comprehensive performance analysis, running both programs with different input sizes, from different devices (a RAM disk would be good) and multiple times in a row. You must try to determine WHERE in your program it is slower.

Since cat itself is really trivial (and you said in a comment that you are already optimizing the compilation), I bet the performance impact you are observing is not in the actual algorithm, but in program load times. If the system binary is prelinked (which is common on most distros nowadays), you will see that it loads faster than any program you compile yourself (until you prelink your own programs too).

Juliano
+10  A: 

Ah, based on your edit you were being bitten by the readahead buffer. You cannot compare two programs that read the same file by running each of them once. The first run will always be slower since the file is still on disk; once the file is in memory, the second will run faster. You must either create new data for each run, or run one program first and then time both, so that they both get the benefit of the readahead buffer.
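One way to see this effect is to time several consecutive passes over the same file. The sketch below is only an illustration: time_read_pass and report_passes are made-up names, and the 32768-byte buffer is borrowed from the question. Whether the first pass is noticeably slower depends on whether the file is already cached when the program starts.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime under strict -std modes */
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: read the whole file once and return the elapsed
 * time in seconds, or -1.0 on error. */
static double time_read_pass(const char *path) {
    char buf[32768];
    struct timespec t0, t1;
    ssize_t n;

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while ((n = read(fd, buf, sizeof buf)) > 0)
        ;                        /* discard the data, only the timing matters */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);

    if (n == -1)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Hypothetical helper: print the timing of several passes over one file. */
static void report_passes(const char *path, int passes) {
    for (int i = 0; i < passes; i++)
        printf("pass %d: %.6f s\n", i + 1, time_read_pass(path));
}
```

Calling report_passes with the test file and, say, 5 passes gives one timing per pass; for a fair comparison of two programs, discard the first (possibly uncached) run of each.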

Chas. Owens