I'm stuck on this. Currently I'm using:

FILE *a = fopen("sample.txt", "r");
int n;
while ((n = fgetc(a)) != EOF) {
  putchar(n);
}

However, this method seems a bit inefficient. Is there a better way? I tried using fgets:

char *s;
fgets(s, 600, a);
puts(s);

There's one thing I find wrong with this second method: you would need a really large number for the second argument of fgets.

Thanks for all the suggestions. I found a way (someone on IRC told me this) using open(), read(), and write().

/* needs <fcntl.h> for open() and <unistd.h> for read()/write() */
char *filename = "sample.txt";
char buf[8192];
ssize_t r = -1;
int in = open(filename, O_RDONLY), out = STDOUT_FILENO;  /* stdout is fd 1, not 0 */
if (in == -1)
  return -1;
while (1) {
  r = read(in, buf, sizeof(buf));
  if (r == -1 || r == 0) { break; }
  r = write(out, buf, r);  /* note: write() may write fewer bytes than asked */
  if (r == -1 || r == 0) { break; }
}
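
One caveat about the snippet above (a sketch added for completeness, not from the original post): write() may write fewer bytes than it was asked to, so a fully robust copy keeps writing until the whole chunk is out. Reusing in, out, and buf from above:

ssize_t n;
while ((n = read(in, buf, sizeof(buf))) > 0) {
    char *p = buf;
    ssize_t left = n;
    while (left > 0) {                    /* retry until the chunk is fully written */
        ssize_t w = write(out, p, left);
        if (w == -1)
            return -1;                    /* or handle EINTR and retry */
        p += w;
        left -= w;
    }
}
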
+9  A: 

The second code is broken. You need to allocate a buffer, e.g.:

char s[4096];
fgets(s, sizeof(s), a);

Of course, this doesn't solve your problem.

Read fixed-size chunks from the input and write out whatever gets read in:

int n;
char s[65536];
while ((n = fread(s, 1, sizeof(s), a))) {
    fwrite(s, 1, n, stdout);
}

You might also want to check ferror(a) in case it stopped for some other reason than reaching EOF.
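
For example, a minimal sketch of that check, placed right after the loop (a is the input FILE * from above):

if (ferror(a)) {
    fprintf(stderr, "read failed\n");   /* the loop stopped for some reason other than EOF */
}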

Notes

  1. I originally used a 4096 byte buffer because it is a fairly common page size for memory allocation and block size for the file system. However, the sweet-spot on my Linux system seems to be around the 64 kB mark, which surprised me. Perhaps CPU cache is a factor here, but I'm just guessing.
  2. For a cold cache, it makes almost no difference, since I/O paging will dominate; even one byte at a time runs at about the same speed.
Marcelo Cantos
Tiny mistake in there: it should read `fwrite(s, 1, n, stdout)`
Joey Adams
@Joey: I spotted it while debugging, but thanks for pointing it out.
Marcelo Cantos
Actually, this is a very good answer. Multiples of the page size and disk block size are very good for performance. You may want to test other multiples; I've had very good results with 65536 for disk-to-disk copies.
ninjalj
If you're copying large chunks, there's no need to use stdio's buffering, so using raw `read()` and `write()` calls (or your OS's equivalent) might be slightly faster
bdonlan
@bdonlan: Agreed, but it's less portable, and the difference amounts to splitting hairs.
Marcelo Cantos
@ninjalj: yes, I found this out while testing my answer. It's good to see someone else found the same sweet-spot.
Marcelo Cantos
Just a side remark, you probably should not allocate the buffer on the stack. Mind the name of this site: stackoverflow.
Jens Gustedt
4096 is probably a better choice if you expect to be dealing with small files.
Joey Adams
@Joey: What's the benefit?
Marcelo Cantos
A: 

It all depends on what you want to do with the data.

This will crash, though:

char *s;
fgets(s, 600, a);
puts(s);

since s is not a buffer, just an uninitialized pointer that points somewhere arbitrary.

One way is to read the whole file into a buffer and work with that, using fread():

char *filebuffer = malloc(filelength);
fread(filebuffer, 1, filelength, fp);
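
A fuller sketch of that approach (assuming the file is seekable and small enough to fit in memory; most error checks omitted):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *fp = fopen("sample.txt", "rb");
    if (!fp)
        return 1;

    fseek(fp, 0, SEEK_END);              /* find the file length */
    long filelength = ftell(fp);
    rewind(fp);

    char *filebuffer = malloc(filelength);
    if (!filebuffer) { fclose(fp); return 1; }

    size_t got = fread(filebuffer, 1, filelength, fp);
    fwrite(filebuffer, 1, got, stdout);  /* work with the data, or just print it */

    free(filebuffer);
    fclose(fp);
    return 0;
}
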
Anders K.
+4  A: 

The most efficient method will depend greatly on the operating system. For example, in Linux, you can use sendfile:

/* needs <sys/sendfile.h>, <sys/stat.h>, <fcntl.h>, <unistd.h> */
struct stat buf;
int fd = open(filename, O_RDONLY);
fstat(fd, &buf);
sendfile(STDOUT_FILENO, fd, NULL, buf.st_size);  /* stdout is fd 1 */

This does the copy directly in the kernel, minimizing unnecessary memory-to-memory copies. Other platforms may have similar approaches, such as write()ing to stdout from an mmap()ed buffer.
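
A rough sketch of that mmap-and-write variant (POSIX; the helper name is made up here, the file is assumed non-empty, and error handling is abbreviated):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int cat_mmap(const char *filename) {
    struct stat st;
    int fd = open(filename, O_RDONLY);
    if (fd == -1 || fstat(fd, &st) == -1)
        return -1;

    /* Map the whole file read-only, then hand it straight to write(). */
    void *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }

    write(STDOUT_FILENO, p, st.st_size);

    munmap(p, st.st_size);
    close(fd);
    return 0;
}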

bdonlan
sendfile() is only for sockets, AFAIK. Maybe you meant mmap(), vmsplice(), splice()?
ninjalj
`sendfile` can be used for other file descriptors, I believe. `splice` and friends only work when either the source or the destination is a pipe.
bdonlan
@bdonlan, about splice(): yes, it only works with pipes; that's why you first vmsplice() into a pipe and then splice() from the pipe.
ninjalj
Or you can just sendfile :) splice is great when you need to move data from something that's not an ordinary file; that's the limitation of sendfile: it only works on mmappable sources.
bdonlan
About sendfile(), either the manpage in man-3.25 is outdated, or it only supports sendfile()ing to sockets.
ninjalj
Ah, indeed. So you'd need to use splice.
bdonlan
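
For reference, a Linux-only sketch of the splice route this thread ends up at (file -> pipe -> stdout; assumes _GNU_SOURCE, glosses over short writes, and whether the destination accepts splice depends on what stdout is connected to; the helper name is made up):

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

int cat_splice(int in_fd) {
    int p[2];
    if (pipe(p) == -1)
        return -1;

    for (;;) {
        /* one side of every splice() call must be a pipe: file -> pipe */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, 65536, SPLICE_F_MOVE);
        if (n <= 0)
            break;
        /* pipe -> stdout */
        if (splice(p[0], NULL, STDOUT_FILENO, NULL, n, SPLICE_F_MOVE) == -1)
            break;
    }

    close(p[0]);
    close(p[1]);
    return 0;
}
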
+1  A: 

I believe the FILE returned by fopen is typically (always?) buffered, so your first example is not as inefficient as you may think.

The second might perform a little better... if you correct the errors: remember to allocate the buffer, and remember that puts adds a newline!
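
For example, a corrected sketch of that second snippet (reusing the FILE * a from the question, and using fputs instead of puts so no extra newlines are added):

char s[4096];                   /* an actual buffer, not an uninitialized pointer */
while (fgets(s, sizeof(s), a)) {
    fputs(s, stdout);           /* unlike puts, fputs does not append '\n' */
}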

Another option is to use binary reads (fread).

leonbloy
On a cold system, the first sentence is correct. However, while testing with a hot cache, I found per-character reads to be roughly 200 times slower than using the optimal buffer size.
Marcelo Cantos
A: 

What you're doing is plenty good enough in 99% of applications. Granted, in most C libraries, stdio performs badly, and you'd be better off with Phong Vo's sfio library. If you have measurements showing this is a bottleneck, the natural next step is to allocate a buffer and use fread/fwrite. You don't want fgets because you don't care about newlines.


First make it run, then make it right. You probably don't have to make it fast.

Norman Ramsey