Right now I am using fread() to read a file, but I've been told that in other languages fread() is inefficient. Is this the same in C? If so, how would faster file reading be done?
I'm thinking of the read system call.
Keep in mind that fread is a wrapper around 'read'.
On the other hand, fread has an internal buffer, so a single 'read' may be faster, but I think 'fread' will be more efficient overall.
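To see that buffering in action, here is a tiny sketch (not from the thread; "some_file" is a placeholder path) that reads a file byte by byte both ways. Run each half under strace: the fread() loop issues only a handful of read() syscalls thanks to its internal buffer, while the raw read() loop issues one syscall per byte.
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    unsigned char c;
    long total = 0;

    FILE *fp = fopen("some_file", "rb");  // placeholder path
    if (!fp) return 1;
    while (fread(&c, 1, 1, fp) == 1)      // few read() syscalls underneath
        total += c;
    fclose(fp);

    int fd = open("some_file", O_RDONLY);
    if (fd < 0) return 1;
    while (read(fd, &c, 1) == 1)          // one syscall per byte
        total += c;
    close(fd);

    printf("%ld\n", total);
    return 0;
}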
What's slowing you down?
If you need the fastest possible file reading (while still playing nicely with the operating system), go straight to your OS's calls, and make sure you study how to use them most effectively.
- How is your data physically laid out? For example, rotating drives might read data stored at the edges faster, and you want to minimize or eliminate seek times.
- Is your data pre-processed? Do you need to do stuff between loading it from disk and using it?
- What is the optimum chunk size for reading? (It might be some even multiple of the sector size. Check your OS documentation.)
If seek times are a problem, re-arrange your data on disk (if you can) and store it in larger, pre-processed files instead of loading small chunks from here and there.
If data transfer times are a problem, perhaps consider compressing the data.
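For instance, if the data happened to be stored gzip-compressed, something along these lines would read it back with zlib (a rough sketch; "input.gz" is a placeholder path, and you'd link with -lz):
#include <stdio.h>
#include <zlib.h>

int main(void)
{
    gzFile gz = gzopen("input.gz", "rb");  // placeholder path
    if (!gz) { fprintf(stderr, "gzopen failed\n"); return 1; }

    char buffer[64 * 1024];   // decompress in large chunks
    int n;
    long total = 0;
    while ((n = gzread(gz, buffer, sizeof(buffer))) > 0)
        total += n;           // ...process the decompressed bytes here

    gzclose(gz);
    printf("decompressed %ld bytes\n", total);
    return 0;
}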
If you are willing to go beyond the C spec into OS-specific code, memory mapping is generally considered the most efficient way. For POSIX, check out mmap, and for Windows check out OpenFileMapping.
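A minimal POSIX sketch of the mmap approach might look like this (error handling kept terse; "input.dat" is a placeholder path):
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("input.dat", O_RDONLY);  // placeholder path
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

    unsigned char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }

    unsigned long sum = 0;
    for (off_t i = 0; i < st.st_size; ++i)
        sum += data[i];       // pages are faulted in on first touch
    printf("sum = %lu\n", sum);

    munmap(data, st.st_size);
    close(fd);
    return 0;
}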
If fread is slow, it is because the additional layers it adds on top of the operating system's underlying file-reading mechanism interfere with how your particular program is using fread. In other words, it's slow because you aren't using it the way it has been optimized for.
Having said that, faster file reading can be achieved by understanding how the operating system's I/O functions work and providing your own abstraction that better handles your program's particular I/O access patterns. Most of the time you can do this by memory-mapping the file.
However, if you are hitting the limits of the machine you are running on, memory mapping probably won't be sufficient. At that point it's really up to you to figure out how to optimize your I/O code.
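To illustrate what such an abstraction could look like, here is a rough sketch of a read()-backed reader whose buffer size you would tune to your access pattern. All names here are made up for the example, and "input.dat" is a placeholder path.
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define RDR_BUFSZ (256 * 1024)  // tune to your access pattern

typedef struct {
    int fd;
    size_t pos, len;            // cursor and fill level of buf
    unsigned char buf[RDR_BUFSZ];
} reader_t;

// Hypothetical helper: copy up to n bytes, refilling from the fd as needed.
static size_t reader_read(reader_t *r, void *out, size_t n)
{
    size_t copied = 0;
    while (copied < n) {
        if (r->pos == r->len) {             // buffer empty: refill
            ssize_t got = read(r->fd, r->buf, RDR_BUFSZ);
            if (got <= 0)
                break;                       // EOF or error
            r->pos = 0;
            r->len = (size_t)got;
        }
        size_t chunk = r->len - r->pos;
        if (chunk > n - copied)
            chunk = n - copied;
        memcpy((unsigned char *)out + copied, r->buf + r->pos, chunk);
        r->pos += chunk;
        copied += chunk;
    }
    return copied;
}

int main(void)
{
    reader_t r = { .fd = open("input.dat", O_RDONLY) };  // placeholder path
    if (r.fd < 0) return 1;
    unsigned char word[4];
    while (reader_read(&r, word, sizeof word) == sizeof word)
        ;  // consume 4-byte records without a syscall per record
    close(r.fd);
    return 0;
}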
It really shouldn't matter.
If you're reading from an actual hard disk, it's going to be slow. The hard disk is your bottleneck, and that's it.
Now, if you're being silly about your call to read/fread/whatever and, say, fread()-ing a byte at a time, then yes, it's going to be slow, as the overhead of fread() will outstrip the overhead of reading from the disk.
Instead, call read/fread/whatever and request a decent portion of data. How much depends on what you're doing: sometimes all you want/need is 4 bytes (to get a uint32), but sometimes you can read in large chunks (4 KiB, 64 KiB, etc. RAM is cheap, go for something significant.)
If you're doing small reads, some of the higher-level calls like fread() will actually help you by buffering data behind your back. If you're doing large reads it might not be helpful, but switching from fread to read will probably not yield that much improvement, as you're bottlenecked on disk speed.
In short: if you can, request a liberal amount when reading, and try to minimize what you write. For large amounts, powers of 2 tend to be friendlier than anything else, but of course it's OS-, hardware-, and weather-dependent.
So, let's see if this might bring out any differences:
#include <sys/time.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#define BUFFER_SIZE (1 * 1024 * 1024)
#define ITERATIONS (10 * 1024)
double now()
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1000000.;
}

int main()
{
    unsigned char buffer[BUFFER_SIZE]; // 1 MiB buffer
    double end_time;
    double total_time;
    int i, x, y = 0; // y must start at 0; it only exists so the touch loops aren't optimized away
    double start_time = now();

#ifdef USE_FREAD
    FILE *fp;
    fp = fopen("/dev/zero", "rb");
    for(i = 0; i < ITERATIONS; ++i)
    {
        fread(buffer, BUFFER_SIZE, 1, fp);
        for(x = 0; x < BUFFER_SIZE; x += 1024)
        {
            y += buffer[x];
        }
    }
    fclose(fp);
#elif defined(USE_MMAP)
    unsigned char *mmdata;
    int fd = open("/dev/zero", O_RDONLY);
    for(i = 0; i < ITERATIONS; ++i)
    {
        // The offset must be widened: i * BUFFER_SIZE overflows a 32-bit int past 2 GiB.
        mmdata = mmap(NULL, BUFFER_SIZE, PROT_READ, MAP_PRIVATE, fd, (off_t)i * BUFFER_SIZE);
        // But if we don't touch it, it won't be read...
        // I happen to know I have 4 KiB pages, YMMV
        for(x = 0; x < BUFFER_SIZE; x += 1024)
        {
            y += mmdata[x];
        }
        munmap(mmdata, BUFFER_SIZE);
    }
    close(fd);
#else
    int fd;
    fd = open("/dev/zero", O_RDONLY);
    for(i = 0; i < ITERATIONS; ++i)
    {
        read(fd, buffer, BUFFER_SIZE);
        for(x = 0; x < BUFFER_SIZE; x += 1024)
        {
            y += buffer[x];
        }
    }
    close(fd);
#endif

    end_time = now();
    total_time = end_time - start_time;
    printf("It took %f seconds to read 10 GiB. That's %f MiB/s.\n", total_time, ITERATIONS / total_time);
    return 0;
}
...yields:
$ gcc -o reading reading.c
$ ./reading ; ./reading ; ./reading
It took 1.141995 seconds to read 10 GiB. That's 8966.764671 MiB/s.
It took 1.131412 seconds to read 10 GiB. That's 9050.637376 MiB/s.
It took 1.132440 seconds to read 10 GiB. That's 9042.420953 MiB/s.
$ gcc -o reading reading.c -DUSE_FREAD
$ ./reading ; ./reading ; ./reading
It took 1.134837 seconds to read 10 GiB. That's 9023.322991 MiB/s.
It took 1.128971 seconds to read 10 GiB. That's 9070.207522 MiB/s.
It took 1.136845 seconds to read 10 GiB. That's 9007.383586 MiB/s.
$ gcc -o reading reading.c -DUSE_MMAP
$ ./reading ; ./reading ; ./reading
It took 2.037207 seconds to read 10 GiB. That's 5026.489386 MiB/s.
It took 2.037060 seconds to read 10 GiB. That's 5026.852369 MiB/s.
It took 2.031698 seconds to read 10 GiB. That's 5040.119180 MiB/s.
...or no noticeable difference between read and fread. (Sometimes fread wins, sometimes read does.)
Note: The slow mmap is surprising. This might be due to me asking it to allocate the buffer for me. (I wasn't sure about the requirements of supplying a pointer...)
In really short: Don't prematurely optimize. Make it run, make it right, make it fast, in that order.
Back by popular demand, I ran the test on a real file. (The first 675 MiB of the Ubuntu 10.04 32-bit desktop installation CD ISO) These were the results:
# Using fread()
It took 31.363983 seconds to read 675 MiB. That's 21.521501 MiB/s.
It took 31.486195 seconds to read 675 MiB. That's 21.437967 MiB/s.
It took 31.509051 seconds to read 675 MiB. That's 21.422416 MiB/s.
It took 31.853389 seconds to read 675 MiB. That's 21.190838 MiB/s.
# Using read()
It took 33.052984 seconds to read 675 MiB. That's 20.421757 MiB/s.
It took 31.319416 seconds to read 675 MiB. That's 21.552126 MiB/s.
It took 39.453453 seconds to read 675 MiB. That's 17.108769 MiB/s.
It took 32.619912 seconds to read 675 MiB. That's 20.692882 MiB/s.
# Using mmap()
It took 31.897643 seconds to read 675 MiB. That's 21.161438 MiB/s.
It took 36.753138 seconds to read 675 MiB. That's 18.365779 MiB/s.
It took 36.175385 seconds to read 675 MiB. That's 18.659097 MiB/s.
It took 31.841998 seconds to read 675 MiB. That's 21.198419 MiB/s.
...and one very bored programmer later, we've read the CD ISO off disk. 12 times. Before each test, the disk cache was cleared, and during each test there was enough (and approximately the same amount of) RAM free to hold the CD ISO twice in RAM.
One note of interest: I was originally using a large malloc() to fill memory and thus minimize the effects of disk caching. It may be worth noting that mmap performed terribly there. The other two solutions merely ran; mmap ran and, for reasons I can't explain, began pushing memory to swap, which killed its performance. (The program was not leaking, as far as I know (the source code is above): the actual "used memory" stayed constant throughout the trials.)
read() posted the fastest time overall, and fread() posted really consistent times. This may have been due to some small hiccup during the testing, however. All told, the three methods were just about equal. (Especially fread and read...)
The problem that some people have noted here is that, depending on your source, your target buffer size, etc., you can create a custom handler for that specific case. But there are other cases, like block/character devices (i.e. /dev/*), where standard rules like that may or may not apply, and your backing source might be something that pops characters off serially without any buffering, like an I2C bus or standard RS-232. And there are some other sources where character devices are memory-mappable large sections of memory, as nvidia does with their video driver character device (/dev/nvidiactl).
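Because such byte-serial devices routinely return fewer bytes than requested, a common pattern when reading from them is to loop until the full count has arrived. A sketch of such a helper (the name read_fully is made up for this example):
#include <errno.h>
#include <unistd.h>

// Hypothetical helper: keep calling read() until n bytes arrive, EOF, or error.
// On byte-serial devices (RS-232, I2C bridges) short reads are the norm.
static ssize_t read_fully(int fd, void *buf, size_t n)
{
    size_t done = 0;
    while (done < n) {
        ssize_t got = read(fd, (char *)buf + done, n - done);
        if (got < 0) {
            if (errno == EINTR)
                continue;       // interrupted by a signal: retry
            return -1;          // real error
        }
        if (got == 0)
            break;              // EOF
        done += (size_t)got;
    }
    return (ssize_t)done;
}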
One other design choice that many people have made in high-performance applications is asynchronous instead of synchronous I/O for handling how data is read. Look into libaio and the ported versions of libaio, which provide prepackaged solutions for asynchronous I/O, and also look into using read with shared memory between a worker and a consumer thread (but keep in mind that this will increase programming complexity if you go this route). Asynchronous I/O is also something you can't get out of the box with stdio but can get with standard OS system calls. Just be careful, as there are bits of read that are "portable" according to the spec; for instance, not all operating systems (like FreeBSD) support POSIX STREAMs, by choice.
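libaio has its own API, but as a flavor of what an asynchronous read looks like, here is a sketch using the standard POSIX AIO interface from <aio.h> (link with -lrt on older glibc; "data.bin" is a placeholder path, and the busy-wait is for brevity only):
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char buffer[4096];
    int fd = open("data.bin", O_RDONLY);  // placeholder path
    if (fd < 0) { perror("open"); return 1; }

    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf = buffer;
    cb.aio_nbytes = sizeof(buffer);
    cb.aio_offset = 0;

    if (aio_read(&cb) != 0) { perror("aio_read"); return 1; }

    // Do other useful work here while the read is in flight...

    while (aio_error(&cb) == EINPROGRESS)
        ;  // busy-wait for brevity; real code would use aio_suspend() or a signal

    ssize_t n = aio_return(&cb);
    printf("read %zd bytes asynchronously\n", n);
    close(fd);
    return 0;
}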
Another thing you can do (depending on how portable your data is) is to look into compression and/or conversion into a binary format, e.g. database formats like BDB, SQL, etc. Some database formats are portable across machines thanks to endianness-conversion functions.
In general, it would be best to take a set of algorithms and methods, run performance tests using the different methods, and pick the one that best serves the main task your application performs.
Maybe check out how perl does it. Perl's I/O routines are optimized, and they are, I gather, the reason why processing text with a perl filter can be twice as fast as doing the same transformation with sed.
Obviously perl is pretty complex, and I/O is only one small part of what it does. I've never looked at its source so I couldn't give you any better directions than to point you here.