Right now I am using fread() to read a file, but I've been told that in other languages fread() is inefficient. Is this the same in C? If so, how would faster file reading be done?
I'm thinking of the read system call.
Keep in mind that fread is a wrapper around 'read'.
On the other hand, fread has an internal buffer, so a single 'read' may be faster, but I think 'fread' will be more efficient overall.
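To see that buffering in action, here is a tiny sketch (not from the thread; "some_file" is a placeholder path) that reads a file byte by byte both ways. Run each half under strace: the fread() loop issues only a handful of read() syscalls thanks to its internal buffer, while the raw read() loop issues one syscall per byte.
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    unsigned char c;
    long total = 0;

    FILE *fp = fopen("some_file", "rb");  // placeholder path
    if (!fp) return 1;
    while (fread(&c, 1, 1, fp) == 1)      // few read() syscalls underneath
        total += c;
    fclose(fp);

    int fd = open("some_file", O_RDONLY);
    if (fd < 0) return 1;
    while (read(fd, &c, 1) == 1)          // one syscall per byte
        total += c;
    close(fd);

    printf("%ld\n", total);
    return 0;
}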
What's slowing you down?
If you need the fastest possible file reading (while still playing nicely with the operating system), go straight to your OS's calls, and make sure you study how to use them most effectively.
- How is your data physically laid out? For example, rotating drives might read data stored at the edges faster, and you want to minimize or eliminate seek times.
- Is your data pre-processed? Do you need to do stuff between loading it from disk and using it?
- What is the optimum chunk size for reading? (It might be some even multiple of the sector size. Check your OS documentation.)
If seek times are a problem, re-arrange your data on disk (if you can) and store it in larger, pre-processed files instead of loading small chunks from here and there.
If data transfer times are a problem, perhaps consider compressing the data.
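For instance, if the data happened to be stored gzip-compressed, something along these lines would read it back with zlib (a rough sketch; "input.gz" is a placeholder path, and you'd link with -lz):
#include <stdio.h>
#include <zlib.h>

int main(void)
{
    gzFile gz = gzopen("input.gz", "rb");  // placeholder path
    if (!gz) { fprintf(stderr, "gzopen failed\n"); return 1; }

    char buffer[64 * 1024];   // decompress in large chunks
    int n;
    long total = 0;
    while ((n = gzread(gz, buffer, sizeof(buffer))) > 0)
        total += n;           // ...process the decompressed bytes here

    gzclose(gz);
    printf("decompressed %ld bytes\n", total);
    return 0;
}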
If you are willing to go beyond the C spec into OS-specific code, memory mapping is generally considered the most efficient way. For POSIX, check out mmap, and for Windows check out OpenFileMapping.
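A minimal POSIX sketch of the mmap approach might look like this (error handling kept terse; "input.dat" is a placeholder path):
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("input.dat", O_RDONLY);  // placeholder path
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

    unsigned char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }

    unsigned long sum = 0;
    for (off_t i = 0; i < st.st_size; ++i)
        sum += data[i];       // pages are faulted in on first touch
    printf("sum = %lu\n", sum);

    munmap(data, st.st_size);
    close(fd);
    return 0;
}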
If fread is slow, it is because the additional layers it adds on top of the operating system's underlying file-reading mechanism interfere with how your particular program is using fread. In other words, it's slow because you aren't using it the way it has been optimized for.
Having said that, faster file reading can be achieved by understanding how the operating system's I/O functions work and providing your own abstraction that better handles your program's particular I/O access patterns. Most of the time you can do this by memory-mapping the file.
However, if you are hitting the limits of the machine you are running on, memory mapping probably won't be sufficient. At that point it's really up to you to figure out how to optimize your I/O code.
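To illustrate what such an abstraction could look like, here is a rough sketch of a read()-backed reader whose buffer size you would tune to your access pattern. All names here are made up for the example, and "input.dat" is a placeholder path.
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define RDR_BUFSZ (256 * 1024)  // tune to your access pattern

typedef struct {
    int fd;
    size_t pos, len;            // cursor and fill level of buf
    unsigned char buf[RDR_BUFSZ];
} reader_t;

// Hypothetical helper: copy up to n bytes, refilling from the fd as needed.
static size_t reader_read(reader_t *r, void *out, size_t n)
{
    size_t copied = 0;
    while (copied < n) {
        if (r->pos == r->len) {             // buffer empty: refill
            ssize_t got = read(r->fd, r->buf, RDR_BUFSZ);
            if (got <= 0)
                break;                       // EOF or error
            r->pos = 0;
            r->len = (size_t)got;
        }
        size_t chunk = r->len - r->pos;
        if (chunk > n - copied)
            chunk = n - copied;
        memcpy((unsigned char *)out + copied, r->buf + r->pos, chunk);
        r->pos += chunk;
        copied += chunk;
    }
    return copied;
}

int main(void)
{
    reader_t r = { .fd = open("input.dat", O_RDONLY) };  // placeholder path
    if (r.fd < 0) return 1;
    unsigned char word[4];
    while (reader_read(&r, word, sizeof word) == sizeof word)
        ;  // consume 4-byte records without a syscall per record
    close(r.fd);
    return 0;
}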
It really shouldn't matter.
If you're reading from an actual hard disk, it's going to be slow. The hard disk is your bottleneck, and that's it.
Now, if you're being silly about your call to read/fread/whatever and, say, fread()-ing a byte at a time, then yes, it's going to be slow, as the overhead of fread() will outstrip the overhead of reading from the disk.
Instead, call read/fread/whatever and request a decent portion of data. How much depends on what you're doing: sometimes all you want/need is 4 bytes (to get a uint32), but sometimes you can read in large chunks (4 KiB, 64 KiB, etc. RAM is cheap, go for something significant.)
If you're doing small reads, some of the higher-level calls like fread() will actually help you by buffering data behind your back. If you're doing large reads it might not be helpful, but switching from fread to read will probably not yield that much improvement, as you're bottlenecked on disk speed.
In short: if you can, request a liberal amount when reading, and try to minimize what you write. For large amounts, powers of 2 tend to be friendlier than anything else, but of course it's OS-, hardware-, and weather-dependent.
So, let's see if this might bring out any differences:
#include <sys/time.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#define BUFFER_SIZE (1 * 1024 * 1024)
#define ITERATIONS (10 * 1024)
double now()
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1000000.;
}

int main()
{
    unsigned char buffer[BUFFER_SIZE]; // 1 MiB buffer
    double end_time;
    double total_time;
    int i, x, y = 0; // y must start at 0; it only exists so the touch loops aren't optimized away
    double start_time = now();

#ifdef USE_FREAD
    FILE *fp;
    fp = fopen("/dev/zero", "rb");
    for(i = 0; i < ITERATIONS; ++i)
    {
        fread(buffer, BUFFER_SIZE, 1, fp);
        for(x = 0; x < BUFFER_SIZE; x += 1024)
        {
            y += buffer[x];
        }
    }
    fclose(fp);
#elif defined(USE_MMAP)
    unsigned char *mmdata;
    int fd = open("/dev/zero", O_RDONLY);
    for(i = 0; i < ITERATIONS; ++i)
    {
        // The offset must be widened: i * BUFFER_SIZE overflows a 32-bit int past 2 GiB.
        mmdata = mmap(NULL, BUFFER_SIZE, PROT_READ, MAP_PRIVATE, fd, (off_t)i * BUFFER_SIZE);
        // But if we don't touch it, it won't be read...
        // I happen to know I have 4 KiB pages, YMMV
        for(x = 0; x < BUFFER_SIZE; x += 1024)
        {
            y += mmdata[x];
        }
        munmap(mmdata, BUFFER_SIZE);
    }
    close(fd);
#else
    int fd;
    fd = open("/dev/zero", O_RDONLY);
    for(i = 0; i < ITERATIONS; ++i)
    {
        read(fd, buffer, BUFFER_SIZE);
        for(x = 0; x < BUFFER_SIZE; x += 1024)
        {
            y += buffer[x];
        }
    }
    close(fd);
#endif

    end_time = now();
    total_time = end_time - start_time;
    printf("It took %f seconds to read 10 GiB. That's %f MiB/s.\n", total_time, ITERATIONS / total_time);
    return 0;
}
...yields:
$ gcc -o reading reading.c
$ ./reading ; ./reading ; ./reading
It took 1.141995 seconds to read 10 GiB. That's 8966.764671 MiB/s.
It took 1.131412 seconds to read 10 GiB. That's 9050.637376 MiB/s.
It took 1.132440 seconds to read 10 GiB. That's 9042.420953 MiB/s.
$ gcc -o reading reading.c -DUSE_FREAD
$ ./reading ; ./reading ; ./reading
It took 1.134837 seconds to read 10 GiB. That's 9023.322991 MiB/s.
It took 1.128971 seconds to read 10 GiB. That's 9070.207522 MiB/s.
It took 1.136845 seconds to read 10 GiB. That's 9007.383586 MiB/s.
$ gcc -o reading reading.c -DUSE_MMAP
$ ./reading ; ./reading ; ./reading
It took 2.037207 seconds to read 10 GiB. That's 5026.489386 MiB/s.
It took 2.037060 seconds to read 10 GiB. That's 5026.852369 MiB/s.
It took 2.031698 seconds to read 10 GiB. That's 5040.119180 MiB/s.
...or no noticeable difference between read and fread. (Sometimes fread wins, sometimes read does.)
Note: The slow mmap is surprising. This might be due to me asking it to allocate the buffer for me. (I wasn't sure about the requirements of supplying a pointer...)
In really short: Don't prematurely optimize. Make it run, make it right, make it fast, in that order.
Back by popular demand, I ran the test on a real file. (The first 675 MiB of the Ubuntu 10.04 32-bit desktop installation CD ISO) These were the results:
# Using fread()
It took 31.363983 seconds to read 675 MiB. That's 21.521501 MiB/s.
It took 31.486195 seconds to read 675 MiB. That's 21.437967 MiB/s.
It took 31.509051 seconds to read 675 MiB. That's 21.422416 MiB/s.
It took 31.853389 seconds to read 675 MiB. That's 21.190838 MiB/s.
# Using read()
It took 33.052984 seconds to read 675 MiB. That's 20.421757 MiB/s.
It took 31.319416 seconds to read 675 MiB. That's 21.552126 MiB/s.
It took 39.453453 seconds to read 675 MiB. That's 17.108769 MiB/s.
It took 32.619912 seconds to read 675 MiB. That's 20.692882 MiB/s.
# Using mmap()
It took 31.897643 seconds to read 675 MiB. That's 21.161438 MiB/s.
It took 36.753138 seconds to read 675 MiB. That's 18.365779 MiB/s.
It took 36.175385 seconds to read 675 MiB. That's 18.659097 MiB/s.
It took 31.841998 seconds to read 675 MiB. That's 21.198419 MiB/s.
...and one very bored programmer later, we've read the CD ISO off disk. 12 times. Before each test, the disk cache was cleared, and during each test there was enough (and approximately the same amount of) RAM free to hold the CD ISO twice in RAM.
One note of interest: I was originally using a large malloc() to fill memory and thus minimize the effects of disk caching. It may be worth noting that mmap performed terribly there. The other two solutions merely ran; mmap ran and, for reasons I can't explain, began pushing memory to swap, which killed its performance. (The program was not leaking, as far as I know (the source code is above): the actual "used memory" stayed constant throughout the trials.)
read() posted the fastest time overall, and fread() posted really consistent times. This may have been due to some small hiccup during the testing, however. All told, the three methods were just about equal. (Especially fread and read...)
The problem that some people have noted here is that, depending on your source, your target buffer size, etc., you can create a custom handler for that specific case. But there are other cases, like block/character devices (i.e. /dev/*), where standard rules like that may or may not apply, and your backing source might be something that pops characters off serially without any buffering, like an I2C bus or standard RS-232. And there are some other sources where character devices are memory-mappable large sections of memory, as nvidia does with their video driver character device (/dev/nvidiactl).
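Because such byte-serial devices routinely return fewer bytes than requested, a common pattern when reading from them is to loop until the full count has arrived. A sketch of such a helper (the name read_fully is made up for this example):
#include <errno.h>
#include <unistd.h>

// Hypothetical helper: keep calling read() until n bytes arrive, EOF, or error.
// On byte-serial devices (RS-232, I2C bridges) short reads are the norm.
static ssize_t read_fully(int fd, void *buf, size_t n)
{
    size_t done = 0;
    while (done < n) {
        ssize_t got = read(fd, (char *)buf + done, n - done);
        if (got < 0) {
            if (errno == EINTR)
                continue;       // interrupted by a signal: retry
            return -1;          // real error
        }
        if (got == 0)
            break;              // EOF
        done += (size_t)got;
    }
    return (ssize_t)done;
}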
One other design choice that many people have made in high-performance applications is asynchronous instead of synchronous I/O for handling how data is read. Look into libaio and the ported versions of libaio, which provide prepackaged solutions for asynchronous I/O, and also look into using read with shared memory between a worker and a consumer thread (but keep in mind that this will increase programming complexity if you go this route). Asynchronous I/O is also something you can't get out of the box with stdio but can get with standard OS system calls. Just be careful, as there are bits of read that are "portable" according to the spec; for instance, not all operating systems (like FreeBSD) support POSIX STREAMs, by choice.
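libaio has its own API, but as a flavor of what an asynchronous read looks like, here is a sketch using the standard POSIX AIO interface from <aio.h> (link with -lrt on older glibc; "data.bin" is a placeholder path, and the busy-wait is for brevity only):
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char buffer[4096];
    int fd = open("data.bin", O_RDONLY);  // placeholder path
    if (fd < 0) { perror("open"); return 1; }

    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf = buffer;
    cb.aio_nbytes = sizeof(buffer);
    cb.aio_offset = 0;

    if (aio_read(&cb) != 0) { perror("aio_read"); return 1; }

    // Do other useful work here while the read is in flight...

    while (aio_error(&cb) == EINPROGRESS)
        ;  // busy-wait for brevity; real code would use aio_suspend() or a signal

    ssize_t n = aio_return(&cb);
    printf("read %zd bytes asynchronously\n", n);
    close(fd);
    return 0;
}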
Another thing you can do (depending on how portable your data is) is to look into compression and/or conversion into a binary format, e.g. database formats like BDB, SQL, etc. Some database formats are portable across machines thanks to endianness-conversion functions.
In general, it would be best to take a set of algorithms and methods, run performance tests using the different methods, and pick the one that best serves the main task your application performs.
Maybe check out how perl does it. Perl's I/O routines are optimized, and they are, I gather, the reason why processing text with a perl filter can be twice as fast as doing the same transformation with sed.
Obviously perl is pretty complex, and I/O is only one small part of what it does. I've never looked at its source so I couldn't give you any better directions than to point you here.