Attaching a QTextStream to a QFile and reading it line by line is easy and works fine, but I wonder whether performance can be increased by first loading the file into memory and then processing it line by line.
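
For reference, here is a minimal sketch of that baseline (processLine() is just a placeholder for whatever I do with each line):

    #include <QFile>
    #include <QString>
    #include <QTextStream>

    // Placeholder for the per-line work.
    static void processLine(const QString &line) { /* ... */ }

    static void readLineByLine(const QString &path)
    {
        QFile file(path);
        if (!file.open(QIODevice::ReadOnly | QIODevice::Text))
            return;

        QTextStream in(&file);          // the stream pulls data from the file in buffered chunks
        while (!in.atEnd())
            processLine(in.readLine()); // one line per iteration
    }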

Using FileMon from Sysinternals, I found that the file is read in chunks of 16 KB. Since the files I have to process are not that big (~2 MB, but there are many of them!), loading each one into memory would be a nice thing to try.

Any ideas how I can do this? QFile inherits from QIODevice, which lets me readAll() the file into a QByteArray, but how do I then proceed and split it into lines?

+1  A: 

As long as you don't open and close the file for every single line you read, there should be no performance difference between reading the entire file first and processing it as you read it (unless the processing itself is faster when you have the entire file to work with). If you think about it, both approaches do the same thing: they read the entire file once.

Sean Nyman
+4  A: 

QTextStream has a readAll() function:

http://doc.trolltech.com/4.5/qtextstream.html#readAll

Surely that's all you need?
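
Something like this sketch, for example (the helper name is made up):

    #include <QFile>
    #include <QString>
    #include <QStringList>
    #include <QTextStream>

    // Slurp the whole file through QTextStream, then split it into lines in memory.
    static QStringList readAllLines(const QString &path)
    {
        QFile file(path);
        if (!file.open(QIODevice::ReadOnly | QIODevice::Text))
            return QStringList();

        QTextStream in(&file);
        QString contents = in.readAll();   // whole file as one QString
        return contents.split('\n');       // line-by-line processing now happens on these
    }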

Or you could read everything into a QByteArray; QTextStream can take that as its input instead of a QFile.
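
A rough sketch of that variant, assuming each file fits comfortably in memory:

    #include <QByteArray>
    #include <QFile>
    #include <QString>
    #include <QTextStream>

    static void processFromMemory(const QString &path)   // example helper, not a Qt API
    {
        QFile file(path);
        if (!file.open(QIODevice::ReadOnly))
            return;

        QByteArray data = file.readAll();            // one bulk read into RAM
        file.close();

        QTextStream in(&data, QIODevice::ReadOnly);  // the stream now reads from the byte array, not the disk
        while (!in.atEnd()) {
            QString line = in.readLine();
            // ... process line ...
        }
    }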

Phil Hannent
Thanks! I didn't realize QTextStream could take a QByteArray. That totally solves it.
MadH
+2  A: 

Be careful. There are many effects to consider.

For the string processing involved (or whatever you are doing with the file) there is likely no performance difference between doing it from memory and doing it from a file line by line, provided that the file buffering is reasonable.

Actually calling your operating system to do a low-level read is VERY expensive. That's why we have buffered I/O. For small I/O sizes, the overhead of the call dominates; reading 64 bytes at a time is likely only 1/4 as efficient as reading 256 bytes at a time. (And I am talking about read() here, not fgets() or fread(), both of which are buffered.)
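
As a rough illustration of the chunk-size effect, here is a sketch that reads a file in fixed-size chunks (the chunk size is an arbitrary example; QIODevice::Unbuffered just makes each read() map more directly onto an OS call):

    #include <QByteArray>
    #include <QFile>
    #include <QString>

    // Larger chunks mean fewer underlying read calls and less per-call overhead,
    // up to the point of diminishing returns.
    static QByteArray readInChunks(const QString &path, qint64 chunkSize = 64 * 1024)
    {
        QByteArray contents;
        QFile file(path);
        if (!file.open(QIODevice::ReadOnly | QIODevice::Unbuffered))
            return contents;

        while (!file.atEnd())
            contents.append(file.read(chunkSize));   // roughly one OS-level read per chunk
        return contents;
    }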

At a certain point the time required for the physical I/O starts to dominate, and when performance stops improving much with a larger buffer, you have found your buffer size. A very old data point: on a 7 MHz Amiga 500 with a 100 MB SCSI hard disk (A590 + Quantum), my I/O performance only hit its maximum with a 256 KB buffer size. Compared to the processor, that disk was FAST!!! (The computer had only 3 MB of RAM. 256 KB is a BIG buffer!)

However, you can have too much of a good thing. Once your file is in memory, the OS can page that file back out to disk at its leisure. And if it does so, you've lost any benefit of buffering. If you make your buffers too big then this may happen under certain load situations and your performance goes down the toilet. So consider your runtime environment carefully, and limit memory footprint if need be.

An alternative is to use mmap() to map the file into memory. Now the OS won't page your file out to swap - rather, it will simply not page it in, or, if it needs memory, it will discard any pieces of the file cached in core. But it won't need to write anything to swap space - it already has the file available on disk. I'm not sure whether this would result in better performance, however, because it's still better to do I/O in big chunks, and virtual memory tends to move things in page-sized chunks. Some memory managers may do a decent job of moving pages in larger chunks to increase I/O bandwidth, and of prefetching pages, but I haven't studied this in detail.
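
For completeness, a POSIX-only sketch of the mmap() approach (error handling trimmed to the essentials):

    #include <fcntl.h>      // open
    #include <sys/mman.h>   // mmap, munmap
    #include <sys/stat.h>   // fstat
    #include <unistd.h>     // close

    // Map the file into the address space and walk it as a read-only byte range.
    static bool processMapped(const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return false;

        struct stat st;
        if (fstat(fd, &st) != 0) {
            close(fd);
            return false;
        }

        void *base = mmap(0, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);                        // the mapping stays valid after close()
        if (base == MAP_FAILED)
            return false;

        const char *begin = static_cast<const char *>(base);
        // ... scan the st.st_size bytes starting at begin for '\n' and
        //     process each line in place ...

        munmap(base, st.st_size);
        return true;
    }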

Get your program working correctly first. Then optimize.

d3jones
Thanks for the comment. I've already made it work, of course; that's why I'm thinking of experimenting with optimisations. mmap() is not portable.
MadH