views:

265

answers:

3

I am currently working on a project involving external merge sort using replacement selection and k-way merge. I have implemented it in C++ (it runs on Linux). It is very simple and right now deals only with fixed-size records.

For reading and writing I use the (i/o)fstream classes. After running the program for a few iterations, I noticed that:

  • I/O reads block for requests larger than 4 KB (the typical block size). In fact, using buffer sizes greater than 4 KB causes performance to decrease.
  • Output operations do not seem to need buffering; Linux appears to take care of buffering output itself. So I issue a write(record) per record instead of maintaining a special buffer of writes and flushing them out all at once with write(records[]).

But the performance of the application does not seem to be great. How can I improve it? Should I maintain dedicated I/O threads to take care of reading blocks, or are there existing C++ classes that already provide this abstraction (something like BufferedInputStream in Java)?

+1  A: 

Streams are known to perform worse than plain C I/O. They are designed to be easy to use and suitable for many different situations, but that flexibility costs performance. What I would do in your situation is switch to C-style I/O, profile, and then act based on the profiling results.

Kotti
+1  A: 

Look at using the C low-level I/O library: http://www.linuxtopia.org/online_books/programming_books/gnu_libc_guide/Low_002dLevel-I_002fO.html or ftp://ftp2.developpez.be/developps/linux/alp/alp-apB-low-level-io.pdf

On Windows, a long time ago, I got some I/O to run 10 times faster using low-level open than using fopen.

Maybe you will not get the same performance benefit, but I expect it will gain you something.

Romain Hippeau
+2  A: 

Such high-performance I/O is easiest done with mmap. It gives the kernel far more freedom to perform I/O and to schedule CPU time for your app. For instance, when you read in 1 MB using ifstream, the call can only return when all the data has been read. But with mmap(), the data can be delivered incrementally as it becomes available.

However, you should understand how that happens. Just because the data appears to be in RAM doesn't mean you should treat it as randomly accessible. Don't feed it to std::sort: that would touch random parts of the mmap'ed area, causing page faults left, right, and center, and you would incur heavy disk seeking to resolve those random page faults. Instead, mmap() two inputs and merge them. Because the mmap call told the kernel what data you need in the future, the kernel can feed you data as fast as it can, and your merge sort will only page fault (i.e. stall) when it is temporarily out of data.

MSalters
Thank you. I will try mmap and get back with the results soon!
Ajay