ansaurus

Question

When to build your own buffer system for I/O (C++)?

Answer 1

+5 A:

Maybe you should look into memory mapped files.

Check them out in this library: Boost.Interprocess

David Pierre 2008-11-06 09:58:35

MMFs would have been my suggestion as well. +1 for mentioning Boost's support for it.

OregonGhost 2008-11-06 10:03:08

Answer 2

+6 A:

I would also suggest memory-mapped files but if you're going to use boost I think boost::iostreams::mapped_file is a better match than boost::interprocess.
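To make the memory-mapped suggestion concrete, here is a minimal sketch that counts the newlines in a file through a read-only mapping. It deliberately uses the POSIX mmap call directly rather than Boost, so it assumes a POSIX system; boost::iostreams::mapped_file is a portable wrapper over this kind of OS facility. The function name count_newlines_mapped is hypothetical and not from the thread.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

#include <fcntl.h>      // open
#include <sys/mman.h>   // mmap, munmap
#include <sys/stat.h>   // fstat
#include <unistd.h>     // close

// Hypothetical helper: counts '\n' bytes in a file by mapping it read-only.
std::size_t count_newlines_mapped(const std::string& path)
{
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd == -1)
        throw std::runtime_error("cannot open " + path);

    struct stat st;
    if (::fstat(fd, &st) == -1) {
        ::close(fd);
        throw std::runtime_error("cannot stat " + path);
    }
    const std::size_t size = static_cast<std::size_t>(st.st_size);
    if (size == 0) {            // mmap rejects zero-length mappings
        ::close(fd);
        return 0;
    }

    void* addr = ::mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    ::close(fd);                // the mapping stays valid after close
    if (addr == MAP_FAILED)
        throw std::runtime_error("cannot mmap " + path);

    // The file contents are now addressable as ordinary memory; the OS
    // pages data in on demand instead of copying it into a user buffer.
    const char* p = static_cast<const char*>(addr);
    const char* end = p + size;
    std::size_t lines = 0;
    while ((p = static_cast<const char*>(std::memchr(p, '\n', end - p))) != nullptr) {
        ++lines;
        ++p;
    }
    ::munmap(addr, size);
    return lines;
}
```

Whether this beats a plain buffered read depends on the disk and the access pattern, so it is worth timing both on the real file.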

Andreas Magnusson 2008-11-06 10:09:53

I wasn't aware of that one.

David Pierre 2008-11-06 12:45:03

Answer 3

A:

Thank you for the advice, but will it be faster than using buffered chunks of X MB? For example:

unsigned int _buffer_size = 64 * 1024 * 1024; // 64 MB for instance.
char* _data_buffer = new char[_buffer_size];  
_file->read(_data_buffer, _buffer_size);
// Read directly from the memory using _data_buffer

I have tried this approach (even reading the complete file at once) and it is not faster than using the STL ifstream line by line :(

Jacob 2008-11-06 10:42:44

It's not easy to say, but using a mmapped file allows the OS to manage it in the most efficient way. Buffering the whole file will likely degrade performance as it probably would cause a lot of swapping.

Andreas Magnusson 2008-11-06 11:42:53

Is this the only way you can organize the data? Maybe there's another way to deal with the whole problem? Using a DB? Split the file into a hierarchy of files?

Andreas Magnusson 2008-11-06 11:44:30

Don't allocate the buffer like that. Use std::vector. See my answer below.

Martin York 2008-11-06 12:43:47

Answer 4

+5 A:

A 2GB file is pretty big, and you need to be aware of all the possible areas that can act as bottlenecks:

The HDD itself
The HDD interface (IDE/SATA/RAID/USB?)
Operating system/filesystem
C/C++ Library
Your code

I'd start by doing some measurements:

How long does your code take to read/write a 2GB file?
How long does the 'cp' command take to copy it?
How long does it take to write/read using just big fwrite()/fread() calls?
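
The last measurement could be sketched roughly as follows. This is only an illustration: the ReadStats type, the function name time_sequential_read, and the chunk size you pass in are all assumptions to experiment with, not values from the thread.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical result type for the measurement.
struct ReadStats {
    std::size_t bytes;    // total bytes read
    double      seconds;  // wall-clock time spent reading
};

// Reads the whole file with large fread() calls and times it.
// Returns {0, 0.0} if the file cannot be opened or chunk_size is 0.
ReadStats time_sequential_read(const char* path, std::size_t chunk_size)
{
    ReadStats stats = {0, 0.0};
    if (chunk_size == 0)
        return stats;
    std::FILE* f = std::fopen(path, "rb");
    if (!f)
        return stats;

    std::vector<char> buffer(chunk_size);
    const auto start = std::chrono::steady_clock::now();
    std::size_t n;
    while ((n = std::fread(buffer.data(), 1, buffer.size(), f)) > 0)
        stats.bytes += n;   // data is discarded; only throughput matters here
    const auto stop = std::chrono::steady_clock::now();
    std::fclose(f);

    stats.seconds = std::chrono::duration<double>(stop - start).count();
    return stats;
}
```

Dividing bytes by seconds gives the throughput to compare against your own code. Keep in mind that a second run over the same file may be served from the OS cache and look much faster than the disk really is.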

Assuming your disk is capable of reading/writing at about 40 MB/s (which is probably a realistic figure to start from), your 2GB file can't be read in less than about 50 seconds (2048 MB / 40 MB/s ≈ 51 s).

How long is it actually taking?

Hi Roddy, using fstream read method with 1.1 GB files and large buffers(128,255 or 512 MB) it takes about 43-48 seconds and it is the same using fstream getline (line by line). cp takes almost 2 minutes to copy the file.

In which case, you're hardware-bound. cp has to read and write, and will be seeking back and forth across the disk surface like mad when it does it. So it will (as you see) be more than twice as bad as the simple 'read' case.

To improve the speed, the first thing I'd try is a faster hard drive. Maybe a WD Velociraptor?

You haven't said what the disk interface is. SATA is pretty much the easiest/fastest option. Also (obvious point, this...) make sure the disk is physically in the same machine your code is running on, otherwise you're network-bound...

Roddy 2008-11-06 11:19:29

If you're hitting hardware limitations, moving to a marginally faster drive won't help as much as moving to striped drives. Also, why use cp for this? Instead, use dd if=/dev/zero of=/path and just test the write throughput. Experiment with block sizes (bs=4K, bs=32K) to see how that affects speed.

Mitch Haile 2008-11-08 13:23:34

Answer 5

A:

If you are going to buffer the file yourself, then I'd advise some testing using unbuffered I/O (setvbuf on a file that you've fopened can turn off the library buffering).

Basically, if you are going to buffer yourself, you want to disable the library's buffering, as it's only going to cause you pain. I don't know if there is any way to do that for STL I/O, so I recommend going down to the C-level I/O.
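
As a rough sketch of what that could look like at the C level: the helper name read_unbuffered and the caller-supplied buffer are assumptions for illustration. The only library calls are fopen, setvbuf with _IONBF, fread and fclose.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: reads a file through your own buffer with the C
// library's buffering switched off. Returns the number of bytes read,
// or 0 if the file cannot be opened or my_buffer is empty.
std::size_t read_unbuffered(const char* path, std::vector<char>& my_buffer)
{
    if (my_buffer.empty())
        return 0;
    std::FILE* f = std::fopen(path, "rb");
    if (!f)
        return 0;

    // setvbuf has to be called after fopen and before any other operation
    // on the stream. _IONBF turns the library's own buffering off.
    if (std::setvbuf(f, nullptr, _IONBF, 0) != 0) {
        std::fclose(f);
        return 0;
    }

    std::size_t total = 0;
    std::size_t n;
    while ((n = std::fread(&my_buffer[0], 1, my_buffer.size(), f)) > 0) {
        // Process my_buffer[0 .. n) here.
        total += n;
    }
    std::fclose(f);
    return total;
}
```

With library buffering off, each fread goes more or less straight to the OS, so the size of my_buffer becomes the knob to test.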

Michael Kohne 2008-11-06 11:30:11

Answer 6

A:

Hi Roddy, using fstream read method with 1.1 GB files and large buffers(128,255 or 512 MB) it takes about 43-48 seconds and it is the same using fstream getline (line by line). cp takes almost 2 minutes to copy the file.

Michael, regarding the setvbuf, I obtain the same results.

I think you are right, Roddy: I cannot improve the performance due to the hardware limitation.

Jacob 2008-11-06 12:03:43

It's better to either add this to your original post, or as a comment on my reply.

Roddy 2008-11-06 12:05:21

Answer 7

+2 A:

Just a thought, but avoid using std::endl as this will force a flush before the buffer is full. Use '\n' instead for a newline.
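
To see the difference, here is a small sketch that counts how often the stream gets flushed in each case. CountingBuf and the two helper functions are hypothetical names made up for this illustration; the buffer writes to memory rather than to a file, so it shows the number of flush requests only, not their cost on a real disk.

```cpp
#include <ostream>
#include <sstream>

// Hypothetical stream buffer that counts how often the stream is flushed.
class CountingBuf : public std::stringbuf {
public:
    CountingBuf() : flushes(0) {}
    int flushes;
protected:
    // ostream::flush() ends up here via pubsync().
    int sync() override { ++flushes; return std::stringbuf::sync(); }
};

// Writes `lines` lines terminated with std::endl; returns the flush count.
int flushes_with_endl(int lines)
{
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; ++i)
        out << i << std::endl;   // newline plus an explicit flush
    return buf.flushes;
}

// Writes `lines` lines terminated with '\n'; returns the flush count.
int flushes_with_newline(int lines)
{
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; ++i)
        out << i << '\n';        // newline only, no flush
    return buf.flushes;
}
```

On a file stream each of those flushes can turn into a write to the OS, which is where the slowdown comes from.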

Evan Teran 2008-11-06 12:07:30

Yes, you are right. Good point :)

Jacob 2008-11-06 13:22:09

Answer 8

+2 A:

Don't use new to allocate the buffer like that:

Try: std::vector<>

unsigned int      buffer_size = 64 * 1024 * 1024; // 64 MB for instance.
std::vector<char> data_buffer(buffer_size);
_file->read(&data_buffer[0], buffer_size);

Also read the article on using underscores in identifier names. Note your code is OK but.

Martin York 2008-11-06 12:42:56

I used new and char* just to make it as fast as possible. This code was in a class method; in my personal style I use the underscore to identify class member variables, while the local variables of the method have no prefix.

Jacob 2008-11-06 13:21:21

Answer 9

+1 A:

Using getline() may be inefficient because the string buffer may need to be re-sized several times as data is appended to it from the stream buffer. You can make this more efficient by pre-sizing the string.

Also, you can set the size of the iostreams buffer to either very large or NULL (for unbuffered):

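// Option 1: unbuffered access.
// Call pubsetbuf() before open().
std::fstream unbuffered_file;
unbuffered_file.rdbuf()->pubsetbuf(NULL, 0);
unbuffered_file.open("PLOP");

// Option 2: a larger buffer. The effect of a non-null buffer is
// implementation-defined, and the vector must outlive the stream.
std::vector<char>  buffer(64 * 1024 * 1024);
std::fstream       file;
file.rdbuf()->pubsetbuf(&buffer[0], buffer.size());
file.open("PLOP");

// Pre-size the string so getline() does not keep re-allocating it.
std::string   line;
line.reserve(64 * 1024 * 1024);

while (std::getline(file, line))
{
    // Do Stuff.
}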

Martin York 2008-11-06 13:04:58

The class uses a char* buffer that is associated with the streambuf of an istringstream. I load the raw data from the file directly into the buffer and use the stringstream to format it later, but that did not improve the performance. Just in case, I tried with ifstream and pubsetbuf, but it is slower. Why?

Jacob 2008-11-06 14:46:13

Answer 10

A:

Linux: How to Use RAM as Swap

HTH

plan9assembler 2008-11-08 02:40:58
