I need to save a very large amount of data (>500GB) that is being streamed (800Mb/s) from another device connected to my PC. The speed rules out a database, e.g. MySQL/ISAM, so I am looking for a fast, light library that sits on top of the C stdio file library (i.e. fopen/fclose/fwrite) and lets me write/read a very large file (up to the available disk space).

Behind the scenes, the large file can be broken up into smaller files, e.g. 1GB each, and I want the API to take care of these details.

The data arrives at the PC in a compressed binary format and no further processing is needed before writing it to the hard disk.

The library should work on both Windows and Linux.
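
To make the requirement concrete, here is a rough sketch of the kind of API I have in mind, written on plain stdio so it should build on both Windows and Linux. The names (`chunked_writer`, `cw_open`, `cw_write`, `cw_close`) and the `base.NNNNNN` file naming are made up for illustration, not taken from any existing library:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical API: all names below are illustrative only. */
typedef struct {
    char     base[256];   /* chunk files are base.000000, base.000001, ... */
    size_t   chunk_size;  /* maximum bytes per chunk file */
    size_t   used;        /* bytes written to the current chunk */
    unsigned index;       /* index of the next chunk file to create */
    FILE    *fp;          /* current chunk, or NULL before the first write */
} chunked_writer;

/* Close the current chunk (if any) and open the next one. */
static int cw_next(chunked_writer *w) {
    char name[300];
    if (w->fp != NULL && fclose(w->fp) != 0) { w->fp = NULL; return -1; }
    snprintf(name, sizeof name, "%s.%06u", w->base, w->index++);
    w->fp = fopen(name, "wb");
    w->used = 0;
    return w->fp != NULL ? 0 : -1;
}

int cw_open(chunked_writer *w, const char *base, size_t chunk_size) {
    assert(w != NULL && base != NULL);
    if (chunk_size == 0 || strlen(base) >= sizeof w->base) return -1;
    strcpy(w->base, base);
    w->chunk_size = chunk_size;
    w->used = 0;
    w->index = 0;
    w->fp = NULL;
    return 0;
}

/* Append len bytes, rolling over to a new chunk file whenever one fills up. */
int cw_write(chunked_writer *w, const void *data, size_t len) {
    const unsigned char *p = data;
    while (len > 0) {
        if (w->fp == NULL || w->used == w->chunk_size) {
            if (cw_next(w) != 0) return -1;
        }
        size_t room = w->chunk_size - w->used;
        size_t n = len < room ? len : room;
        if (fwrite(p, 1, n, w->fp) != n) return -1;
        w->used += n;
        p += n;
        len -= n;
    }
    return 0;
}

int cw_close(chunked_writer *w) {
    int rc = 0;
    if (w->fp != NULL) { rc = fclose(w->fp); w->fp = NULL; }
    return rc == 0 ? 0 : -1;
}
```

With 1GB chunks this would be used as `cw_open(&w, "capture", (size_t)1 << 30)`, followed by one `cw_write` per received block and a final `cw_close`. Reading back would walk the same file names in order.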

+1  A: 

If you need random access into the data, take a look at memory-mapped files.

It lets you map a file (or a section of a file) into memory transparently, without having to explicitly allocate memory and read the data. It works on Windows and Linux (there is a Boost library that wraps the differences).

On Windows you can handle files much larger than 4GB on a 32-bit OS by using multiple windows into the file.
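
As a rough illustration of the windowing idea, here is a POSIX-only sketch (Linux, not Windows) that walks a file through successive mapped windows. The helper name `checksum_windowed` and the byte-sum it computes are just placeholders for whatever processing you need; on Windows the corresponding calls would be `CreateFileMapping` and `MapViewOfFile`.

```c
#define _FILE_OFFSET_BITS 64  /* 64-bit file offsets on 32-bit Linux builds */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: maps the file one window at a time and returns the
   sum of all bytes, or -1 on error. window_pages is the window size in pages,
   which keeps every mapping offset page-aligned as mmap requires. */
long long checksum_windowed(const char *path, size_t window_pages) {
    assert(path != NULL && window_pages > 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return -1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); close(fd); return -1; }

    size_t window = window_pages * (size_t)sysconf(_SC_PAGESIZE);
    long long sum = 0;

    for (off_t off = 0; off < st.st_size; off += (off_t)window) {
        off_t left = st.st_size - off;
        size_t len = left < (off_t)window ? (size_t)left : window;

        /* Map only this window; the rest of the file uses no address space. */
        unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);
        if (p == MAP_FAILED) { perror("mmap"); close(fd); return -1; }

        for (size_t i = 0; i < len; i++) sum += p[i];

        munmap(p, len);
    }
    close(fd);
    return sum;
}
```

Because only one window is mapped at a time, the file can be far larger than the process address space.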

Edit: Sorry, 800Mb/s!! I don't know of any disks that can cope with that. You might be looking at a RAID array of SSD drives.
There used to be image capture cards that treated an attached drive as a simple series of bytes, with no filesystem, to get very high-speed sustained writes. I don't know if you are going to need something like that.

Martin Beckett
800Mb/s is only about 100MB/s (800/8), which is very doable with 10000 RPM drives. WD Raptors can do 150MB/s (roughly 1200Mb/s) writes and just over 80MB/s read+writes.
slebetman
Random access is not needed. It's just sequential writes of received (streamed) data to the end of the file. Once the file is created, the post-processing is done using sequential reads.
Hugh O'Keeffe
A: 

For ultimate speed, I suggest you go highly platform specific.

The objective is to get as close as you can to connecting the input device directly to the hard drive. One method is to write a driver for the input device that writes directly to the hard drive.

The generic algorithm is to use either a very large circular byte buffer or multiple buffers. You need the extra space to compensate for the speed difference between the input device and the output device, assuming the input device runs non-stop.
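
A minimal sketch of the circular byte buffer, assuming a single thread for simplicity. The names (`ring_buffer`, `rb_put`, `rb_get`) are illustrative, and a real capture program would run the producer and consumer on separate threads and add the synchronization that is left out here:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type and function names, for illustration only. */
typedef struct {
    unsigned char *data;  /* caller-provided storage */
    size_t cap;           /* capacity in bytes */
    size_t head;          /* next write position */
    size_t tail;          /* next read position */
    size_t count;         /* bytes currently stored */
} ring_buffer;

void rb_init(ring_buffer *rb, unsigned char *storage, size_t cap) {
    assert(rb != NULL && storage != NULL && cap > 0);
    rb->data = storage;
    rb->cap = cap;
    rb->head = rb->tail = rb->count = 0;
}

/* Producer side (input device): copies in as much as fits and returns the
   number of bytes accepted, which is less than len when the buffer is full. */
size_t rb_put(ring_buffer *rb, const void *src, size_t len) {
    size_t free_space = rb->cap - rb->count;
    if (len > free_space) len = free_space;
    size_t first = rb->cap - rb->head;        /* room before wrapping */
    if (first > len) first = len;
    memcpy(rb->data + rb->head, src, first);
    memcpy(rb->data, (const unsigned char *)src + first, len - first);
    rb->head = (rb->head + len) % rb->cap;
    rb->count += len;
    return len;
}

/* Consumer side (disk writer): copies out up to len bytes, oldest first,
   and returns the number of bytes delivered. */
size_t rb_get(ring_buffer *rb, void *dst, size_t len) {
    if (len > rb->count) len = rb->count;
    size_t first = rb->cap - rb->tail;        /* bytes before wrapping */
    if (first > len) first = len;
    memcpy(dst, rb->data + rb->tail, first);
    memcpy((unsigned char *)dst + first, rb->data, len - first);
    rb->tail = (rb->tail + len) % rb->cap;
    rb->count -= len;
    return len;
}
```

When `rb_put` accepts fewer bytes than offered, the disk has fallen behind by more than the buffer can hold, so the buffer has to be sized for the worst-case stall of the output device.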

If you can pause the input device, the issue becomes easier.

Thomas Matthews