tags:

views:

456

answers:

4

Basically what I'm after is the equivalent of the Standard Template Library queue implemented in such a way as to use the disk for storage. The volume of data that will need to be in the queue is far greater than can be stored in the ram of most computers today.

Ideally, I'm after a library to use. However, any recommendation on how to implement this queue would be useful.

+7  A: 

You might want to look at the STXXL:

"The core of STXXL is an implementation of the C++ standard template library STL for external memory (out-of-core) computations, i.e., STXXL implements containers and algorithms that can process huge volumes of data that only fit on disks."

James McNellis
+1 Nice little library. I didn't know this existed!
dirkgently
It's a nice library. It also supports striping data across multiple disks.
Charles Salvia
Thanks. That looks like exactly what I need.
Adam
+1  A: 

A wild idea: Implement an allocator class that reads/writes to and from a file on disk and pass it to STL deque or queue or whatever suits your needs.

dirkgently
Cool idea, except that the typical STL container implementation allocator is only invoked to allocate/free memory. During ordinary iteration, for example, the allocator is not involved at all. And the container itself will not generate "page faults" to trigger page loads.
AndreyT
+2  A: 

You might want to look into the STLXX library. It contains a disk-based priority queue using the "Sequence Heap" model described by Peter Sanders.

Charles Salvia
A: 

Tell us about the data. Is each item large or small? Is it a fixed size or highly variable? The problem with disk storage is that as the items become more varied in size the more the problem begins to resemble a database problem. In which case you should probably look into something like a sqllite database as the backing store to your queue. Then you can just use SQL to pull the first record out.

If the data is really large you could just store each object on the filesystem using ever incrementing filename. Then you don't even need to store the queue in memory. The file date becomes your FIFO order. Just grab the oldest file in the directory to pull the first item off the "stack".

Finally if the data is small and numerous, you might consider overriding the Allocator of a std::list or std::deque. It might be possible to hide the file IO in the Allocator class. I don't have simple solution for the small and numerous data instance.

jmucchiello