views:

123

answers:

3

Are there any good resources or books for spillable data structures, ie say a queue, when storing large objects it could fill up all of memory, but if you can keep say the most used items of that queue structure in memory and the rest on disk (sort of like paging), similarly this question applies to other structures such as linked lists, arrays, hashtables and so on.

+3  A: 

There is the Buffer Tree (PDF, 0.6 MB):

"... developed an efficient external priority queue and batched dynamic versions of the (one-dimensional) range tree and the segment tree."

and

"... allow us to design efficient external-memory algorithms from known internal algorithms in a straightforward way, such that all the I/O specific parts of the algorithms are hidden in the data structures."

It is the mentioned as part of a broader treatment of the subject in the freely available online book "Algorithms and Data Structures for External Memory" by Jeffrey Scott Vitter (PDF, 1 MB).

Peter Mortensen
+1 for the reference to the nice book, I haven't known before
dmeister
A: 

Edit: Do you mean finding efficient disk-based analogues to the fundamental RAM-based data structures (e.g. linked lists, stacks, queues, priority queues, etc.) ? If so, the answer below may or may not be helpful.

I'm not entirely sure what you're trying to do. By queue, do you mean a FIFO (first in first out) queue or a priority queue?

For dealing with FIFO queues and logging, perhaps you could look into ring buffers and log rotation.

For dealing with caching data in RAM to minimize disk access, you may or may not be better off leaving that to the operating system. Unless you're developing a Windows application, you might be better off simply reading and writing to and from files the naive way, since the operating system should perform read and write caching well enough. However, as far as I can tell, Windows has horrible read/write caching (I may be wrong).

Perhaps looking at the VFS subsystem in Linux and studying http://lxr.linux.no/#linux+v2.6.31/Documentation/filesystems/vfs.txt will help, since (I think) this is the part of Linux that handles caching.

I'm not an expert in queues and caching, but I do know a few things about it. If you could provide more detail on what you are trying to do, maybe someone can help you hone in on the right solution.

Joey Adams
A: 

What you are looking for might be the topic of I/O efficient algorithms. A Google search didn't turn up any books for me, but this course page contains a list of articles which may or may not be relevant for you.

You should also take a look at the WikiPedia page for B-trees, especially the section on B-trees in filesystems.

Jørgen Fogh