on-disk

On Disk Substring index

I have a file (fasta file to be specific) that I would like to index, so that I can quickly locate any substring within the file and then find the location within the original fasta file. This would be easy to do in many cases, using a Trie or substring array, unfortunately the strings I need to index are 800+ MBs which means that doing...

Disk-backed STL container classes?

I enjoy developing algorithms using the STL, however, I have this recurring problem where my data sets are too large for the heap. I have been searching for drop-in replacements for STL containers and algorithms which are disk-backed, i.e. the data structures on stored on disk rather than the heap. A friend recently pointed me toward...

Scalable stl set like container for C++

Hi, I need to store large number of integers. There can be duplicates in the input stream of integers, I just need to store distinct amongst them. I was using stl set initially but It went OutOfMem when input number of integers went too high. I am looking for some C++ container library which would allow me to store numbers with the said...

B+Tree on-disk implementation in Java

Does anyone know where to find a B+Tree on-disk implementation? I went through google forward and backward and unfortunately I couldn't find anything sensible. Other threads have suggested to maybe take the tree from sqlite, sqljet or bdb but these trees are nested in the whole database and you can't really "just" filter out the B+Tree. ...