tags:
views: 119
answers: 5

Suppose you have a really large table, say a few billion unordered rows, and now you want to index it for fast lookups. Or maybe you are going to bulk load it and order it on the disk with a clustered index. Obviously, when you get to a quantity of data this size you have to stop assuming that you can do things like sorting in memory (well, not without going to virtual memory and taking a massive performance hit).

Can anyone give me some clues about how databases handle large quantities of data like this under the hood? I'm guessing there are algorithms that use some form of smart disk caching to handle all the data but I don't know where to start. References would be especially welcome. Maybe an advanced databases textbook?

A: 

Are you building a database engine?

Edit: I built a disc-based database system back in the mid-'90s.

Fixed-size records are the easiest to work with because the file offset for locating a record can be calculated as a simple multiple of the record size. I also had some files with variable record sizes.

My system needed to be optimized for reading. The data was actually stored on CD-ROM, so it was read-only. I created binary search tree files for each column I wanted to search on. I took an open source in-memory binary search tree implementation and converted it to do random access of a disc file. Sorted reads from each index file were easy and then reading each data record from the main data file according to the indexed order was also easy. I didn't need to do any in-memory sorting and the system was way faster than any of the available RDBMS systems that would run on a client machine at the time.

For fixed-size records, the index can just keep track of the record number. For variable-length records, the index needs to store the offset within the file where the record starts, and each record needs to begin with a structure that specifies its length.
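To make that concrete, here is a minimal sketch of both lookup styles (the record size and the 4-byte length header are illustrative assumptions, not the exact format I used):

    import struct

    RECORD_SIZE = 64  # hypothetical fixed record size in bytes

    def read_fixed_record(f, record_number, record_size=RECORD_SIZE):
        # Fixed-size records: the offset is simply record_number * record_size.
        f.seek(record_number * record_size)
        return f.read(record_size)

    def read_variable_record(f, offset):
        # Variable-length records: the index stores the byte offset where the
        # record starts, and the record begins with a length header
        # (assumed here to be a 4-byte little-endian unsigned int).
        f.seek(offset)
        (length,) = struct.unpack("<I", f.read(4))
        return f.read(length)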

Dennis Palmer
Essentially I am doing just this.
Dana Robinson
And before anyone asks, using a commercial DBMS is not an option.
Dana Robinson
If you want to request clarification, please do so in comments instead of posting an answer.
Brandon
A: 

Hmm... Interesting question.

I think that most widely used database management systems rely on the operating system's memory-management mechanisms, and when physical memory runs out, in-memory tables get paged out to swap.

Sergey Kuznetsov
Really? They just malloc() away? There must be some algorithms that try to handle this problem.
Dana Robinson
That is definitely far from true. DBMSs implement a number of algorithms for relational operations that are specifically designed to work with data on disk; they don't rely on just virtual memory.
Dmitry
+4  A: 

Multiway merge sort is the keyword for sorting huge amounts of data that don't fit in memory.
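As a rough illustration, here is a minimal two-phase external merge sort in Python (the function name and chunk size are made up for the example): sort runs that fit in memory, spill each run to a temporary file, then do a k-way merge of the runs.

    import heapq
    import itertools
    import tempfile

    def external_sort(input_path, output_path, chunk_lines=1_000_000):
        # Phase 1: read chunks that fit in memory, sort each one, and spill it
        # to its own temporary "run" file. Assumes every input line ends in '\n'.
        runs = []
        with open(input_path) as src:
            while True:
                chunk = list(itertools.islice(src, chunk_lines))
                if not chunk:
                    break
                chunk.sort()
                run = tempfile.TemporaryFile(mode="w+")
                run.writelines(chunk)
                run.seek(0)
                runs.append(run)
        # Phase 2: k-way merge of all sorted runs with a min-heap, streaming
        # the result to disk so memory use stays at roughly one line per run.
        with open(output_path, "w") as dst:
            dst.writelines(heapq.merge(*runs))
        for run in runs:
            run.close()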

Dmitry
Database Systems: The Complete Book by Ullman et al. should be a very good read. http://infolab.stanford.edu/~ullman/dscb.html
Dmitry
Like in this link? http://www.cs.aau.dk/~simas/aalg04/esort.pdf
Dana Robinson
+1 we had to implement a very similar function where the data files we open can be anywhere from 10 MB to 1 TB and we potentially have to sort the entire file. Essentially you end up using plenty of disk space, as much as the original file even: you sort subsections of the file and then merge them, hence the multiway merge. It definitely works, but it is anything but fast, especially when it's all on the SAME physical disk.
basszero
Yes, that's a good and short introduction
Dmitry
+1  A: 

As far as I know most indexes use some form of B-tree, which does not need to keep everything in memory. You can simply store the nodes of the tree in a file and then jump to various positions within that file. This can also be used for sorting.
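A rough sketch of what "nodes in a file" looks like in practice (the page size and node layout here are purely hypothetical):

    import bisect
    import struct

    NODE_SIZE = 4096   # hypothetical: one B-tree node per 4 KB disk page

    def read_node(index_file, node_id):
        # Jumping to any node is a single seek: its offset is node_id * NODE_SIZE.
        index_file.seek(node_id * NODE_SIZE)
        page = index_file.read(NODE_SIZE)
        # Hypothetical page layout: 2-byte key count, 1-byte leaf flag,
        # then 8-byte keys, then 4-byte child node ids (num_keys + 1 of them).
        num_keys, is_leaf = struct.unpack_from("<H?", page, 0)
        keys = list(struct.unpack_from(f"<{num_keys}q", page, 3))
        children = list(struct.unpack_from(f"<{num_keys + 1}I", page, 3 + 8 * num_keys))
        return keys, children, is_leaf

    def contains(index_file, key, root_id=0):
        # Descend from the root one page at a time; only one node per level
        # is ever in memory, no matter how large the index is.
        node_id = root_id
        while True:
            keys, children, is_leaf = read_node(index_file, node_id)
            if is_leaf:
                return key in keys
            node_id = children[bisect.bisect_right(keys, key)]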

che
Yeah, I get that, but certainly there are algorithms for things like this.
Dana Robinson
A: 

You would have to partition your data set in some way and spread each partition across a separate server's RAM. If I had a billion 32-bit ints, that's 32 GB of RAM right there. And that's only your index.

For low-cardinality data, such as gender (only two possible values: Male and Female), you can represent each index entry in less than a byte. Oracle uses a bitmap index in such cases.
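A toy sketch of the bitmap-index idea (plain Python ints stand in for the per-value bit vectors; this is not how Oracle actually stores them):

    def build_bitmap_index(column_values):
        # One bit vector per distinct value; bit i is set when row i holds that value.
        bitmaps = {}
        for row_id, value in enumerate(column_values):
            bitmaps[value] = bitmaps.get(value, 0) | (1 << row_id)
        return bitmaps

    # Example: with a gender column and an active column indexed this way,
    # "gender == 'F' AND active" is just a bitwise AND of two bit vectors.
    gender = build_bitmap_index(["M", "F", "F", "M", "F"])
    active = build_bitmap_index([True, True, False, True, True])
    matching_rows = gender["F"] & active[True]   # bits 1 and 4 set -> rows 1 and 4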

blispr
No, that's 4 GB of RAM. And we want this to run on higher-end desktops, so server/cluster tricks are not an option. And the KIND of index doesn't matter - I want to know how to handle moving around data that can't fit in memory. That's the fundamental problem.
Dana Robinson
Ouch! I stand corrected. Have you considered index compression? Assuming your data set does not fit into RAM even after compression, you will need a page-in / page-out algorithm based on the queries against the index.
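Something along the lines of a minimal LRU buffer pool is the usual shape of such a page-in / page-out scheme (the names, page size, and capacity below are hypothetical):

    from collections import OrderedDict

    PAGE_SIZE = 4096   # hypothetical page size

    class PageCache:
        # A minimal LRU page-in / page-out cache over an index file on disk.
        def __init__(self, index_file, capacity=1024):
            self.index_file = index_file
            self.capacity = capacity        # maximum number of pages held in RAM
            self.pages = OrderedDict()      # page_id -> bytes, kept in LRU order

        def get_page(self, page_id):
            if page_id in self.pages:
                self.pages.move_to_end(page_id)      # mark as most recently used
                return self.pages[page_id]
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)       # evict the least recently used page
            self.index_file.seek(page_id * PAGE_SIZE)
            page = self.index_file.read(PAGE_SIZE)
            self.pages[page_id] = page
            return page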
blispr
I want to assume that we run out of RAM, regardless of whether compression is used or not.
Dana Robinson