I have a few large databases, each with more than 100 million records. They consist of the following:

  1. A unique key.
  2. An integer value, not unique, but used for sorting the query.
  3. A VARCHAR(200).

I have them in a MySQL MyISAM table now. My thought was, hey, I'll just set up a covering index on the data, and the results should come out reasonably fast. Queries are of the form...

select valstr,account 
    from datatable 
    where account in (12349809, 987987223,...[etc]) 
    order by orderPriority;
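
(For reference, the covering index I have in mind looks roughly like this; the names match the query above, so treat it as a sketch of the idea rather than my exact DDL.)

    -- Sketch of the covering index described above (names as in the query,
    -- column types are guesses). With (account, orderPriority, valstr) in one
    -- index, the query can in principle be answered from the index alone,
    -- without touching the table rows.
    alter table datatable
        add index idx_cover (account, orderPriority, valstr);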

This seemed OK in some tests, but on our newer installation it's terribly slow. It actually seems faster with no index at all, which seems odd.

In any case, I'm thinking maybe a different database? We use a data-warehousing DB for other parts of the system, but it's not well suited for anything involving text. Any free, or fairly cheap, DBs are an option, as long as they have reasonably useful API access. SQL optional.

Thanks in advance.

-Kevin

+1  A: 

Here's a reasonably sized example of a MySQL database using the InnoDB engine. It takes advantage of clustered indexes on a table with approx. 125 million rows, and the query runtime is 0.021 seconds, which seems fairly reasonable.

http://stackoverflow.com/questions/3534597/rewriting-mysql-select-to-reduce-time-and-writing-tmp-to-disk/3535735#3535735

http://pastie.org/1105206
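
As a rough sketch of the idea (column names taken from the question, not the schema in the links above): make account the InnoDB primary key, so the clustered index holds the rows and the key lookup and the row fetch become a single index dive.

    -- Rough sketch only, using the question's column names; types are guesses.
    -- With InnoDB, the PRIMARY KEY is the clustered index, so the rows are
    -- stored in key order and a lookup by account lands directly on the data.
    create table datatable (
        account       bigint unsigned not null,
        orderPriority int not null,
        valstr        varchar(200) not null,
        primary key (account)
    ) engine=InnoDB;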

Other useful links:

http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html

http://dev.mysql.com/doc/refman/5.0/en/innodb-adaptive-hash.html

Hope it proves of interest.

f00
A: 

CouchDB will give you storage by key, and you can create views to do the query/sorting. A second option could be Cassandra, but there's a pretty big learning curve.

Matt Williamson
A: 

CouchDB, MongoDB, and Riak are all going to be good at finding the key (account) relatively quickly.

The problems you're going to have (with any solution) are tied to the "order by" and "account in" clauses.

Problem #1: account in

120M records likely means gigabytes of data, and probably an index of over a gig. The reason this is a problem is that your "in" clause can easily span the whole index: if you search for accounts "0000001" and "9999581", you probably need to load a lot of that index.

So just to find the records, your DB first has to load potentially a gig of index into memory. Then to actually fetch the data it has to go back to the disk again. If the "accounts" in your in clause are not "close together", then you're going back multiple times to fetch various blocks. At some point it may be quicker to just do a table scan than to load the index and then the table.
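
As a rough illustration with standard MySQL commands (table name taken from the question), you can compare how big that index actually is against the caches the server has for it:

    -- Data_length and Index_length show how much data/index is in play.
    show table status like 'datatable';

    -- Compare against the caches actually available to the server.
    show variables like 'key_buffer_size';          -- MyISAM index cache
    show variables like 'innodb_buffer_pool_size';  -- InnoDB buffer pool, if you switch engines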

Then you get to problem #2...

Problem #2: order by

If you have a lot of data coming back from the "in" clause, then order by is just another layer of slowness. With an "order by", the server can't stream you the data as it finds it. Instead it has to load all of the matching records into memory, sort them, and then stream them.
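
As a rough illustration (again using the question's names), EXPLAIN shows both problems at once: the type column tells you how much of the index or table gets read, and "Using filesort" in the Extra column means the server is buffering and sorting the whole result before it can send the first row.

    -- Sketch only: run EXPLAIN on the real query to see what the optimizer does.
    explain select valstr, account
        from datatable
        where account in (12349809, 987987223)
        order by orderPriority;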

Solutions:

  1. Have lots of RAM. If the RAM can't fit the entire index, then the loads will be slow.
  2. Try limiting the number of "in" items. Even 20 or 30 items in this clause can make the query really slow (see the sketch after this list for one way around a huge list).
  3. Try a Key-Value database?
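
As one possible way to tame a very long "in" list (a sketch of my own; the temp-table name is made up, the rest follows the question), load the wanted accounts into a temporary table and join against it instead of passing a giant literal list:

    -- Sketch: replace a huge IN (...) list with a join against a keyed temp table.
    -- The table "wanted_accounts" is made up for this example.
    create temporary table wanted_accounts (account bigint unsigned primary key);

    insert into wanted_accounts values (12349809), (987987223);  -- etc.

    select d.valstr, d.account
        from datatable d
        join wanted_accounts w on w.account = d.account
        order by d.orderPriority;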

I'm a big fan of K/V databases, but you have to look at point #1. If you don't have a lot of RAM and you have lots of data, then the system is going to run slowly no matter what DB you use. That RAM / DB size ratio is really important if you want good performance in these scenarios (small look-ups in big datasets).

Gates VP