I have a project in which I'm doing data mining on a large database. I currently store all of the data in text files, and I'm trying to understand the costs and benefits of storing the data in a relational database instead. The points look like this:

CREATE TABLE data (
    source1 CHAR(5),
    source2 CHAR(5),
    idx11   INT,
    idx12   INT,
    idx21   INT,
    idx22   INT,
    point1  FLOAT,
    point2  FLOAT
);

How many points like this can I have with reasonable performance? I currently have ~150 million data points, and I probably won't have more than 300 million. Assume that I am using a box with 4 dual-core 2GHz Xeon CPUs and 8GB of RAM.

+7  A: 

PostgreSQL should be able to amply accommodate your data -- up to 32 Terabytes per table, etc, etc. If I understand correctly, you're talking about 5 GB currently, 10 GB max (about 36 bytes/row and up to 300 million rows), so almost any database should in fact be able to accommodate you easily.
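Once the data is loaded, the actual on-disk footprint can be checked directly rather than estimated; a minimal PostgreSQL sketch, assuming the table is named data as in the question:

SELECT pg_size_pretty(pg_total_relation_size('data'));

pg_total_relation_size includes the table's indexes and TOAST storage, so it reports the full footprint.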

Alex Martelli
+1 for postgres. If you're going to do any stat work on the data (and "data mining" implies you will), then with postgres you can use PL/R, and it can make your life easier.
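A rough sketch of what that can look like, assuming the PL/R extension is installed; r_median is a hypothetical wrapper name, and arg1 is how PL/R exposes the first unnamed argument:

-- Wrap R's median() so it can be called from SQL (requires PL/R).
CREATE OR REPLACE FUNCTION r_median(float8[]) RETURNS float8 AS '
    median(arg1)
' LANGUAGE 'plr';

-- Example use against the question's table (array_agg needs PostgreSQL 8.4+).
SELECT source1, r_median(array_agg(point1)) AS median_point1
FROM data
GROUP BY source1;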
rfusca
+2  A: 

MySQL is more than capable of serving your needs, as is Alex's suggestion of PostgreSQL. Reasonable performance shouldn't be difficult to achieve, but if the table is going to be heavily accessed and see a large amount of DML, you will want to know more about the locking used by the database you end up choosing.

I believe PostgreSQL uses row-level locking out of the box, whereas MySQL depends on the storage engine you choose. MyISAM only locks at the table level, and thus concurrency suffers, but storage engines such as InnoDB can and will use row-level locking to increase throughput. My suggestion would be to start with MyISAM and move to InnoDB only if you find you need row-level locking. MyISAM works well in most situations and is extremely lightweight. I've had tables of over 1 billion rows in MySQL using MyISAM, and with good indexing and partitioning you can get great performance. You can read more about storage engines in MySQL at MySQL Storage Engines and about table partitioning at Table Partitioning. Here is an article on partitions in practice on a table of 113M rows that you may find useful as well.
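To make the MyISAM/partitioning suggestion concrete, here is a hedged sketch of the question's table declared with an explicit storage engine and range partitioning (requires MySQL 5.1+; the index name and partition boundaries are made up for illustration):

CREATE TABLE data (
    source1 CHAR(5),
    source2 CHAR(5),
    idx11   INT,
    idx12   INT,
    idx21   INT,
    idx22   INT,
    point1  FLOAT,
    point2  FLOAT,
    KEY idx_source_range (source1, idx11)   -- supports lookups by source and index
) ENGINE=MyISAM
PARTITION BY RANGE (idx11) (
    PARTITION p0 VALUES LESS THAN (1000),
    PARTITION p1 VALUES LESS THAN (2000),
    PARTITION p2 VALUES LESS THAN MAXVALUE
);

Queries that filter on idx11 can then prune to a single partition instead of scanning all 300 million rows.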

I think the benefits of storing the data in a relational database far outweigh the costs. There are so many things you can do once your data is within a database: point-in-time recovery, data integrity enforcement, finer-grained security access, partitioning of data, availability to other applications through a common language (SQL), and so on.

Good luck with your project.

RC
+3  A: 

FYI: Postgres scales better than MySQL on multi-processor / overlapping requests, from a review I was reading a few months back (sorry, no link).

I assume from your profile this is some sort of bioinformatics (codon sequences, enzyme vs. protein amino acid sequences, or some such) problem. If you are going to attack this with concurrent requests, I'd go with Postgres.

OTOH, if the data is going to be loaded once, then scanned by a single thread, maybe MySQL in its "ACID not required" mode would be the best match.
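For that load-once pattern, bulk loading beats row-by-row inserts; a minimal MySQL sketch, assuming the text files are tab-delimited and live at the hypothetical path /data/points.tsv:

-- Skip index maintenance during the bulk load (MyISAM), then rebuild once.
ALTER TABLE data DISABLE KEYS;

LOAD DATA INFILE '/data/points.tsv'
INTO TABLE data
FIELDS TERMINATED BY '\t'
(source1, source2, idx11, idx12, idx21, idx22, point1, point2);

ALTER TABLE data ENABLE KEYS;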

You've got some planning to do around your access patterns and use case(s) before you can select the "best" stack.

Roboprog
There will almost certainly be no concurrent requests; this is a database only for myself. I'd just like to replace a lot of my hacky loops over text files with SQL queries, because it will make things smaller and less likely to contain bugs. Thanks for the tip!
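As an illustration of that kind of loop replacement (the filter values below are made up), a single aggregate query can stand in for a hand-rolled pass over the text files:

SELECT source1, source2, COUNT(*) AS n, AVG(point1) AS avg_point1
FROM data
WHERE idx11 BETWEEN 100 AND 200
GROUP BY source1, source2;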
James Thompson