views: 95

answers: 5

Hi all, I'm planning to generate a huge amount of data, which I'd like to store in a MySQL database. My current estimate is around four thousand million (four billion) rows in the main table (only two columns, one of them indexed).

Two questions here:

1) Is this possible?

and more specifically:

2) Will such a table be usable efficiently?

Thanks! Jaime

+1  A: 

You should consider partitioning your data. For example, if one of the two columns is a name, you could split the rows into 26 tables based on the first letter.
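A minimal sketch of that manual partitioning idea, using in-memory SQLite as a stand-in for MySQL (the table names and the fallback bucket are illustrative assumptions; the routing logic is the point):

```python
import sqlite3

# Route each row to one of 26 tables (names_a .. names_z) based on the
# first letter of the name column. SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
for letter in "abcdefghijklmnopqrstuvwxyz":
    conn.execute(f"CREATE TABLE names_{letter} (name TEXT, value INTEGER)")

def table_for(name: str) -> str:
    first = name[0].lower()
    # Fallback bucket for names not starting with a-z (an assumption).
    return f"names_{first}" if "a" <= first <= "z" else "names_a"

def insert(name: str, value: int) -> None:
    conn.execute(f"INSERT INTO {table_for(name)} VALUES (?, ?)", (name, value))

insert("Alice", 1)
insert("Bob", 2)
```

Queries then only touch the table(s) a given search key can live in, so each index covers a fraction of the four billion rows.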

AJ
+3  A: 

Sure, it's possible. Whether or not it's usable will depend on how you use it and how much hardware/memory you have. With a table that large, it would probably also make sense to use partitioning, if that fits the kind of data you are storing.

ETA:

Based on the fact that you only have two columns, with one of them indexed, I'm going to take a wild guess that this is some kind of key-value store. If that is the case, you might want to look into a specialized key-value database as well.
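To illustrate the access pattern a dedicated key-value store gives you, here's a sketch using Python's stdlib `dbm` module as a stand-in for a store like Redis or BerkeleyDB (the key names are made up for illustration):

```python
import dbm
import os
import tempfile

# A key-value store exposes just get/put by key -- no SQL layer,
# no secondary index to maintain. dbm stands in for a real KV store.
path = os.path.join(tempfile.mkdtemp(), "kv")

with dbm.open(path, "c") as db:
    db[b"user:42"] = b"jaime"   # put
    value = db[b"user:42"]      # get: a single keyed lookup
```

If all your queries are lookups by the indexed column, this model can be much cheaper than a full RDBMS at billions of rows.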

Eric Petroelje
Good point about a key-value database. If this is what you need, take a look at Redis: http://code.google.com/p/redis/
Jeff
+1 for mentioning key-value database
BlueRaja - Danny Pflughoeft
A: 

I created a mysql database with one table that contained well over 2 million rows (imported U.S. census county line data for overlay on a Google map). Another table had slightly under 1 million rows (USGS Tiger location data). This was about 5 years ago.

I didn't really have an issue (once I remembered to create indexes! :) )

Jason
He has 4 **billion** rows of data. That is three orders of magnitude more than what you had, so unfortunately I don't think your answer is very relevant.
kigurai
+1  A: 

It may be possible; MySQL has several table storage engines with differing capabilities. I think the MyISAM storage engine, for instance, has a theoretical data size limit of 256 TB, but it's further constrained by the maximum size of a file on your operating system. I doubt it would be usable, and I'm almost certain it wouldn't be optimal.

I would definitely look at partitioning this data across multiple tables (probably even multiple DBs on multiple machines) in a way that makes sense for your keys, then federating any search results/totals/etc. you need. Amongst other things, this lets each partition be searched in parallel (in the multiple-servers approach).
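A sketch of that federated search idea, with in-memory SQLite databases standing in for multiple MySQL servers (the partition contents and query are made up for illustration):

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Each partition is searched in parallel; the per-partition results
# are then merged into one federated result set.
def make_partition(rows):
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE t (k TEXT, v INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    return conn

partitions = [
    make_partition([("apple", 1), ("avocado", 2)]),
    make_partition([("banana", 3), ("blueberry", 4)]),
]

def search(conn, pattern):
    return conn.execute("SELECT k, v FROM t WHERE k LIKE ?", (pattern,)).fetchall()

with ThreadPoolExecutor() as pool:
    merged = [row
              for rows in pool.map(lambda c: search(c, "%b%"), partitions)
              for row in rows]
```

With real servers the fan-out happens over the network, but the structure is the same: scatter the query, gather and merge the results.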

I'd also look for a solution that's already done the heavy lifting of partitioning and federating queries. I wonder if Google's AppEngine data store (BigTable) or Amazon SimpleDB would be useful. They'd both limit what you could do with the data (they are not RDBMSs), but then the sheer size is going to do that anyway.

T.J. Crowder
"...but it's further constrained by the maximum size of a file on your operating system" -- doesn't mysql have a mode where it can use a raw, unpartitioned disk for storing data? That might avoid filesystem file-size limitations.
Here's a link; doesn't mention avoiding filesize limitations. http://dev.mysql.com/doc/refman/5.0/en/innodb-raw-devices.html
@unknown: And that's very cool. As I said, MySQL *"...has several table storage engines with differing capabilities...the MyISAM engine, for instance...[is] further constrained..."* (see the link embedded in the sentence). Very cool that InnoDB (a different engine) can use raw devices.
T.J. Crowder
A: 

Four billion rows is actually not that big; it is well within what any major database engine can handle today. Even partitioning could be overkill. It should simply work.

Your performance will depend on your hardware, though.

sibidiba