views: 642
answers: 7

We insert 20-25 million rows daily, which are removed at midnight to make room for the next day's data. Can MySQL handle 25 million indexed rows? What would be another good solution?

A: 

In my experience, MySQL tends not to scale well at all. If you must have a free solution for this, I would highly recommend PostgreSQL.

Also (this may or may not be an issue for you), keep in mind that if you're dealing with that much data, the maximum size of a MySQL database is 4 terabytes, if I remember correctly.

I don't think there is a practical limit on the maximum number of rows in MySQL, so if you MUST use MySQL, I think it would work for what you want to do, but personally I wouldn't recommend it for a production system.

Alex Beardsley
Back your statement up with metrics. Simply saying "x does not scale well" is usually code for "I do not understand how to use x properly".
Rex M
As much as I detest MySQL and love PostgreSQL, 25 million rows is not going to be a problem for it.
kquinn
@Rex M: the operative phrase was "in my experience", as in, anecdotal.
Alex Beardsley
+6  A: 

You give very little information on the context, but sometimes not using a database at all, and instead using a binary or plain-text file, is just fine and can (depending on your requirements) be much more efficient and maintainable. For example, if it's sensor data, storing it in a binary file with each record at a known offset could be a good solution. Your saying that the data would be deleted every 24h seems to indicate that you might not need some of the properties of a relational database, such as ACID, replication, integrated backup and so on, so perhaps a flat-file approach is just fine?
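As a rough sketch of the fixed-offset idea (the record layout and field names below are invented for illustration, not taken from any particular format):

    import struct

    # Invented layout: uint32 sensor id, int64 unix timestamp, double reading.
    RECORD = struct.Struct("<Iqd")  # 20 bytes per record, little-endian, no padding

    def append_record(f, sensor_id, timestamp, reading):
        f.write(RECORD.pack(sensor_id, timestamp, reading))

    def read_record(f, index):
        # Every record sits at a known offset, so a lookup is one seek plus one read.
        f.seek(index * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))

    with open("sensors.dat", "wb") as f:
        append_record(f, 7, 1245024000, 21.5)

    with open("sensors.dat", "rb") as f:
        print(read_record(f, 0))  # (7, 1245024000, 21.5)

Truncating the file at midnight would also replace a 25-million-row DELETE with a single filesystem operation.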

Ben Schwehn
A: 

As a general solution I'd recommend PostgreSQL too, but depending on your specific needs, other solutions might be better or faster. For example, if you do not need to query your data while it is being written, TokyoCabinet (the table-based API, TDB) might be faster and more lightweight/robust.

mjy
A: 

I haven't looked into them in MySQL, but this sounds like a perfect application for table partitions.

Matthew Watson
+5  A: 

Our MySQL database has over 300 million indexed rows, and the only problems we ever experience are complex joins running a little slowly; most of those can be optimized, though.

Handling the rows was no problem; the key to our performance was good indexes.

Considering you are dropping the information at midnight, I would also look at MySQL partitioning, which would let you drop that day's part of the table while the next day's inserts continue if need be.
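A minimal sketch of that pattern, assuming MySQL 5.1+ (when partitioning arrived) and the MySQLdb driver; the table, column, and partition names are invented:

    import MySQLdb

    conn = MySQLdb.connect(host="localhost", user="app", passwd="secret", db="metrics")
    cur = conn.cursor()

    # One RANGE partition per day. Note that the primary key must include the
    # partitioning column, and recorded_at is indexed for lookups.
    cur.execute("""
        CREATE TABLE daily_rows (
            id          BIGINT NOT NULL AUTO_INCREMENT,
            recorded_at DATETIME NOT NULL,
            payload     VARCHAR(255),
            PRIMARY KEY (id, recorded_at),
            KEY idx_recorded_at (recorded_at)
        )
        PARTITION BY RANGE (TO_DAYS(recorded_at)) (
            PARTITION p20090614 VALUES LESS THAN (TO_DAYS('2009-06-15')),
            PARTITION p20090615 VALUES LESS THAN (TO_DAYS('2009-06-16'))
        )
    """)

    # At midnight: add tomorrow's partition, then drop yesterday's. DROP PARTITION
    # discards the old rows almost instantly (no row-by-row DELETE), while inserts
    # into the current day's partition continue undisturbed.
    cur.execute("ALTER TABLE daily_rows ADD PARTITION "
                "(PARTITION p20090616 VALUES LESS THAN (TO_DAYS('2009-06-17')))")
    cur.execute("ALTER TABLE daily_rows DROP PARTITION p20090614")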

Jarod Elliott
That's just 15 days' worth of data for the questioner.
duffymo
@duffymo Yes, but it's getting removed each night at midnight, so it's not going to build up over time.
Jarod Elliott
+3  A: 

The issue is not the number of rows itself; it's what you do with the database. Are you doing only inserts during the day, followed by some batch report? Or are you doing thousands of queries per second against the data? Inserts, updates, deletes? If you slam enough load at any database platform, you can max it out with a single table and a single row (taking it to the extreme).

I used MySQL 4.1 with MyISAM (hardly the most modern of anything) on a site with a 40M-row user table. Queries ran in under 5 ms, I think, and we were rendering pages in less than 200 ms. However, we had lots and lots of caching set up, so the number of queries wasn't too high, and we were doing simple statements like:

    SELECT * FROM USER WHERE USER_NAME = 'SMITH'
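A sketch of that kind of cache-aside lookup, assuming a DB-API cursor (the function and table names here are illustrative): check an in-memory map first and only hit MySQL on a miss.

    _cache = {}

    def get_user(cur, user_name):
        # Serve repeated lookups from memory; only a cache miss touches MySQL,
        # which keeps the query rate on the database low.
        if user_name not in _cache:
            cur.execute("SELECT * FROM USER WHERE USER_NAME = %s", (user_name,))
            _cache[user_name] = cur.fetchone()
        return _cache[user_name]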

Can you comment more on your use case?

ShabbyDoo
+1  A: 

If you are using Windows, you could do worse than SQL Server 2008 Express, which should easily handle that load, depending on how many indexes you create. As long as you keep the total database size under the Express edition's 4 GB limit, it shouldn't be a problem.

WOPR