tags:
views: 633
answers: 4

Between MySQL and PostgreSQL, which is better suited for very large amounts of data, for example, millions of records? I think I should use PostgreSQL. Any suggestions, guys?

+3  A: 

I've used both in similar situations, and sheer size of the DB doesn't seem to affect their scaling in substantially different ways. PostgreSQL is much more complete and solid, and will much better support complex queries and their optimization, while MySQL may shine in terms of retrieval speed for extremely simple queries; but these aspects are independent of the sheer size issue.

Alex Martelli
+1  A: 

Well, it ultimately depends on what you are most comfortable with. According to MySQL, there is no imposed theoretical limit on the size of the database; it depends on the capability of the hardware supporting it. For the number of rows, using InnoDB, the theoretical limit is 256 terabytes. The reason I keep saying "theoretical" is that there is very little chance you could actually index 256 terabytes of data, so that figure is their approximation of where the limit might be. If you hit that maximum, you have bigger problems. The largest current users of MySQL in production that I can think of are YouTube and Facebook, and they appear to be faring well.

But once again, as I stated above, it is whatever you are most comfortable with.

+1  A: 

I think it depends a lot on what you mean by "better". You should probably identify your needs before choosing one or the other.

Faster? More reliable? Allows replication? Can do more complex queries? Is your application amenable to "sharding", in which case you probably want a database that can be clustered and administered easily? Or do you need everything in one massive set of linked tables, in which case you probably want good support for many cores and large amounts of memory? Do you have a complex authentication setup, or is it a simple one-user web application? Is the bulk of the data in binary objects, or is it simple numbers and strings? How will you do your backups?

MySQL and PostgreSQL both seem to be very capable databases, and both have been used successfully at large scale, so I'd suggest you need to identify the specific needs of your application first.

My inclination would be towards PostgreSQL, but that's mainly because I had a few disasters with MySQL losing data a few years ago and I haven't come to trust it again. PostgreSQL has been very nice in terms of being able to make backups easily.

Colin Coghill
Sorry, I didn't tell you about the needs. Of course it must be reliable and fast. The purpose is a reporting system, but it works with millions of records.
Interesting story about MySQL. Can you expand on your disaster stories?
User1
It was a while ago (MySQL 4, maybe?) and we had a UPS fail on us, crashing the database server. Our PostgreSQL database on the same machine was absolutely fine, but the MySQL one was unrecoverable. (Yes, we had backups, but we still lost about 23 hours of data.)
Colin Coghill
+1  A: 

Postgres has a richer set of abilities and a better optimizer; its ability to do hash joins often makes it much faster than MySQL for joins. MySQL is rumored to be faster for simple table scans. The storage engine you use underneath matters a lot, as well.
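
For instance, to see whether Postgres will actually use a hash join for a given query, you can ask the planner with EXPLAIN. A minimal sketch in Python with psycopg2; the connection string and the orders/customers tables here are made up for illustration:

    import psycopg2  # PostgreSQL driver; the DSN below is a placeholder

    conn = psycopg2.connect("dbname=reports user=report")
    cur = conn.cursor()

    # Ask the planner how it would execute a join between two large tables.
    cur.execute("""
        EXPLAIN
        SELECT o.id, c.name
        FROM orders o
        JOIN customers c ON c.id = o.customer_id
    """)
    for (line,) in cur.fetchall():
        print(line)  # look for a "Hash Join" node in the plan output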

At some point, scaling becomes a choice between two options: scale up by buying bigger hardware, or scale out by adding machines, which you can shard the data across, use as slave replicas, or run in a master-master setup (both Postgres and MySQL have solutions of varying quality for these sorts of things).
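
To make the sharding option concrete, here is a rough sketch (in Python, with invented shard hostnames) of the usual approach: hash a stable key, such as a user id, and use it to decide which database a row lives on.

    # Hypothetical shard map; in practice this would live in configuration.
    SHARDS = [
        "db-shard-0.internal",
        "db-shard-1.internal",
        "db-shard-2.internal",
    ]

    def shard_for(user_id):
        # Route all rows for a given user to the same physical database.
        return SHARDS[user_id % len(SHARDS)]

    # Every query for user 42 goes to the same shard:
    print(shard_for(42))

Note that simple modulo routing makes adding a shard painful (most keys move to a new home); consistent hashing is the usual fix, but the routing idea is the same.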

A few million rows of table data fit in a standard server's memory these days; if that's all you are doing, you don't need to worry about this stuff -- just optimize whatever database you are most comfortable with, to ensure the proper indexes are created, everything is cached (and something like memcached is used where appropriate), and so on.
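
"Everything is cached" here usually means a cache-aside pattern in front of the database. A rough sketch with the python-memcached client; the key scheme and fetch_from_db are placeholders for your own code:

    import memcache  # python-memcached client

    mc = memcache.Client(["127.0.0.1:11211"])

    def get_report_row(row_id, fetch_from_db):
        # Cache-aside: try memcached first, fall back to the database.
        key = "report:%d" % row_id
        row = mc.get(key)
        if row is None:
            row = fetch_from_db(row_id)   # your real DB query goes here
            mc.set(key, row, time=300)    # keep it cached for 5 minutes
        return row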

People mention that Facebook uses MySQL; that's kind of true. "Kind of" because what they are actually doing is running hundreds (thousands now?) of MySQL databases, each responsible for its own little cross-section of the data. If you think you can load Facebook into a single MySQL (or Postgres, or Oracle) instance... well, they'd probably love to hear from you ;-).

Once you get into terabyte land, things get difficult. There are specialized solutions like Vertica, Greenplum, and Aster Data, and there are the various "NoSQL" datastores like Cassandra, Voldemort, and HBase. But I doubt you need to go to such an extreme. Just buy a bit more RAM.

SquareCog