views:

99

answers:

4

For the same data set, with mostly text data, how do the data (table + index) size of Postgresql compared to that of MySQL?

  • Postgresql uses MVCC, that would suggest its data size would be bigger

  • In this presentation, the largest blog site in Japan talked about their migration from Postgresql to MySQL. One of their reasons for moving away from Postgresql was that data size in Postgresql was too large (p. 41): Migrating from PostgreSQL to MySQL at Cocolog, Japan's Largest Blog Community

  • Postgresql has data compression, so that should make the data size smaller. But MySQL Plugin also has compression.

Does anyone have any actual experience about how the data sizes of Postgresql & MySQL compare to each other?

+2  A: 

Both have their storage requirements in their respective documentation:

MySQL: http://dev.mysql.com/doc/refman/5.1/en/storage-requirements.html
Postgres: http://www.postgresql.org/docs/current/interactive/datatype.html

A quick comparison of the two don't show any flagrant "zomg PostGres requires 2 megabytes to store a bit field" type differences. I suppose Postgres could have higher metadata overhead than MySQL, or has to extend its data files in larger chunks, but I can't find anything obvious that Postgres "wastes" space for which migrating to MySQL is the cure.

Marc B
+3  A: 
  • MySQL uses MVCC as well, just check innoDB. But, in PostgreSQL you can change the FILLFACTOR to make space for future updates. With this, you can create a database that has space for current data but also for some future updates and deletes. When autovacuum and HOT do their things right, the size of your database can be stable.
  • The blog is about old versions, a lot of things have changed and PostgreSQL does a much better job in compression as it did in the old days.
  • Compression depends on the datatype, configuration and speed as well. You have to test to see how it's working for you situation.

I did a couple of conversions from MySQL to PostgreSQL and in all these cases, PostgreSQL was about 10% smaller (MySQL 5.0 => PostgreSQL 8.3 and 8.4). This 10% was used to change the fillfactor on the most updated tables, these were set to a fillfactor 60 to 70. Speed was much better (no more problems with over 20 concurrent users) and data size was stable as well, no MVCC going out of control or vacuum to far behind.

MySQL and PostgreSQL are two different beasts, PostgreSQL is all about reliability where MySQL is populair.

Frank Heikens
I read that InnoDB wasn't truly MVCC. Not entirely sure what that really means. Maybe someone could clarify here?
Continuation
Continuation
MySQL versions 5.4 and/or 5.5 are still in development and not ready for production. The only GA-version MySQL has at this moment, is version 5.1. PostgreSQL versions 8.3 and 8.4 are stable for a long time and (in my situation) outperform MySQL. But again, reliability is much more important for me.
Frank Heikens
One annoying InnoDB feature in MySQL is that by default all tables and indexes are stored in one file, ibdata1. Claimed space is never returned and the file cannot be shrunk in not other way than to dump data, delete file and then restore the dump. So the actual size of the file does not necessarily reflect the size of database. There are workarounds this problem.
John P
+1  A: 

About indexes,

MySQL stores the data whithin the index which makes them huge. Postgres don't. This means that the storage size of a b-tree index in Postgres doesn't depend on the number of column it spans or which data type the column has.

Postgres also supports partial indexes (e.g. WHERE status=0) which is a very powerful feature to prevent building indexes over millions of rows when only a few hundred is needed.

Since you're going to put a lot of data in Postgres you will probably find it practical to be able to create indexes whitout locking the table.

Sent from my iPhone. Sorry for bad spelling and lack of references

Martin Torhage
+1  A: 

I'd like to add that for large columns stores, postgresql also takes advantage of compressing them using a "fairly simple and very fast member of the LZ family of compression techniques"

To read more about this, check out http://www.postgresql.org/docs/9.0/static/storage-toast.html

It's rather low-level and probably not necessary to know, but since you're using a blog, you may benefit from it.

mikelikespie