views:

5156

answers:

20

We are currently working with JavaEE and MySQL 5 in our company, but we have some queries, especially delete queries that take > 10 min to execute. We consider to switch to PostgreSQL.

What are the advantages of PostgreSQL over MySQL if there are any? Do you have experiences with both DBs and may give me a roundup whether this is a good idea or does this fully depend on the needs of our server?

Thank you for your opinion!

Greetz, GHad

+21  A: 

Well it don't really answer your question, but if you have delete queries that take longer than 10 minutes to execute, its probably not MySQL that is your problem but the query design or database layout. You will probably get more ROI in that space that taking the switch to another database platform.

Jesper Blad Jensen aka. Deldy
Well said. I too recommend a) hiring database people, b) a workshop for the development team or c) a couple books on how to design databases and how to profile queries.
Till
could be locking in MySQL
MkV
A: 

What brought you to the conclusion that PostGre will be better?

Ian P
Ian: The Socratic approach doesn't work terribly well on stackoverflow, just so that you know.
Arafangion
+4  A: 

I would agree with the others. A delete that takes 10 minutes probably won't speed up significantly with another database vendor. You might get it down to 5 minutes, which is still excruciatingly slow. I would say to look at the query plan to see what's actually taking so long with this delete. MS SQL Server has the query analyzer for this. I'm sure MySQL has similar tools.

Kibbee
+3  A: 

I vote for Postgres. More mature.

Flinkman
+4  A: 

The traditional wisdom is that you should use Postgres if you have a complicated schema that needs lots of joins or if you have heavy writing. MySQL (traditionally) has been for databases that rarely have writes.

I say "traditionally" because MySQL is catching up with Postgres in features and Postgres is catching up with MySQL in pure read speed. Still, if you have complicated joins, Postgres should be better.

Neall
A: 

We picked MySQL over both Postgres and SQL Server simply because there seemed to be a lot more community around it - if we ran into trouble or needed advice, the feeling we got was that help would be a quick Google search away, and that's worked out pretty well so far.

With that in mind, maybe a better question to ask is "Why is my delete taking so long?" - as mentioned above, you'll get better ROI from investigating the core problem than from doing a full replacement. If nothing else, you'll be able to understand what's causing the delay and then you can see if Postgres handles that situation better.

jasondoucette
picking a technology over another based on the size of the community sounds childish to me.
Andrei Rinea
not really; a) it makes it less likely that you'll run into bugs (someone else will have found them first), b) it means you can be more confident that it has a future; c) you'll find it easier to get help online.
Mark Baker
The problem with the 'large community' is that it creates a lot of bad advice along with good advice because everyone who has managed to install MySQL is automatically an expert. As for help, better documentation from the developers (which postgresql has) is better than advice from random user.
MkV
Mark Baker forgot the point d): a project with a big community is much like to exist in the next 5 years.. a small project can die suddenly. james2vegas got a good point too: much 'poor' aids, or few but good? In this case, i dont think that the differences between the mysql or pgsql community play a big role anyway.
DaNieL
The PostgreSQL community is hardly "small". When's the last time you visited a .org or .info website? OK, you just used PostgreSQL, because Afilias uses it to serve .org and .info DNS. You can get an answer within seconds on the #postgresql IRC channel, or minutes on the mailing list. So, in this case, "more people use MySQL" is like saying "I won't use Linux, because more people use Windows". It's a silly argument.
Randal Schwartz
+13  A: 

In general, PostgreSQL is a slightly better DB than MySQL. There's a few of reasons why this is so - however I can tell you right off the bat that switching DBs isn't going to fix your problem. PostgreSQL is a nice DB but it's not magically going to make a ten minute query execute instantly.

What you need to do is figure out why you're getting horrific performance. Is MySQL configured right? Running on decently specced hardware? Is your table design sane? How about your queries? Are you sure they're sane? Because if I had to guess...

Cody Hatch
In some cases switching DBs can make a ten minute query execute instantly, a better optimiser can easily do that. However, it's unlikely that a delete query would be affected unless you're doing some complicated joins in it.
Mark Baker
+1, nothing is automagically with db design!
DaNieL
+70  A: 

I prefer Postgres. This is mostly because Postgres is a little better with joins and subqueries, has a great explain analyze, has fewer quirks, provides nicer error messages and has a better command line than MySQL.

MySQL

The database is very common, especially amongst web developers.

A subquery with a depth of 3 levels cannot be optimized by MySQL anymore and is executed on every row.

There are a lot of little quirks that are annoying, for instance that time fields do not record milliseconds, subqueries with joins may return no data, unexplainable crashes may happen etc.

It is nice that you can exchange database engines underneath, which might be very handy. On the other hand almost all engines except InnoDB are not very good because they don't support Transactions and Replication. In particular MyISAM does not enforce constraints at all.

Error messages on MySQL are often just a number, this is quite like Oracle does things and it stinks.

I found that MySQL seems to be a bit faster for single table queries on MyISAM. That might be because of better caching.

The command line for is easy to use, but MySQL extended SQL in order to provide functionality other command line clients offer with \shortcuts; I find this improper.

The EXPLAIN/DESCRIBE command for analysis although this command is not as useful as the one in postgres.

Postgres

Postgres does its best to optimize what you throw at it. The most common reason for badly performing Postgres queries is missing or wrong indexes.

The "explain analyze ..." command that is really helpful at analyzing query performance.

You cannot switch engines, but transactions are support by default and as of recently, it can also do replication out of the box.

Error messages in Postgres are often descriptive and sometimes even explain to you what you have to do quite accurately.

The command line is easy to use and whenever you wonder what to do with it you can type \help

Your specific problem

Probably if a delete takes 10 minutes, your condition for deletion is very time consuming or you do a cascade delete that necessarily needs to join a lot.

The Blob Data type in MySQL is probably more complex to store, iterate and retrieve than fixed length text fields. Blobs are also size limited, and if a file should exceed that limit the trailing bytes get cut off. You might try to store the files as files on the file system and create a fixed length text field in your table that stores the filename. If you are serving that data with a web server, that would also make it faster since it is way faster to let Apache serve a file than fetching it from a database and streaming it through your code.

Florian Bösch
MySQL has an equivalent command to EXPLAIN ANALYZE. You can use EXPLAIN or DESCRIBE (they are the same) to find out what's going on. It's not nearly as nice as the PostgreSQL version, and is kind of hard to read (there are guides online) but it will give you a place to start.
MBCook
A: 

Thank you for all the response.

About the queries: We do nothing special here, it's simply a NamedQuery with a simple select on one table. But multiple columns of the table entries have blob data (PDFs), so we certainly must look on other reasons like defragmentation or may we configured mysql to make full logs so MySQL would log very much. I'll take a look on that.

I will also look at the design of our columns/entity beans if I have the time. Maybe you're right and we are able to speed up the queries with another design. I need a good book on this.

I'm also sorry that a cannot give you more details. A co-worker of mine has complained several times last week so I thought to ask this question because our boss is convinced that PostGreSQL is far better although he has no experience. I just asked because I wanted to know if a switch makes sense, although I'm not directly involved with this part of our server. I simply hate to follow my boss advice blindly without any after looking especially if he just read about something without trying...

Anyway, thank you for your hints. That helped me a lot.

Greetz GHad

GHad
Can you post the output ofshow innodb status\GThere might be some quick wins with tweaking your innodb buffer or log settings, but it sounds like you need to revisit your schema.
Dave Cheney
+2  A: 

I agree with the comments that switching to Postgres is not the solution. Since you mention that "multiple columns of the table entries have blob data (PDFs)", I would guess that that's your problem.

It's a bit of a religious war whether to store BLOBs in your tables. Sadly often more heat is shed than light. Consider revisiting your decision to store BLOBs in your tables. I would side with folks who think you should store file references to your BLOB data from your tables, because it makes the queries faster. Your RDBMS will have less bytes to traverse over when it does a projection. Admittedly this means that your system of record for your BLOBs is in two places (the file system and the database) and you have to keep the two in sync. In other words, deleting the reference from the file reference in your file table should be in the same transaction as deleting the file from the file system.

Alan
A: 

I'm totally for storing BLOB's in the DB with the correct schema. If not already, consider using a meta table with fixed length columns for faster seeking/delete/update. Store the BLOBS/files in a 2nd related table and chunk them into 64k blocks. This makes a huge difference. You can also consider flagging the BLOB's deleted when you delete the parents, and then have a late night scheduled job to actually delete them and re-optimize the tables.

Reference to a BLOB 64k implementation. http://www.dreamwerx.net/phpforum/?id=1

Good luck.

DreamWerx
PostgreSQL stores large BLOBs in a separate TOAST table
MkV
It sounds very anti-Occam to me.
Lluis Martinez
+64  A: 

We use MySQL where I work, and we've looked at PostgreSQL (it's running for one small system, as a test). They both have their pluses and minuses. Note that for the purposes of this discussion I refer to MySQL 5.0.x with InnoDB. While 5.1 may fix some of these things, it's not stable yet.

MySQL - Good Points

Probably MySQL's best point is that it's so common. Since it's part of LAMP, everyone and their brother provides it as an option. It runs on Windows, Linux, and anything else. You can't swing a stick at a web host without having MySQL available. There are TONS of people out there that can give you advice on it.

I like the tools (really, the command line client) better. It feels quite friendly to me. Want to see the tables in a database? SHOW TABLES. Want to see the databases? SHOW DATABASES. Want to see the status of your replication against the master? SHOW SLAVE STATUS. PostgreSQL feels quite a bit like Oracle to me. There is no SHOW TABLES, it's \dt (IIRC). To quit it's not QUIT or EXIT, it's \q.

Replication in MySQL is pretty nice. It's built in (PostgreSQL didn't have built in replication until recently). The last time we spent much time looking (last year), PostgreSQL replication was based on triggers, which we found a little iffy (in theory). MySQL replication has some limitations (basically must use InnoDB, it's statement based not data based (changes in 5.1, I think)), but it's been working great for us. It can do multiple masters, multiple slaves, chains. Again, this is a known quantity, since MySQL is so common.

MySQL - Bad Points

MySQL has some very serious limitations. It is very important that you understand what you're getting into.

The biggest one, the one that's a killer for us, is the inability to add or remove columns or indexes without locking the table. We have tables with tens of millions of records. We can't edit them. To add a new column or index locks the table from reads and writes, which is killer for us. We don't know how long it would take, but it would be hours, minimum. When we HAD to add a new column last year, we did it by making a new table and always doing a join. It was the only way to do things without taking the server out of production (or going through a large mess since we use replication).

MySQL can be dead stupid at times with indexes. It's important to run a DESCRIBE or EXPLAIN on things to see if it's doing something sane. Sometimes we've had to use FORCE INDEX to get good performance. This is made worse by the fact that it can't use multiple indexes (for the most part). If you want it to use your index on the date column and your index on the email column, you need to create an index that is on both columns. This is supposed to be fixed/improved in 5.1 (I think, haven't tested 5.1).

Subqueries can be a big problem too. A query can run fine. A subquery can run fine. But when you get to three levels of queries (or more) MySQL can (and usually does) just give up. So instead of doing things a smart way (even if your subquery is as easy as a constant "SELECT id FROM table WHERE id = '5'), MySQL will just start running that query for every row, killing performance. Again, you have to use EXPLAIN / DESCRIBE.

The error messages in MySQL are worse, which is really a nitpick. If you try to create a table with a foreign key and you get something wrong, it will just throw an error 150. It doesn't tell you what the problem is (usually mismatched column definitions), just error 150. You have to go look the error up just to see that 150 is a foreign key problem. Other error messages are much more helpful.

Then there is the, well, we'll call it buggyness and weirdness. MySQL performance, at least on Windows and Linux would bomb with more than 4 (possibly 8) CPUs. I hear that it's not a problem on Solaris (thanks to Sun's work and the Niagara processors). This is something you must be aware of. I believe indexes must be held in memory for InnoDB. If you want a giant index that you will only use once a week, too bad. This may have changed, or it may be fixed in 5.1, I don't know.

You also run into fun little things. There is a condition we've run into involving subqueries and joins in 5.0 that causes MySQL to return no data. Despite what explain says, what the partial query parts show, etc, you just get zero rows back. You change your code just a little and the bug isn't hit and you get your data. TIMESTAMPs don't record milliseconds, you have to do that by hand with another column. We also once ran into a situation where the date format command could crash MySQL (this was in the 4.0 or 4.1 days). You just have to be aware that weird little things like these crop up. Did I mention that I've heard (from someone I really trust who has had experience with this) that stored procedures / triggers could crash the MySQL server in 5.0? This was earlier in the 5.0 branch and is probably fixed by now... but you need to watch MySQL closely.

This should go without saying, but if you go with MySQL use InnoDB. Depending on your setup that may not be the default engine. Change that. Everything good on MySQL is InnoDB. Replication and transactions both need InnoDB, MyISAM doesn't have them. There are other storage engines, but they are much more specialized.

PostgreSQL - Good and Bad

Now I don't have a ton of experience with PostgreSQL. Like I said we've started to experiment with it. The fact that it doesn't have many of MySQL's limitations is a big plus. Just he ability to add a column on a large table without locking the thing for a huge amount of time would be great for us. PostgreSQL can also use multiple indexes which, again, is a serious plus. The error messages that PostgreSQL returns can be much more informative than MySQL. At times when you screw up a query instead of "this is impossible" or "you can't do that", you get something more akin to "this is impossible because of X". Again, this is just my impression. Basically, PostgreSQL feels more like an open source Oracle (a grown up database) than MySQL (which doesn't feel very Oracle-y).

Now the tools are MUCH less friendly, but if you come from an Oracle (or probably DB2) world you'll be used to it. While they require more cryptic commands (see MySQL tools, above), they do work very well.

There is the replication thing. When we picked PostgreSQL back up this year to play with for a new little system, we discovered that they took one of the formerly external replication systems and it has been put into the tree. This is a very good step, as the replication situation was a big problem for us. Before there were little third party things (based on triggers) that you could take your chances with, and there were paid replication solutions. Having one built in is quite nice.

Summary

My advice? While I have less experience with it if I was starting with a new system I think I'd go with PostgreSQL. I've seen enough weirdness in MySQL that I'd be willing to try it. We haven't had any problems with it so far, and I know many people use it. MySQL is improving fast. It's gained replication for a disk based storage engine, stored procedures, better index choosing, and it's become much more strict about not allowing obviously bad data (like the date '0000-00-00') or data that violates a key.

Switching is not a decision to be taken lightly. Just moving the data across will be a big problem if you haven't been very strict about validation in the past.

The most important thing to do though is to just try it. Set-up a test server and a quick codebase hack to get your code running. Put some data in and benchmark it. Maybe it won't be substantially faster than your current setup and it won't be worth looking at.

To Your Update

In regards to the small update you posted, I have two comments. For the BLOBs, are they being stored external to the table or inline? I know Oracle can do that, and I'm guessing PostgreSQL can, but I don't remember about MySQL. That may be a big boon.

As for storing the PDFs in BLOBs in the columns, from what I've been taught that's a little weird and could cause your performance problem. I haven't had a ton of experience in this area.

Have you tried storing the PDFs in one or more other tables and just storing the IDs? That is, normalizing them out? That may help your performance problems.

Those are just stabs in the dark though.

MBCook
Great answer, very detailed.
harto
Regarding bugs, I worked on a small project, setting up Wordpress blog for a friend, using MySQL 4.1. Routinely, when the site experienced heavy traffic, the site would just fold under. According to the MySQL database logs, the primary keys on some of the tables became corrupted. Let me repeat that: the InnoDB engine screwed up the most basic, fundamental task that no database should ever fail to support --- it corrupted the primary keys. Sure, it took about 30 seconds to fix using REPAIR, but the site failed so often that the site was more or less inoperable.
Juliet
@Juliet: 5 and up is pretty rock solid, unless you stray to some edgier features. For simple stuff like you're describing, we haven't had that kind of problem on our production servers with 5 or 5.1 even though they've run for 2+ years.The earlier versions though... there was some fun stuff there. Sometimes, using the wrong date format string when having it format a date for display could crash the MySQL server process.
MBCook
+2  A: 

See this page.

I also find that MySQL is terrible at optimizing queries, especially when you use subqueries. I found that when I rewrote an easy to understand subquery based select query as an impossible for humans to understand query that is based on clever joins, and group by clauses that are invalid in normal databases, it ran many times faster, because it used the index that it didn't use before. A clever optimizer would have used an identical query plan for both versions of the query.

MySQL seems to require locking of the whole table for simple operations, where as postgres never locks reads because of it's MGA architecture. You don't have to do anything special (like select a special kind of table) to have it work well.

It's possible that MySQL has got better, and with the latest versions things would be more equal, but I'm more than happy with my switch to Postgres.

rjmunro
+4  A: 

postgresql +1

but look at a mark for deletion column with a different offline application/server that issue the delete query.

l_39217_l
+2  A: 

For handling BLOB data, I would definitely recommend PostgreSQL. They have a specific API for dealing with large objects. Its fast and supports partial writes (which MySQL does not).

Michael Barker
+24  A: 

I have a pretty good experience with both MySQL and PostgreSQL. I found PostgreSQL to be more reliable, faster and better generally, but I wouldn't recommend it blindly. You have to know what you're looking for in a DB to be able to choose. I had experience in putting PostgreSQL to be a good replacement for SQL Server in environments where we migrated everything from SQL Server to Postgres, and did much better with the same hardware (switching from Windows to Linux as well).

PostgreSQL also gives you way more options to optimize your performance, and tune your database in general - as long as you know what you're doing, you can mess around as you wish in its configuration files. One thing that kills me in MySQL is the damn my.cnf file. Really, it kills me. It's a file that can be anywhere, and nowhere to be found. You can even have a my.cnf file in your home dir - and yes, depending on your configuration and how you invoke your server, it can wildly interfere with the works of the db.

I've had serious troubles with servers with randomly placed my.cnf's files, giving me big headaches to find where in the world was MySQL getting its configurations. It's much, much, much better to have just one or two .conf files, as PostgreSQL has.

I've also found PostgreSQL to be a decent replacement for Oracle in certain environments, where Oracle wasn't really necessary to begin with, but somehow somebody got convinced (a business person flooded with the nonsense sales arguments probably was the person who made the call) and bought it. It was totally unnecessary, and we could replace it without any major problems by PostgreSQL. It was a while ago, version 6.xxx of PostgreSQL, but yet it made a good job in replacing Oracle. It was a fairly big database (around 80GB), and PostgreSQL was up to the job. PostgreSQL supports some nice features natively, like stored procedures (that can be in Pl/PgSQL, C, Java, or whatever languages PostgreSQL supports now).

I've also have some experience with MySQL, and have worked with it since its 3.23 version - so for a while. MySQL is generally very good if you don't mind about the inner workings of the database, or if you need a competent database to do something cool and simple.

Well, it can be really fast, really good, and you can really get in trouble and get cornered in the future if you don't think about what you're doing. One thing that I hate the most is the ability to choose your storage engines. While it sounds great (like, in "hey, I can choose! I'm all grow-up!"), it is in reality not very nice. The main flaw with it is that, no matter which engine you choose, you'll always gain and lose stuff (as in functionality/capability) and gain stuff. For instance, if you want referential integrity (create foreign keys and stuff), you can't use MyISAM, and humble along with the poorer (please note: not necessarily bad, just worse than) performance of InnoDB. InnoDB also features lots of silly limitations: size and quantity of columns. You can't have tables with more than 1000 columns in InnoDB. So, it literally kills MySQL when we talk about data warehouses.

It gives me the feeling that MySQL is a sewed thing between many different other products (InnoDB used to be a company in the DB business). You just have to download the source code of both to see the difference: while the source code of PostgreSQL is smaller, more organized, MySQL's source code is bigger, it's source tree is not as organized as Postgres's tree. And oh boy, that makes a huge difference when you compile. Another fact that supports my argument about the source code is the build time: PostgreSQL build is much, much, much faster than MySQL's. Sometimes I think that rebuild my kernel is faster than rebuilding MySQL...

Another killer thing for MySQL, if you're in the business of selling software, as in a software house or software provider (SaS) or something like that, is their crappy licensing policy. Yes, I know, it's GNU/GPL but they have an enterprisey license mode as well, and if you want to package MySQL for distribution with your application, you just can't do it without paying them royalties (licenses fees and etc). PostgreSQL, on the other hand, is licensed under BSD, so you can do just whatever you want with it - even improve it and sell a Putyournamehere-PostgreSQL-with-killer-feature.

There's also another aspect that I'd like to observe: MySQL is actively maintained by MySQL AB, a privately held company. And PostgreSQL is totally supported by the opensource community, no strings attached to any corporations.

And the query planner. Oh boy, don't let me start on the query planner. MySQL query planner just sucks. Not for 'SELECT * FROM sometable'... but if you go deeper and starts to put queries with sub-queries and sub-sub-sub-queries, the planner just gives up on you and starts to read each row in the database, bringing your application down. It'll wake you up in the middle of the night.

So, getting directly to the point, MySQL is great if you want to: - Have a fast, kind of reliable db, to use in simple things. MySQL easy synchronization come in handy when we need to build a high availability system, and is also great for load balancing. - Use any kind of LAMP application. Wordpress, Joomla, PhpBB, you name it. It's just great, and works flawlessly. And is easy to find, at cheap hosting places. Company intranets, websites, etc.

PostgreSQL is better if you: - Need to have a more reliable solution to supply your business needs. Something that can measure-up to and replace SQL Server and Oracle. - Is fast, scalable, tunable, supports advanced database features, like stored procedures in many different languages, different transaction isolation levels. - Synchronization is not as trivial as in MySQL, but well, it just works great.

Some other resources that may be handy: http://en.wikipedia.org/wiki/InnoDB http://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems

Macalendas
+3  A: 

I have only minimal experience with mysql, but my experience mirrors many of the opinions above: postgres is way more polished and easy to use.

I have done some tech support with postgres that exposed some problems with it. We had fairly active databases with a lot of churn. That is, we'd write a lot and then have to clear it out after a few weeks or months to make way for more (databases probably only up to maybe 80G on the large end). What we found is that postgres' built in background vacuuming couldn't keep up with recovering all the dead space left by the deletes and the database would get fragmented and very slow. We (tech support) would frequently have to go in, shut everything down and dedicate a few hours to vacuuming and re-indexing. All automated except kicking it off of course, but it was a pain.

I have no idea how well other databases deal with this sort of scenario. A lot of pain could probably be avoided by having a good dba. Our installations were on locked down appliances on customer networks and didn't get much maintenance.

Jeremy Huiskamp
Your criticism is valid, but this area does get updated regularly. One of the things that keeps getting improved in later PostgreSQL versions is how aggressive the autovacuum daemon, the background process you allude to, is by default. It's pretty easy for it to fall behind in V8.1 when that feature was first introduced, it's much better in V8.2, and later versions V8.3 and V8.4 continue refinements here. And rather than being forced to do a manual vacuum as you suggest, sometimes it's possible to improve things by adjusting the autovacuum tuning parameters to make it run more often instead
Greg Smith
+7  A: 

My very simple response to the question is, I vote for Postgres here is why:

  1. Oracle can't purchase and control the future of Postgres (I know MySQL can most likely be forked and be independent of Oracle if need be). For me I wouldn't put my system on a database that is in limbo, which I think MySQL is to some degree with Oracle acquiring Sun.

  2. The online support in the Postgres forums is great.

  3. Postgres has all the features you will most likely need.

  4. Postgres has been proven on mission critical systems.

  5. Postgres tends to be very standards based. Standards based is not an after thought of the developers working on it.

StarShip3000
+5  A: 

Based on my own recent personal experience (July 2009), if your application is geospatial-related, then you should consider PostgreSQL over MySql.

MySql only supports, as of July 2009, Minimum bounding rectangle (MBR), while PostgreSQL supports full geospatial capability (PostGIS) that conforms to the OpenGIS standard.

This fact alone forced us to migrate from Mysql to Postgresql!

ShaChris23
Update: as of 8/13/2009, I just found the following page:http://forge.mysql.com/wiki/GIS_FunctionsGive it time a little bit, then MySql would finally provide full GIS!
ShaChris23
A: 

There are some references here that say MyISAM doesn't support replication. Can someone explain?

http://dev.mysql.com/doc/refman/4.1/en/innodb-and-mysql-replication.html says it will work for both InnoDB and MyISAM. In fact, I just tested it as well.

Thiru