As I've been looking into the differences between Postgres and MySQL, it has struck me that, if what I read is to be believed, MySQL should be (disclaimer: by reading the rest of this sentence, you agree to read the next paragraph as well) the laughingstock of the RDBMS world: it doesn't enforce ACID by default, the net is rife with stories of MySQL-related data loss, and by all accounts its query optimizer is a joke.

But none of this seems to matter. It's not hard to tell that MySQL has about a million times* as much hype as Postgres (it's LA**M**P, not LAPP), big installations of MySQL are not unheard of (LJ? Digg?), and I haven't noticed any drop in MySQL's popularity.

This makes me wonder: are these "problems" with MySQL really that bad?

So, if you have used MySQL for a reasonably large project**, what was your experience like? Did you use Postgres as well? How was it worse? How was it better?

*: [citation needed]
**: I'm well aware that, for "small things" (blogs, what have you), MySQL (along with practically every other RDB) is just fine.

+1  A: 

We are using MySQL in some applications - and it is doing a pretty good job. In the newer projects we are using the InnoDB engine - although it may be slower than the default MyISAM engine, it is working well.
Right now we are using an ORM mapper, so most of the complexity is hidden behind it (and it works nicely).

I think the infrastructure (tools and information) is one of MySQL's big pluses: we are using really nice tools like Toad for MySQL and MySQL Administrator.

Although I have to admit that I had a shocking experience last week when helping a friend with a SQL statement: a correlated subquery nearly stopped his MySQL server - but with the trick of enclosing it in another query, it worked really well.
This is nothing that REALLY shocks me - I've used other DB systems which cost big bucks (I'm looking at you, DB2) and they had other things to work around (maybe not as drastic, but you still had to optimize for them).
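
As a rough sketch of the kind of rewrite meant above (table and column names are invented for illustration): old MySQL versions would re-run a correlated subquery for every row, while the same logic enclosed in a derived table is materialized once and joined:

    -- Correlated form: the subquery runs once per row of o
    SELECT o.*
    FROM orders o
    WHERE o.total = (SELECT MAX(total)
                     FROM orders
                     WHERE customer_id = o.customer_id);

    -- Enclosed in another query: the derived table m is computed once
    SELECT o.*
    FROM orders o
    JOIN (SELECT customer_id, MAX(total) AS max_total
          FROM orders
          GROUP BY customer_id) m
      ON m.customer_id = o.customer_id
     AND o.total = m.max_total;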

bernhardrusch
+1 InnoDB is a must for real DB work with MySQL. Scrap "MyISAM" as soon as possible.
Joachim Sauer
InnoDB is more suitable for general purpose work and should be the default for new installations.
MarkR
A: 

No, the issues you mention are NOT a big deal. See Google and Facebook as two examples of companies that are using MySQL to accomplish Herculean tasks you'll only ever dream of encountering.

I use the following rules when running MySQL to prevent headaches down the line:

  1. Take daily, weekly, monthly snapshots of the database. More often than not the problems you'll run into have nothing to do with MySQL; instead it's a boneheaded developer running:

    DELETE FROM mytable; # Where is the WHERE?

  2. Use InnoDB by default (as sketched after this list); the only reason to use MyISAM is full-text search.

  3. Get your database schema under source control.
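
For item 2, a minimal sketch of making the engine choice explicit rather than trusting the server default (the table is hypothetical):

    -- Request the transactional InnoDB engine explicitly
    CREATE TABLE accounts (
        id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        owner VARCHAR(100) NOT NULL,
        balance DECIMAL(10,2) NOT NULL DEFAULT 0.00
    ) ENGINE=InnoDB;

Existing MyISAM tables can be converted with ALTER TABLE ... ENGINE=InnoDB, though that rewrites the whole table.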

jakemcgraw
What does "see google and facebook" mean? Just because 2 major players don't use it doesn't mean it's useless. Otherwise you could claim that all products out there are useless.
Joachim Sauer
Items 1. and 3. are DB-agnostic and apply equally to all databases, not just MySQL.
Joachim Sauer
Actually both Google and Facebook DO use it, that was my point; see: http://blog.facebook.com/blog.php?post=7899307130 and http://www.mysql.com/customers/view/?id=555 My company also uses it; we process ~8,000 transactions per minute.
jakemcgraw
+2  A: 

MySQL can be used for reasonably large applications, provided you really know what you're doing and don't trust the defaults.

MySQL's defaults are optimized to be easy to use, to get you started quickly, and to provide the best performance (usually). Other databases choose defaults that are at the very least ACID-compliant and scalable, i.e. defaults that are not necessarily the best/fastest for small data sets.

Another item is that MySQL only learned to be a "real database" relatively recently, while almost all competing products started life with full ACID in mind.

MySQL has had problems with almost all aspects of ACID at one time or another. Most of them are gone or can be configured away, but you will have to check each one (see the sketch below). The problem with atomicity troubles, for example, is that you will not notice them until you place your system under heavy load (which, unfortunately, often coincides with it being a production system).
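
A minimal sketch of such a check, assuming you can run queries on the server (the exact set of variables worth auditing depends on your version and engine):

    -- Is the default engine transactional (InnoDB) or not (MyISAM)?
    SHOW VARIABLES LIKE 'storage_engine';

    -- Is strict mode on, so invalid data is rejected instead of mangled?
    SELECT @@sql_mode;

    -- Enable strict checking server-wide if it is not
    SET GLOBAL sql_mode = 'STRICT_ALL_TABLES';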

So my summary would be: MySQL is capable of working in these environments, but it takes work. And the path it took to get to that point cost it quite a few points in the confidence area.

Joachim Sauer
"Provided you know what you're doing" – does that imply that, eg, Postgres could be used "without knowing what you're doing"? (given, of course, a reasonable amount of common sense)
David Wolever
@David: I've not used Postgres as much, so I can't comment on this one, but other DB systems generally have more conservative (read: "sane") defaults and make it harder to shoot yourself in the foot by configuring the DB to…
Joachim Sauer
+3  A: 

Since it's tagged [subjective], I'll be subjective. For me it's about the little things. PostgreSQL is more developer-friendly and, by default, makes it easy to do the right thing regarding data integrity.

  • If you give MySQL a value of the wrong type, it will implicitly convert it even when the conversion is incorrect. PostgreSQL will complain.
  • EXPLAIN in PostgreSQL is way more useful than in MySQL. It gives you the exact structured query plan: what kind of algorithm it will use, what cost each step has, etc. This means that if the query optimizer in MySQL doesn't do what you think it does, you will have a hard time debugging it.
  • If you have ever written anything more complex in the MySQL stored procedure language, you will know how painful it is. PL/pgSQL is actually a nice language, and you can use many other languages as well.
  • MySQL doesn't have sequences, so if you need them you have to roll your own. Most people will do it wrong and have race conditions in their code.
  • PostgreSQL exposes most of its internal lock types to the developer. If you need to lock your table in a special way, you can do that.
  • Everything is programmable in PostgreSQL. For example, if you need your own data type for some specific data, you can add it. You can add casts and operators for the data types. Probably not worth the effort for small projects, but it's better than storing things as strings.
  • PostgreSQL runs every action, including DDL changes, inside a transaction, unlike MySQL. If you have a conversion script that creates/drops tables, BEGIN/END won't help you in MySQL to keep it in a consistent state (see the sketch after this list).
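
To illustrate the last point, a hypothetical migration script (table names invented). In PostgreSQL a failure anywhere before COMMIT rolls back the CREATE and DROP along with the data changes; in MySQL each DDL statement commits implicitly, so a failure partway through leaves the schema half-migrated:

    BEGIN;
    CREATE TABLE new_settings (name TEXT PRIMARY KEY, value TEXT);
    INSERT INTO new_settings SELECT name, value FROM old_settings;
    DROP TABLE old_settings;
    COMMIT;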

That doesn't mean it's impossible to write good database applications with MySQL; it just requires more effort.

Lukáš Lalinský
+2  A: 

Provided you know what its capabilities are, it may fit your use case.

If used correctly, it is ACID-compliant. If used incorrectly, it is not. The trouble is that people seem to assume ACID compliance is automatically a good thing to have.

In reality, ACID is often the enemy of performance (particularly the D, durability). By relaxing durability very slightly, we can typically get a very large performance boost.
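
A minimal sketch of one such trade-off on InnoDB (a real server variable; whether the trade-off is acceptable is your call): flush the transaction log to the OS on every commit but fsync it only about once per second, so a crash can lose up to roughly a second of committed transactions in exchange for much higher write throughput:

    SET GLOBAL innodb_flush_log_at_trx_commit = 2;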

Likewise, even the MyISAM engine (which doesn't have much in the way of durability, or of the other ACID properties either) is still appropriate for some problem domains.

MarkR
The main problem with MyISAM is that it gets atomicity wrong, and breakage in this area can lead to very subtle bugs.
Joachim Sauer
+1  A: 

I haven't used both for a single large project, but having used both I have some idea of how they compare.

In general, almost all of MySQL's problems can be worked around with good discipline. The issue is more that the developer has to know all the gotchas and work around them. After working with PostgreSQL or Oracle, this feels a bit like death by a thousand papercuts: you get used to stuff just working.

This is a pretty significant issue in the types of stuff that I have worked on: complex schemas with complex queries and lots of data. Tight schedules leave little time for performance engineering, so getting consistently reasonable performance without having to manually optimize queries is important, and a good cost-based optimizer is almost a requirement. Combine that with quite a lot of outsourcing to development teams that don't have the experience to catch all the gotchas in time, and the little issues escalate into large QA problems. Hitting any of MySQL's silent data corruption gotchas in production is something that really scares me. I'll take any declarative constraints at the database level that I can get, to have at least some safety net; MySQL unfortunately falls short on that.

PostgreSQL has the added benefit that it can run significantly more algorithms using more advanced data structures in the database. Most of our large projects have a few cases where MySQL will hit its limits. Moving the algorithms outside the database requires considerably more effort, with pretty tricky code involving correct locking and synchronization. In particular, I have at one time or another hit the need for partial indexes, indexes on expressions, custom aggregate functions, set-returning stored procedures, array and hash datatypes, inverted indexes on array values, UPDATE/DELETE ... RETURNING, and deferrable foreign key constraints (two of these are sketched below).
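
As a hypothetical taste of two of those features in PostgreSQL (table and index names invented):

    -- Partial index: covers only the rows a hot query actually touches
    CREATE INDEX orders_open_idx ON orders (customer_id)
        WHERE status = 'open';

    -- Expression index: lookups by lower(email) can use the index
    CREATE INDEX users_email_lower_idx ON users (lower(email));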

On the other hand, MySQL has, at least for now, a better story for scaling out. If I had to support a huge number of users on a reasonably simple application, and had the team to build a heavily partitioned and replicated database with eventual consistency, I'd pick MySQL over PostgreSQL for the low-level data storage building block. Then again, the competitors in that space are the key-value databases.

Ants Aasma
+1  A: 

are these "problems" with MySQL really that bad?

Actually, the pain MySQL will inflict on you can range from moderate to insane, and much of it depends on MyISAM.

I find a good rule of thumb is this:

are you backing up some MyISAM tables?

MyISAM is great for data you don't really care about, like traffic logs and the like, or for data that you can easily restore in case of a problem since it's read-only and hence never changed since the time you loaded that 10GB dump. In those cases the compact row format of MyISAM brings great space savings (which, however, do not translate into faster sequential scan speed, for some reason).

If the data you put in MyISAM tables is worth backing up, you are going to enter a world of hurt when you realize some day that it is all inconsistent because of the lack of FK and constraint checks, and incidentally all your backups will contain inconsistent data too.

If you make lots of concurrent updates to MyISAM tables, then you are gonna go way past the world-of-hurt stage: when the load reaches a certain threshold, you are fucked. Of course the readers block writers, which block readers, which block queued writers, etc., so performance is bad, load average goes to 200, and your box is nuked. But also, I could consistently crash MyISAM tables in a benchmark I wrote 2 years ago just by hitting them with too much load. Random data ensued, sometimes crashing MySQL on SELECTs or spewing random errors.
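
For what it's worth, when a MyISAM table does get corrupted like that, recovery is a manual affair; a minimal sketch with a made-up table name:

    -- Detect and (attempt to) fix corruption in a MyISAM table
    CHECK TABLE traffic_log;
    REPAIR TABLE traffic_log;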

So, if you avoid MyISAM like the plague it is, the problems with MySQL aren't really that bad. InnoDB is robust. However, I generally find it inferior to Postgres, which is faster, has so many fewer gotchas, and Gets The Job Done easier and faster.

peufeu