views: 1321
answers: 10

I am rather confused by the fear-mongering here.

I know how to create InnoDB tables (see below), but I have no idea why, or what they are for.

create table orders (
  order_no int not null auto_increment,
  FK_cust_no int not null,
  foreign key (FK_cust_no) references customer(cust_no),
  primary key (order_no)
) ENGINE=InnoDB;

create table orders (
  order_no int not null auto_increment,
  FK_cust_no int not null,
  foreign key (FK_cust_no) references customer(cust_no),
  primary key (order_no)
);
+9  A: 

Why use InnoDB?

InnoDB is commonly viewed as anything but performant, especially when compared to MyISAM. Many actually call it slow. This view is mostly supported by old facts and mis-information. In reality, you would be very hard-pressed to find a current, production-quality MySQL Database Engine with the CPU efficiency of InnoDB. It has its performance "quirks" and there are definitely workloads for which it is not optimal, but for standard OLTP (Online Transaction Processing) loads, it is tough to find a better, safer fit.

  • Performance The performance claims for InnoDB are not idle, there are numbers to back it up and benchmarks run by some of the most well-respected MySQL consulting firms in existence. For example, the people behind mysqlperformanceblog.com, a leading MySQL tuning and optimization site run by Percona[1], ran a benchmark comparing Falcon, MyISAM and InnoDB. The benchmark was really supposed to be highlighting Falcon, except it was InnoDB that won the day, topping both Falcon and MyISAM in queries per second for almost every test: http://www.mysqlperformanceblog.com/2007/01/08/innodb-vs-myisam-vs-falco...

    There are interesting trends in these graphs. InnoDB uses "clustered" indexes. This means that the data for the table is actually stored in the "leaves" of the primary key index, thus there is no need to fetch the row information separately. This is also one of the reasons for InnoDB's reputation of having a large memory footprint. When you are using MyISAM, the key_buffer is loaded with just the index of the table. In contrast, when InnoDB buffers its index it is implicitly buffering the data in the table as well, as the innodb_buffer_pool buffers on the memory page level.[8] As you might imagine, this can lead to excellent performance and very large memory requirements.

    This technique of clustered primary keys is one reason for the fairly large margin in the benchmarks between MyISAM and InnoDB when the query in question uses the primary key for a range. What is somewhat surprising, though, is that InnoDB still wins in the tests that use a secondary index. This is comforting as it shows that the performance of the engine is not just based on key clustering, a performance boost that is very query dependent.

    InnoDB uses some other tricks with indexes as well. It can build "adaptive hash indexes" for frequent queries and does so when an entire table comes close to fitting in memory.[5] These hash indexes are quite a bit faster than the standard BTree index (when the table is in memory). Again, this is another significant performance improvement at the expense of memory usage.

    Returning to the benchmark above, we can also see that InnoDB still has some scaling issues when you get to high thread numbers (caused by many concurrent connections or requests). This is a known issue and while it improves with most major releases, it is good to keep in mind. Even with this issue, InnoDB is often used explicitly to enable concurrency.

  • Concurrency The irony is that while InnoDB has some mutex locking issues at high levels of concurrency, one tends to use it to enable concurrency at all. A major issue with MyISAM is the lack of row-level locking. This means that Drupal uses explicit table locks on some critical paths and that tables are implicitly locked on many non-critical paths, those that involve updates or inserts into the middle of a non-optimized table (which encompasses most inserts and updates). Row-level locking means these implicit locks usually don't happen. Drupal itself can be patched to remove many of the table locks from critical paths as they are no longer required. Locking sanity is now ensured at the engine level and this allows for many more concurrent operations, so many more in fact that you may have to limit the number of threads entering the InnoDB kernel to prevent the database server from thrashing. This is a much better problem to have than an artificial performance limit from table locks, but is a problem nonetheless.

    A good example of run-away concurrency occurred with the drupal.org database server in early 2007. We had just converted to InnoDB and applied patches to Drupal to remove locking on the critical paths. At that time, we were using a version of MySQL that by default had unlimited InnoDB kernel threads. It was roughly an hour before we hit peak usage and the server ground to a halt. Looking at the MySQL process listing and vmstat output, it was easy to see what was going on. We had so many concurrent threads that switching between them (and the overhead inherent in that) was preventing any of them from moving forward at a reasonable pace. This is what you have to watch out for and why the InnoDB kernel thread limits are useful.

  • Reliability Any database administrator is sure to have the same nightmare: calling a client to tell them that their data is corrupted or just flat-out gone. MyISAM doesn't help these concerns, as it ensures almost no data integrity. Hardware failures, unclean shutdowns and canceled operations are just a few of the events that can lead to MyISAM corruption. There are excellent tools to recover from this, but they are not guaranteed to work, and their use sometimes requires extensive downtime for table checks/repairs.

    On the other hand, InnoDB is a largely ACID (Atomicity, Consistency, Isolation, Durability) engine, built to guarantee consistency and durability. It does this through a transaction log (with the option of a two-phase commit if you have the binary log enabled), a double-write buffer and automatic checksumming and checksum validation of database pages. These safety measures not only prevent corruption on "hard" shutdowns, but can even detect hardware failure (such as memory failure/corruption) and prevent damage to your data.

    Drupal.org has made use of this feature of InnoDB as well. The database in question contains a large amount of user-contributed content, CVS messages, CVS history, forum messages, comments and, more critically, the issue queues for the entire Drupal project. This is not data where corruption is an option. In 2007, the master database server for the project went down. After examining the logs, it became clear that it hadn't crashed as such, but InnoDB had read a checksum from disk that didn't match the checksum it had in memory. In this case, the checksum mismatch was a clear sign of memory corruption. Not only did InnoDB detect this, it killed the MySQL daemon to prevent data corruption. In fact, it wouldn't let the MySQL daemon run for more than a half hour on that server without killing it after finding a checksum mismatch. When your data is of the utmost importance, this is very comforting behavior.

  • Data Security On a related topic, the transactional nature of InnoDB enables easy and online backups. A major issue with MyISAM is that any backup strategy designed to pull guaranteed, consistent backups will require table locks and any strategy that involves point-in-time recovery from binary logs will require a full database lock. This is totally unacceptable for a large, production-grade website. The only real way around this is to have a slave database server and pull backups from that machine.

    InnoDB, on the other hand, can run a backup job in a single transaction and pull consistent, database-wide backups with only a short lock at the beginning of the job. The ease of pulling these backups quickly becomes addicting and makes it much easier to follow safe backup procedures.

Why use MyISAM?

  • Simplicity So far, this has read somewhat like a paid advertisement for InnoDB. However, MyISAM has some very real advantages. One of these is the simplicity of the engine: it is very well understood, and it is easy to write third-party tools to interact with it. There are very high-quality free tools, such as mysqlhotcopy, available for MyISAM. It is much more difficult to write tools for an engine as complicated as InnoDB, as can easily be seen in the number available. This simplicity also allows for an ease of administration that is not there with InnoDB.

  • Optimization MyISAM's other main advantage is how long it has been around. There are many systems, Drupal for example, that are very much optimized for that particular engine. This is not to say that they perform poorly on InnoDB, but they are not optimized for it. For example, while many of Drupal's core queries are well indexed and use the primary key (thus benefiting from InnoDB's primary key clustering), some could be improved. The node table has a primary key on (nid,vid). Having this index is a good idea, but it is a two-integer index and there are eleven secondary indexes based on it. This doesn't mean much when you use MyISAM, but under InnoDB it means each of those secondary indexes has two-integer sized leaves identifying the primary key.

    Another fact is that there are some workloads MyISAM is better suited for. For example, Drupal's built-in search functionality performs horribly on InnoDB for very large data sets (100k+ rows). Those tables are best left as MyISAM. Fortunately, MySQL allows mixing engines like this.

  • Resource Usage It is readily accepted in computer science that there is often a trade-off between speed and memory footprint. We have seen through the above benchmarks that InnoDB has some fast algorithms; however, this comes at a price. Not only does InnoDB use more memory than MyISAM, but the actual data files are often quite a bit larger. Add to this the fact that InnoDB has at least one quite large log file, and you have a significant increase in resource use. This makes MyISAM a good fit for a resource-limited server. However, if you're at all concerned with high levels of concurrency, it is likely you have the funds to buy a server that can handle these increased resource demands.

State Of MySQL Engines With Drupal

In summary, many of the historical concerns and rumors of InnoDB slowness are simply false. In most cases, InnoDB is the correct choice for a Drupal site. It provides increased concurrency, enhanced performance and much more data integrity than MyISAM ever can. However, the pluggable nature of MySQL engines allows the user to "mix and match" table engines inside a single database. This allows us to consider tables whose workloads fit MyISAM more so than InnoDB. The main candidates for MyISAM in a mostly InnoDB-centric database are the search tables. A database layout with most tables being InnoDB (for performance and data security) and the search tables being MyISAM (for performance on that particular workload and acknowledging the fact that, if that data was corrupted, it could be easily rebuilt) is an excellent fit.

source

Adinochestva
Probably better to just put the link here instead of copying the article verbatim.
Jim Ferrans
Wow, you should make this a blog post. :) +1
the_drow
the_drow: I think it was actually copy/pasted from a blog post.
Masi
nice copy+paste
shylent
+3  A: 

Everywhere! Deprecate MyISAM; InnoDB is the way to go. It's not only about performance, but also data integrity and ACID transactions.

knoopx
+2  A: 

In your example, you create foreign keys. Foreign keys are only supported by InnoDB tables, not by MyISAM; MyISAM parses FOREIGN KEY clauses but silently ignores them.

Zr40
+7  A: 

InnoDB is a storage engine in MySQL. There are quite a few of them, and they all have their pros and cons. InnoDB's greatest strengths are:

  • Support for transactions (giving you the ACID properties).
  • Row-level locking. A finer-grained locking mechanism gives you higher concurrency than, for instance, MyISAM's table-level locks.
  • Foreign key constraints, letting the database itself ensure the integrity of its state and of the relationships between tables.
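To make the third point concrete, here is a minimal sketch of a foreign key constraint rejecting an orphan row. It uses Python's built-in sqlite3 module purely to illustrate the concept (the table and column names mirror the question; in MySQL you would get the same enforcement by declaring the tables with ENGINE=InnoDB):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs opting in; InnoDB enforces FKs by default
conn.execute("CREATE TABLE customer (cust_no INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE orders (
    order_no INTEGER PRIMARY KEY,
    FK_cust_no INTEGER NOT NULL REFERENCES customer(cust_no))""")

conn.execute("INSERT INTO customer (cust_no) VALUES (1)")
conn.execute("INSERT INTO orders (order_no, FK_cust_no) VALUES (100, 1)")  # fine: customer 1 exists

try:
    # Customer 999 does not exist, so the engine rejects the row outright
    conn.execute("INSERT INTO orders (order_no, FK_cust_no) VALUES (101, 999)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The application never gets a chance to persist the dangling reference; the constraint is enforced by the engine, not by application code.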
PatrikAkerstrand
+2  A: 

You may be interested in this article from Database Journal which discusses the InnoDB table type in MySQL.

Excerpt:

Last month we looked at the HEAP table type, a table type which runs entirely in memory. This month we look at setting up the InnoDB table type, the type of most interest to serious users. The standard MyISAM table type is ideal for website use, where there are many reads in comparison to writes, and no transactions. Where these conditions do not apply (and besides websites, they do not apply often in the database world), the InnoDB table is likely to be the table type of choice. This article is aimed at users who are familiar with MySQL, but have only used the default MyISAM table type.

I wouldn't be put off by the other question. Keep proper backups of your database, of any type -- and don't drop tables by accident ;-) -- and you'll be ok whatever table type you choose.

tvanfosson
+1 for the article and backups
Masi
+1  A: 

In general, for me the most important point is that InnoDB offers row-level locking, while MyISAM locks per table. On big tables with a lot of writes this can make a big performance difference.

On the other hand, MyISAM tables have a simpler file structure, so copying and repairing tables at the file level is much easier.

+1 for personal perspective
Masi
+4  A: 

Always. Unless you need to use MySQL's full-text search or InnoDB is disabled on your shared web host.

Imran
that's the most simple answer and it's true!
JohnB
A: 

A supplement to Machine and knoopx's answer about transactions:

The default MySQL table type, MyISAM, does not support transactions. BerkeleyDB and InnoDB are the transaction-safe table types available in open source MySQL, version 3.23.34 and greater.

The definition of a transaction, and a banking example

A transaction is a sequence of individual database operations that are grouped together. -- A good example where transactions are useful is in banking.

Source of the citations
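As a sketch of the banking example, the following uses Python's sqlite3 module (standing in for MySQL here; the account names and the CHECK constraint are invented for illustration) to show the both-or-neither behavior of a transaction:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE accounts (
    name TEXT PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0))""")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(amount, src, dst):
    """Debit src and credit dst as one atomic unit: both updates happen, or neither does."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on exception
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
    except sqlite3.IntegrityError:
        pass  # rolled back: neither account was changed

transfer(30, "alice", "bob")   # succeeds: alice 70, bob 80
transfer(999, "alice", "bob")  # debit violates the CHECK; the whole transfer is rolled back
```

Without the transaction, a failure between the debit and the credit could leave money debited from one account but never credited to the other.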

Masi
+1  A: 

There is a comment here with a command to convert your databases to InnoDB.

Masi
+1  A: 

I think you are confused about two different issues, when to use InnoDB instead of MyISAM, and when to use foreign key (FK) constraints.

As for the first issue, there have been many answers that do a great job of explaining the differences between MyISAM and InnoDB. I'll just reiterate that, per tvanfosson's quote from an article, MyISAM is better suited for systems with mostly reads. This is because it uses table-level locking instead of row-level locking like InnoDB, so MyISAM can't handle high concurrency as well, and it's missing features that help with data integrity, such as transactions and foreign keys (again, already mentioned by others).

You don't have to use FK constraints in your data model. If you know what the relationships between your tables are, and your application is free of bugs, then you'll get by without FKs just fine. However, using FKs gives you extra insurance at the database layer, because MySQL won't let your application insert bad data that violates the constraints you created.

In case you aren't clear on why to use primary keys (PK): making a column such as id_order the PK of the orders table means that MySQL won't let you INSERT the same id_order value more than once, because every value in a PK column must be unique.
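A quick illustration of that uniqueness guarantee, using Python's sqlite3 module rather than MySQL (the behavior is the same in spirit; the column names follow this answer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id_order INTEGER PRIMARY KEY, id_customer INTEGER NOT NULL)")
conn.execute("INSERT INTO orders (id_order, id_customer) VALUES (100, 1)")

try:
    # A second row with the same primary key value is refused by the engine
    conn.execute("INSERT INTO orders (id_order, id_customer) VALUES (100, 2)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```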

A FK would be used on a table that has a dependency on another table, for example, order_items would have a dependency on orders (below). id_order_items is the PK of order_items and you could make id_order_items the FK of the orders table to establish a one-to-many relationship between orders and order_items. Likewise, id_item could be a FK in the order_items table and a PK in the items table to establish a one-to-many relationship between order_items and items.

Then, what the FK constraint does is prevent you from adding an id_item value to the order_items table that isn't in the items table, or from adding an id_order_items to orders that isn't in the order_items table.

All a FK does is ensure data integrity, and it also helps convey relationships among your tables to other developers that didn't write the system (and to yourself months later when you forget!), but mainly it's for data integrity.

Extra credit: so why use transactions? Well, you already mentioned a quote saying they are useful for banking systems, but they are useful in far more situations than that.

Basically, in a relational database, especially if it's normalized, a routine operation such as adding an order, updating an order, or deleting an order often touches more than 1 table and/or involves more than one SQL statement. You could even end up touching the same table multiple times (as the example below does). Btw Data Manipulation Language (DML) statements (INSERT/UPDATE/DELETE) only involve one table at a time.

An example of adding an order:

I recommend an orders table and an order_items table. This lets you put a PK on id_order in the orders table, which means id_order cannot be repeated in orders. Without the one-to-many orders - order_items relationship, you'd have to have multiple rows in the orders table for every order that had multiple items associated with it (you need an items table too for this e-commerce system, btw). This example adds an order, touching two tables with four different INSERT statements.

(no key constraints for illustration purposes)

-- insert #1
INSERT INTO orders (id_order, id_order_items, id_customer)
VALUES (100, 4, 1)

-- insert #2
INSERT INTO order_items (id_order_items, id_item)
VALUES (4, 1)

-- insert #3
INSERT INTO order_items (id_order_items, id_item)
VALUES (4, 2)

-- insert #4
INSERT INTO order_items (id_order_items, id_item)
VALUES (4, 3)

So what if insert #1 and insert #2 run successfully, but insert #3 does not? You'd end up with an order that was missing an item, and that would be garbage data. If, in that case, you want to roll back all the queries so the database is in the same state it was in before adding the order, and then start over, that's exactly what transactions are for. You group together queries that you want either all done or, in case of an exception, none at all, into a transaction.

So, like PK/FK constraints, transactions help ensure data integrity.
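That scenario can be sketched with Python's sqlite3 module (standing in for MySQL/InnoDB; in MySQL you would use START TRANSACTION ... COMMIT/ROLLBACK). The failing insert is simulated here with a NULL value that violates a NOT NULL constraint:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id_order INTEGER, id_order_items INTEGER, id_customer INTEGER)")
conn.execute("CREATE TABLE order_items (id_order_items INTEGER, id_item INTEGER NOT NULL)")

try:
    with conn:  # all the inserts commit together, or none do
        conn.execute("INSERT INTO orders (id_order, id_order_items, id_customer) VALUES (100, 4, 1)")
        conn.execute("INSERT INTO order_items (id_order_items, id_item) VALUES (4, 1)")
        # insert #3 fails (NULL violates NOT NULL), simulating the failure described above
        conn.execute("INSERT INTO order_items (id_order_items, id_item) VALUES (4, NULL)")
except sqlite3.IntegrityError:
    pass  # the whole transaction rolled back; no half-complete order remains
```

After the exception, both tables are empty: the order with a missing item never became visible.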

JohnB