
When you run an ALTER TABLE statement in MySQL, the whole table is read-locked for the duration of the statement (readers can still read, but every insert and update blocks). If it's a big table, that means writes could be blocked for a very long time. Is there a way to do a "hot alter", like adding a column in such a way that the table is still updatable throughout the process?

Mostly I'm interested in a solution for MySQL, but I'd be curious about other RDBMSs if MySQL can't do it.

To clarify, my purpose is simply to avoid downtime when a new feature that requires an extra table column is pushed to production. Any database schema will change over time; that's just a fact of life. I don't see why we should accept that these changes must inevitably result in downtime; that's just weak.
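For concreteness, this is the kind of statement at issue (the table and column names here are made up):

    -- The statement I'd like to run without blocking writers:
    ALTER TABLE users ADD COLUMN last_login DATETIME NULL;

    -- Meanwhile, in another session, SHOW PROCESSLIST shows the ALTER
    -- sitting in a "copying to tmp table" state while writers queue up.
    SHOW PROCESSLIST;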

+2  A: 

Nope. If you are using MyISAM tables, then to the best of my understanding they only take table locks - there are no record locks; they just try to keep everything hyperfast through simplicity. (Other MySQL storage engines behave differently.) In any case, you can copy the table to another table, alter the copy, and then switch them, updating for any differences.

This is such a massive alteration that I doubt any DBMS would support it. It's considered a benefit to be able to do it with data in the table in the first place.

le dorfier
InnoDB uses row locks - http://dev.mysql.com/doc/refman/5.0/en/internal-locking.html
Eran Galperin
Yeah, MySQL is the aberration. That's why I was specific about "standard" tables.
le dorfier
You wrote - standard MySQL tables only do table locks - which is incorrect.
Eran Galperin
How do you interpret this about MyISAM (i.e. MySQL standard) tables from the page you quoted? "MySQL uses table-level locking for MyISAM and MEMORY tables, page-level locking for BDB tables, and row-level locking for InnoDB tables."
le dorfier
Some storage engines use row-level locking, and some use table-level locking. There is no "standard" storage engine (maybe you meant the default in phpMyAdmin...)
Eran Galperin
So you're new to MySQL? MyISAM is what's installed unless you change it. It's by far the most widely used. What definition would you associate with "Standard"?
le dorfier
All the mentioned storage engines are installed by default. You determine the storage engine with the table creation query. I've been using MySQL for over 7 years, to answer your question.
Eran Galperin
OK. I'll concede whatever your point is - your experience is different and perhaps more legitimate than mine. What would you like me to change to correct my answer?
le dorfier
The sentence about the locks is completely incorrect; you should drop it.
Eran Galperin
I don't think it's completely inaccurate; I've clarified it. Perhaps we can at least agree to respectfully differ. How do you read this re MyISAM: "The binary portable storage engine that is the default storage engine used by MySQL." http://dev.mysql.com/doc/refman/5.1/en/create-table.html
le dorfier
By saying "standard" you implied more than the default storage engine: that it is the standard. I use InnoDB as the standard storage engine in my applications, and so do most production MySQL databases that require transactions. Removed my downvote since you cleared it up.
Eran Galperin
@Eran, I'm reminded how critical it is for database architects and developers to be precise and unambiguous. Thanks for your help.
le dorfier
+11  A: 

Altering a table is a change to the architecture of your application. It should not be a normal part of your production process, and you should not expect your application to be usable whilst it is taking place.

If your application is designed in such a way that tables need their schemas altered as part of the normal running of the application, then I'm afraid there is something very badly wrong with your design.

AdamRalph
Yeah. Damn right!! :-) Seriously, schema changes should be done infrequently, in the wee hours of the morning by operators that are wired to the eyeballs on caffeine, not in the middle of the day, six times a week.
paxdiablo
Right. Make that Sunday morning. Before a Monday Holiday.
le dorfier
That's bogus. We can build computers where you can take a hammer to a hard drive, but we can't handle a tiny little 'alter table add column' without downtime?
SquareCog
You should not expect your application to be usable whilst this is taking place??? Talk about low expectations...
Daniel
@Dmitry - there are ways and means; having two DBs and switching between them with a config change will work for some changes, for example. However, if you're changing the model, then you should be changing the rest of the app to implement that, so it's not just about "alter table".
annakata
Annakata -- so let's talk about the ways and means instead of pretending it's impossible to have 0 downtime, especially for something this trivial.
SquareCog
Hey, @Dmitry, let me have a go at your hard drive with a hammer - I bet I can force some downtime on it :-).
paxdiablo
http://www.youtube.com/watch?v=MertszOktiw :-P
SquareCog
A: 

Not really.

You ARE altering the underlying structure of the table, after all, and that's a bit of information that's quite important to the underlying system. You're also (likely) moving much of the data around on disk.

If you plan on doing this a lot, you're better off simply padding the table with "dummy" columns that are available for future use.
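A rough sketch of the padding idea, with invented names (whether a later column rename avoids a table rebuild depends on your MySQL version and engine):

    -- Reserve spare nullable columns when the table is created.
    CREATE TABLE orders (
        id     INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        total  DECIMAL(10,2) NOT NULL,
        spare1 VARCHAR(255) NULL,
        spare2 VARCHAR(255) NULL
    );

    -- Later, claim a spare column for the new feature. On some versions
    -- this rename-only CHANGE still rebuilds the table, so many people
    -- just use spare1 under an application-level alias instead.
    ALTER TABLE orders CHANGE spare1 coupon_code VARCHAR(255) NULL;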

Will Hartung
+7  A: 

The only other option is to do manually what many RDBMSs do behind the scenes anyway: create a new table with the new schema.

You can then copy the contents of the old table over a chunk at a time, whilst watching for any INSERT/UPDATE/DELETE on the source table. (These can be captured by triggers. Although triggers cause a slowdown, they are not a lock...)

Once finished, rename the source table out of the way, then rename the new table into its place, preferably as a single atomic step (MySQL's RENAME TABLE can swap both names in one statement).

Once finished, recompile any stored procedures, etc that use that table. The execution plans will likely no longer be valid.
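A minimal MySQL sketch of the idea, with invented names (real migration scripts add a chunking loop, error handling, and conflict checks):

    -- Assume a hypothetical source table:
    --   CREATE TABLE accounts (id INT PRIMARY KEY, name VARCHAR(100));

    -- 1. Shadow table with the desired new schema.
    CREATE TABLE accounts_new LIKE accounts;
    ALTER TABLE accounts_new ADD COLUMN status TINYINT NULL;

    -- 2. Triggers mirror ongoing writes into the shadow table.
    CREATE TRIGGER accounts_ai AFTER INSERT ON accounts FOR EACH ROW
        REPLACE INTO accounts_new (id, name) VALUES (NEW.id, NEW.name);
    CREATE TRIGGER accounts_au AFTER UPDATE ON accounts FOR EACH ROW
        REPLACE INTO accounts_new (id, name) VALUES (NEW.id, NEW.name);
    CREATE TRIGGER accounts_ad AFTER DELETE ON accounts FOR EACH ROW
        DELETE FROM accounts_new WHERE id = OLD.id;

    -- 3. Copy existing rows a chunk at a time (advance the id range
    --    and repeat until the whole table has been copied).
    INSERT IGNORE INTO accounts_new (id, name)
        SELECT id, name FROM accounts WHERE id BETWEEN 1 AND 10000;

    -- 4. Swap the names; RENAME TABLE swaps both atomically in MySQL.
    RENAME TABLE accounts TO accounts_old, accounts_new TO accounts;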

EDIT:

Some comments have been made about this limitation being a bit poor. So I thought I'd put a new perspective on it to show why it's how it is...

  • Adding a new field is like changing one field on every row.
  • Field locks would be much harder than row locks, never mind table locks.
  • You're actually changing the physical structure on disk; every record moves.
  • This really is like an UPDATE over the whole table, but with more impact...
Dems
And have a thorough test plan before swapping. If it fails, start over.
le dorfier
Column locks are not that big a deal for a columnar database :-). Also, if you are adding a column, depending on your physical layout this may not require any change to the table data at all -- see the Oracle answer below.
SquareCog
Managing the synchronization through triggers was a nice idea. I've been using MySQL for so long that I keep forgetting they have triggers now. I've used this technique and now I have a functional hot-alter script. With a progress bar. And it works with MyISAM. Life is good.
Daniel
+1 This is literally what SQL Enterprise Manager does behind the scenes when you make certain kinds of table changes in the UI. In SQL 2008, they actually added a warning so that the user KNOWS it's performing this drastic action.
BradC
A: 

Dummy columns are a good idea if you can predict their type (and make them nullable). Check how your storage engine handles nulls.

MyISAM will lock everything if you even mention a table name in passing, on the phone, at the airport. It just does that...

That being said, locks aren't really that big a deal. As long as you are not trying to write a default value for the new column into every row, but let it sit as NULL, and your storage engine is smart enough not to go writing it, you should be OK with a lock that is only held long enough to update the metadata. If you do try to write a new value, well, you are toast.
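To illustrate the distinction with a hypothetical table (whether the first case really stays metadata-only depends on the engine, as the comment below shows):

    -- Best case: nullable column, no default; only the metadata changes.
    ALTER TABLE events ADD COLUMN note VARCHAR(255) NULL;

    -- Worst case: a NOT NULL default forces a value into every row,
    -- so the whole table gets rewritten under the lock.
    ALTER TABLE events ADD COLUMN flag TINYINT NOT NULL DEFAULT 0;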

SquareCog
I tried adding a NULL column to an InnoDB table and it had to rebuild the entire table; not a simple "update the metadata" operation.
Daniel
+2  A: 

Since you asked about other databases, here's some information about Oracle.

Adding a NULL column to an Oracle table is a very quick operation, as it only updates the data dictionary. This holds an exclusive lock on the table for a very short period of time. It will, however, invalidate any dependent stored procedures, views, triggers, etc. These will get recompiled automatically.

From there, if necessary, you can create an index using the ONLINE clause. Again, this takes only very short data dictionary locks. It'll read the whole table looking for things to index, but it does not block anyone while doing so.

If you need to add a foreign key, you can do this and get Oracle to trust you that the data is correct. Otherwise it needs to read the whole table and validate all the values, which can be slow (create your index first).

If you need to put a default or calculated value into every row of the new column, you'll need to run a massive update, or perhaps a little utility program that populates the new data. This can be slow, especially if the rows get a lot bigger and no longer fit in their blocks. Locking can be managed during this process. Since the old version of your application, which is still running, does not know about this column, you might need a sneaky trigger or to specify a default.

From there, you can do the switcharoo on your application servers, pointing them at the new version of the code, and it'll keep running. Then drop your sneaky trigger.

Alternatively, you can use DBMS_REDEFINITION, which is a black box designed to do this sort of thing.
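A hedged Oracle sketch of those steps, with invented object names (ENABLE NOVALIDATE is the "trust me" option mentioned above):

    -- Data-dictionary-only change: nullable column, no default.
    ALTER TABLE orders ADD (promo_code VARCHAR2(20));

    -- Build the index without blocking DML.
    CREATE INDEX orders_promo_ix ON orders (promo_code) ONLINE;

    -- Add the foreign key without validating existing rows.
    ALTER TABLE orders ADD CONSTRAINT orders_promo_fk
        FOREIGN KEY (promo_code) REFERENCES promos (code)
        ENABLE NOVALIDATE;

    -- Backfill in small batches instead of one massive UPDATE.
    UPDATE orders SET promo_code = 'NONE'
     WHERE promo_code IS NULL AND ROWNUM <= 10000;
    COMMIT;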

All this is so much bother to test, etc., that we just have an early Sunday morning outage whenever we release a major version.

WW
+1  A: 

Using the InnoDB Plugin, ALTER TABLE statements which only add or drop secondary indexes can be done "quickly", i.e. without rebuilding the table.
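For example (hypothetical table; with the plugin's fast index creation these do not copy the table):

    ALTER TABLE orders ADD INDEX idx_customer (customer_id);
    ALTER TABLE orders DROP INDEX idx_customer;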

Generally speaking however, in MySQL, any ALTER TABLE involves rebuilding the entire table which can take a very long time (i.e. if the table has a useful amount of data in it).

You really need to design your application so that ALTER TABLE statements do not need to be done regularly; you certainly don't want any ALTER TABLE done during normal running of the application unless you're prepared to wait or you're altering tiny tables.

MarkR
+1  A: 

In general, the answer is going to be "No. You're changing the structure of the table, which potentially will require a lot of updates", and I definitely agree with that. If you expect to be doing this often, then I'll offer an alternative to "dummy" columns: use VIEWs instead of tables for SELECTing data. IIRC, changing the definition of a view is relatively lightweight, and the indirection through a view is resolved when the query plan is compiled. The expense is that you would have to add the column to a new table and make the view JOIN it in.

Of course this only works if you can use foreign keys to perform cascading deletes and whatnot. The other bonus is that you can create a new table containing a combination of the data and point the view at it without disturbing client usage.
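A sketch with invented names: the new column lives in a side table, clients only ever touch the view, and the view definition can be swapped cheaply:

    -- The base table stays untouched; new columns go in a side table.
    CREATE TABLE customer_ext (
        customer_id  INT PRIMARY KEY,
        loyalty_tier VARCHAR(20) NULL,
        FOREIGN KEY (customer_id) REFERENCES customer (id)
            ON DELETE CASCADE
    );

    -- Clients SELECT from the view, never from the tables directly.
    CREATE OR REPLACE VIEW customer_v AS
        SELECT c.id, c.name, e.loyalty_tier
        FROM customer c
        LEFT JOIN customer_ext e ON e.customer_id = c.id;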

Just a thought.

D.Shawley
A: 

I would recommend one of two approaches:

  1. Design your database tables with potential changes in mind. For example, I've worked with Content Management Systems, where the data fields of content change regularly. Instead of building the physical database structure to match the initial CMS field requirements, it is much better to build in a flexible structure, such as a text blob field (varchar(max), for example) holding flexible XML data (sketched at the end of this answer). This makes structural changes far less frequent. Structural changes can be costly, so there is a cost benefit here as well.

  2. Have system maintenance time. The system goes offline during changes (monthly, etc.), scheduled during the least heavily trafficked time of day (3-5 am, for example). The changes are staged prior to the production rollout, so you will have a good fixed-window estimate of the downtime.

2a. Have redundant servers, so that when the system has downtime, the whole site does not go down. This would allow you to "roll" your updates out in a staggered fashion, without taking the whole site down.

Options 2 and 2a may not be feasible; they tend to be only for larger sites/operations. They are valid options, however, and I have personally used all of the options presented here.
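Here is the sketch promised in option 1, with invented names: fixed columns for the stable identity fields, one text blob for the fields that churn:

    CREATE TABLE cms_content (
        id         INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        slug       VARCHAR(200) NOT NULL,
        updated_at DATETIME NOT NULL,
        -- Flexible payload, e.g. '<fields><title>Hello</title></fields>'
        fields_xml TEXT NOT NULL
    );

    -- Adding a "subtitle" field later is an application-level change:
    -- start writing it into fields_xml; no ALTER TABLE is needed.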

pearcewg
+1  A: 

I like the table-copy-plus-triggers approach described by Dems. It seems that's what ChronicDB does.

Gary
A: 

If you cannot afford downtime for your database when doing application updates, you should consider maintaining a two-node cluster for high availability. With a simple replication setup, you could do almost fully online structural changes like the one you suggest (a command-level sketch follows below):

  • wait for all changes to be replicated on a passive slave
  • change the passive slave to be the active master
  • do the structural changes to the old master
  • replicate changes back from the new master to the old master
  • do the master swapping again and the new app deployment simultaneously

It is not always easy, but it works, usually with zero downtime! The second node does not have to be only a passive one; it can be used for testing, doing statistics, or as a fallback node. If you do not have the infrastructure, replication can be set up within a single machine (with two instances of MySQL).
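A hedged sketch of the swap on classic MySQL replication, with invented host names (a real failover also handles read_only flags and client redirection):

    -- On the slave: confirm it has caught up, then stop replication.
    SHOW SLAVE STATUS\G    -- Seconds_Behind_Master should be 0
    STOP SLAVE;

    -- Promote the slave to active master and point the app at it.
    SET GLOBAL read_only = 0;

    -- On the old master: run the slow DDL while it serves no traffic.
    ALTER TABLE accounts ADD COLUMN status TINYINT NULL;

    -- Re-attach the old master as a slave of the new master so it
    -- catches up on the writes it missed.
    CHANGE MASTER TO
        MASTER_HOST = 'new-master.example.com',
        MASTER_USER = 'repl',
        MASTER_PASSWORD = 'secret';
    START SLAVE;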

jynus