views:

1102

answers:

5

I have a MySQL database with one particular MyISAM table of above 4 million rows. I update this table about once a week with about 2000 new rows. After updating, I then perform the following statement:

ALTER TABLE x ORDER BY PK DESC

i.e. I order the table in question by the primary key field in descending order. This has not given me any problems on my development machine (Windows with 3GB memory), but, even though 3 times I have tried it successfully on the production Linux server (with 512MB RAM - and achieving the resulted sorted table in about 6 minutes each time), the last time I tried it I had to stop the query after about 30 minutes and rebuild the database from a backup.

I have started to wonder whether a 512MB server can cope with that statement (on such a large table) as I have read that a temporary table is created to perform the ALTER TABLE command?! And, if it can be safely run, what should be the expected time for the alteration of the table?

Thanks in advance, Tim

A: 

I'd probably create a View instead which is ordered by the PK value, so that for one thing you don't need to lock up that huge table while the ALTER is being performed.

Darth Continent
Thanks for the reply...The thing is I don't mind locking the table during the update as it will be offline anyway...
Timothy Mifsud
I don't believe a view will help here. MySQL has [two strategies][1] for view resolutions: `MERGE` and `TEMPTABLE`. When using merge, you would not gain any benefit as it's definition is simply merged with the submitted `SELECT` statement. `TEMPTABLE`, as the name suggests, will create a temporary table. But the way it looks, creating the temporary table is the cause of the original problem. So you would not gain anything except making maintenance more difficult.[1]: http://dev.mysql.com/doc/refman/5.0/en/view-algorithms.html
exhuma
+1  A: 

As I have just read, the ALTER TABLE ... ORDER BY ... query is useful to improve performance in certain scenarios. I am surprised that the PK Index does not help with this. But, from the MySQL docs, it seems that InnoDB does use the index. However InnoDB tends to be slower as MyISAM. That said, with InnoDB you wouldn't need to re-order the table but you would lose the blazing speed of MyISAM. It still may be worth a shot.

The way you explain the problems, it seems that there is too much data loaded into memory (maybe there is even swapping going on?). You could easily check that with monitoring your memory usage. It's hard to say as I do not know MySQL all that well.

On the other hand, I think your problem lies at a very different place: You are using a machine with only 512 Megs of RAM as Database server with a table containing more than 4Mio rows... And you are performing a very memory-heavy operation on the whole table on that machine. It seems that 512Megs will not nearly be enough for that.

A much more fundamental issue I am seeing here: You are doing development (and quite likely testing as well) in an environment that is very different to the production environment. The kind of problem you are explaining is to be expected. Your development machine has six times as much memory as your production machine. I believe I can safely say, that the processor is much faster as well. In that case, I suggest you create a virtual machine mimicking your production site. That way you can easily test your project without disrupting the production site.

exhuma
Recent improvements to InnoDB have made it perform on par with MyISAM in most scenarios.
Bill Karwin
@Bill: Interesting. So with that, you could say that InnoDB really is the way to go? Same performance, more features. After seeing your profile, I think I can believe you. Still, do you have any proof to go with that?
exhuma
A: 

What you're asking it to do is rebuild the entire table and all its indexes; this is an expensive operation particularly if the data doesn't fit in ram. It will complete, but it will be vastly slower if the data doesn't fit in ram, particularly if you have lots of indexes.

I question your judgement when choosing to run a machine with such tiny memory in production. Anyway:

  • Is this ALTER TABLE really necessary; what specific query are you trying to speed up, and have you tried it without?
  • Have you considered making your development machine more like production? I mean, using a dev box with MORE memory is never a good idea, and using a different OS is definitely not either.

There is probably also some tuning you can do to try to help; it largely depends on your schema (indexes in particular). 4M rows is not very many (for a machine with normal amounts of ram).

MarkR
Hi Mark...thanks for your reply...The limit on memory is due to budgetary considerations...I thought that if the site caught on I would upgrade the specs of the server...However, the reason for doing the ALTER is that users can run a stored procedure which queries this table and I want to return the results in order of "latest inserted first". I can achieve this using an ORDER BY in the query itself, but unfortunately this seems to be very expensive and slows down the queries considerably...So, when I update the table, I usually preorder by PK desc to skip this ORDER BY.
Timothy Mifsud
You should create an appropriate index so that the ORDER BY query does not need to sort. You can check this using EXPLAIN (Only if the query is not in a SP). ALTER TABLE ... ORDER BY is not a solution, as it doesn't guaranteed the data stays ordered.
MarkR
Hi Mark. I have 8 indexes on this table. If I were to add the PK field (which I want to order by desc) to the rightmost part of each of these indexes, the indexes will still be used to satisfy the WHERE clause and, even though the ordering field will not be a leftmost prefix of the index (as I will be adding it to the rightmost of each index), it can still be used for the ORDER BY? Thanks.
Timothy Mifsud
Unfortunately I have added the ORDER BY field to an index, but EXPLAIN still shows that it is using filesort...I do not think I can include the ORDER BY field in an index such that the WHERE clause will also use an index...I also have tried "SELECT * FROM (SELECT...main query) ORDER BY..." but it still takes much longer than without the ORDER BY clause...
Timothy Mifsud
You'd need an index with both columns in, in the right order. Post another question containing your schema, query and explain plan.
MarkR
A: 

If you're using InnoDB, you shouldn't have to explicitly perform the ORDER BY either post-insert or at query time. According to the MySQL 5.0 manual, InnoDB already defaults to primary key ordering for query results:

http://dev.mysql.com/doc/refman/5.0/en/alter-table.html#id4052480

MyISAM tables return records in insertion order by default, instead, which may work as well if you only ever append to the table, rather than using an UPDATE query to modify any rows in-place.

rcoder
A: 

is the primary key auto_increment? if so, then doing ALTER TABLE ... ORDER BY isn't going to improve anything since everything will be inserted in order anyway.

(unless you have lots of deletes)

longneck
Thanks for the reply. However, the problem is that I want to give the results in reverse order of the primary key order...
Timothy Mifsud
then you need to be optimizing your stored procedures, queries, and server settings, not trying black-magic things like ALTER TABLE, which only works because of a quirk in myisam tables. if the performance of your stored procedures and queries suffer when you sort them then you should open a new question and post the CREATE TABLE statement, the query/procedure and the EXPLAIN output. then we can help you optimize the query or your server configuration.
longneck