What is locking in MySQL (or any RDBMS), and when would you use it? A layman's explanation with an example would be great!
Thank you in advance;-)
A few days ago I answered a question on SO and gave an example which demonstrates a situation where locking allows multiple users to concurrently insert rows in a table with an incrementing id, without using AUTO_INCREMENT.
Consider the following schema as an example:
CREATE TABLE demo_table (id int) ENGINE=INNODB;
-- // Add a few rows
INSERT INTO demo_table VALUES (1), (2), (3);
Then we can do the following:
START TRANSACTION;
-- // Get the MAX(id) so that we increment it by one
SELECT @x := MAX(id) FROM demo_table FOR UPDATE;
+---------------+
| @x := MAX(id) |
+---------------+
|             3 |
+---------------+
1 row in set (0.00 sec)
The FOR UPDATE syntax is what actually puts a lock on the rows read by this query.
Without committing the transaction, we start another separate session (simulating a concurrent user), and do the same:
START TRANSACTION;
-- // Get the MAX(id) as well
SELECT MAX(id) FROM demo_table FOR UPDATE;
The database will wait until the lock set in the previous session is released before running this query.
Switching back to the first session, we can insert the new row and commit the transaction:
-- // Insert a new row with id = MAX(id) + 1
INSERT INTO demo_table VALUES (@x + 1);
COMMIT;
After the first session commits the transaction, the lock is lifted, and the query in the second session returns:
+---------+
| MAX(id) |
+---------+
|       4 |
+---------+
1 row in set (8.19 sec)
Note that without locking, the second session would have returned immediately, but with 3 as MAX(id) instead of 4. If both sessions were to insert a row with an id of MAX(id) + 1, both would insert id = 4. You can simulate the same test without the FOR UPDATE bit to see how this is handled without locks.
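For comparison, here is a rough sketch of that same two-session test with FOR UPDATE left out. It assumes demo_table still holds ids 1 to 3 and that InnoDB's default REPEATABLE READ isolation level is in use. The @y variable is just a name picked for the second session, and the session labels are comments: run each part in its own connection.

```sql
-- Session 1
START TRANSACTION;
SELECT @x := MAX(id) FROM demo_table;    -- plain read, takes no lock; returns 3

-- Session 2 (separate connection, before session 1 commits)
START TRANSACTION;
SELECT @y := MAX(id) FROM demo_table;    -- returns immediately, also 3

-- Session 1
INSERT INTO demo_table VALUES (@x + 1);  -- inserts 4
COMMIT;

-- Session 2
INSERT INTO demo_table VALUES (@y + 1);  -- inserts 4 a second time
COMMIT;

-- Either session: list ids that now appear more than once
SELECT id, COUNT(*) AS copies
FROM demo_table
GROUP BY id
HAVING COUNT(*) > 1;
```

Because demo_table has no unique constraint on id, both inserts succeed and the final query reports id 4 twice.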
We have a joint bank account with a balance of $200.
I go to the ATM and put my card into the machine; the machine checks that I have a balance of $200.
Meanwhile, you go into the bank and ask for $50; the teller brings up your account and confirms that you have the money.
I request a withdrawal of $200; the machine counts my money, gives me $200, and sets my balance to $0.
The teller counts your money and gives you the $50; the system then updates the balance on the account to $150 ($200 - $50 withdrawal).
So now we have $250 in cash and $150 left in the account: a $200 profit.
The database should have used locks to prevent both transactions from occurring at the same time.
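As a rough sketch of how the two withdrawals could be guarded in MySQL/InnoDB, assume a hypothetical accounts table (the table and column names are made up for illustration, not taken from any real banking schema). Each withdrawal locks the account row before looking at the balance, so the teller's transaction has to wait for the ATM's transaction to commit and then sees the updated balance.

```sql
-- Hypothetical schema, for illustration only
CREATE TABLE accounts (
    id      INT PRIMARY KEY,
    balance DECIMAL(10,2) NOT NULL
) ENGINE=INNODB;
INSERT INTO accounts VALUES (1, 200.00);

-- ATM session: withdraw $200
START TRANSACTION;
-- Lock the row; any other FOR UPDATE on this row now has to wait
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;   -- 200.00
-- Subtract only if the money is really there
UPDATE accounts SET balance = balance - 200
WHERE id = 1 AND balance >= 200;                        -- 1 row affected
COMMIT;

-- Teller session (separate connection, started while the ATM transaction was open)
START TRANSACTION;
-- Blocks until the ATM session commits, then returns the new balance
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;   -- 0.00
UPDATE accounts SET balance = balance - 50
WHERE id = 1 AND balance >= 50;                         -- 0 rows affected
COMMIT;
```

In this sketch the application would hand over cash only when the UPDATE reports one affected row, so the $50 withdrawal is refused instead of being paid out against a stale balance.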
The problem is that if you handle every transaction this way, you lose concurrency and performance suffers, so there are different transaction isolation levels that are used depending on the scenario. For instance, you might not care that someone can modify data that has been read in a transaction.
http://en.wikipedia.org/wiki/Isolation_%28database_systems%29
You should learn these and understand the scenarios where they are applicable.
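For what it's worth, this is roughly how the isolation level is inspected and changed in MySQL. It is a minimal sketch, not a recommendation of any particular level: the standard levels are READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ (InnoDB's default) and SERIALIZABLE.

```sql
-- Show the level the current session uses
-- (recent MySQL versions; older ones expose it as @@tx_isolation)
SELECT @@transaction_isolation;

-- Change it for all later transactions in this session
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Or change it for the next transaction only
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
-- ... statements that need the strictest level ...
COMMIT;
```

Lower levels allow more concurrency but let a transaction see more of what other transactions are doing; higher levels do the opposite.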
In addition to Daniel's comment, I want to add that locking can be crucial to prevent two users from modifying the same data at the same time. You may think that's unlikely, but depending on the application, it is a significant risk if the same data is frequently changed by different users.
Imagine the following situation without using locks: John opens his screen (he doesn't know he's using a database, he is only an end user who is looking at a pretty screen), modifies some data, and then hits "Save". Let's say John opens the screen at 9:30 and then saves the data at 9:32.
However, Mary opened exactly the same screen and the same record at 9:29. She probably saw the same data that John did at 9:30. Then, she updates the record, and hits "Save" at 9:31.
What data was saved? John's or Mary's?
Mary happily keeps working on other records, and when she comes back later to open the record again, she sees that her changes were lost, and she sees John's changes instead!!
Locking has to be used wisely, because it has issues of its own that are too long to discuss here. For example, suppose you lock the record every time somebody opens it for editing. What happens if John locks the record and then leaves his screen open or loses his connection? The lock can remain there for hours, preventing everybody else from changing (or even looking at) that record. Also, locking may affect performance in some situations.
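One common way to catch the John and Mary overwrite without holding a lock while a screen sits open is a version check, often called optimistic locking. Note that this is a different technique from the row locks discussed above. The sketch below assumes a hypothetical records table with a version column; all names are made up for illustration.

```sql
-- Hypothetical table, for illustration only
CREATE TABLE records (
    id      INT PRIMARY KEY,
    content VARCHAR(255),
    version INT NOT NULL DEFAULT 0
) ENGINE=INNODB;
INSERT INTO records (id, content) VALUES (1, 'original');

-- Both screens read the row and remember that version = 0
SELECT content, version FROM records WHERE id = 1;

-- Mary saves at 9:31: the version still matches, so 1 row is affected
UPDATE records
SET content = 'Mary''s change', version = version + 1
WHERE id = 1 AND version = 0;

-- John saves at 9:32: the version is now 1, so 0 rows are affected
UPDATE records
SET content = 'John''s change', version = version + 1
WHERE id = 1 AND version = 0;

-- Number of rows changed by the previous statement
SELECT ROW_COUNT();
```

When the application sees zero affected rows, it can tell John that the record changed since he opened it and ask him to reload, rather than silently overwriting Mary's work.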
Understanding locking is crucial to keeping users happy and maintaining data integrity. Please look at the documentation.