views:

215

answers:

5

Hi,

Background: I'm working on a system where the developers seem to be using a function which executes a MYSQL query like "SELECT MAX(id) AS id FROM TABLE" whenever they need to get the id of the LAST inserted row (the table having an auto_increment column).

I know this is a horrible practice (because concurrent requests will mess the records), and I'm trying to communicate that to the non-tech / management team, to which their response is...

"Oh okay, we'll only face this problem when we have 
(a) a lot of users, or 
(b) it'll only happen when two people try doing something
    at _exactly_ the same time"

I don't disagree with either point, and think we'll run into this problem much sooner than we plan. However, I'm trying to calculate (or figure a mechanism) to calculate how many users should be using the system before we start seeing messed up links.

Any mathematical insights into that? Again, I KNOW its a horrible practice, I just want to understand the variables in this situation...


Update: Thanks for the comments folks - we're moving in the right direction and getting the code fixed!

A: 

Instead of using SELECT MAX(id) you shoud do as the documentation says :

Instead, use the internal MySQL SQL function LAST_INSERT_ID() in an SQL query

Even so, neither SELECT MAX(id) nor mysql_insert_id() are "thread-safe" and you still could have race condition. The best option you have is to lock tables before and after your requests. Or even better use transactions.

Colin Hebert
you guys told nothing new to the OP
Col. Shrapnel
Yes, that mysql_insert_id() isn't a perfect solution either.
Colin Hebert
oh. who said you that?
Col. Shrapnel
Hi Colin - thanks, I'm not sure about "mysql_insert_id() vs MySQL's LAST_INSERT_ID()" but we can probably keep that for a separate debate...
DrMHC
mysql_insert_id() is thread-safe, and will return the last inserted ID for the current connection. See http://dev.mysql.com/doc/refman/5.0/en/mysql-insert-id.html
Jon Benedicto
My bad, I forgot we should presume that, every HTTP request create a new database connection. But still mysql_insert_id() isn't thread safe. In this case you have a connection by request but you could have a connection shared amongst users.
Colin Hebert
@Colin HEBERT, a single connection can't execute 2 queries simultaneously. So you need to create a new connection for each user anyway.
Jon Benedicto
A: 

I don't have the math for it, but I would point out that response (a) is a little silly. Doesn't the company want a lot of users? Isn't that a goal? That response implies that they'd rather solve the problem twice, possibly at great expense the second time, instead of solve it once correctly the first time.

David
Yes, absolutely... but reworking the code is a time/money expense and they want to know when will things get reaaaallllyyyy bad and handle it just-in-time then! :)
DrMHC
@DrMHC, frankly speaking, a shitty code must be rewritten anyway.
Col. Shrapnel
Absolutely _not_ disagreeing with you - just trying to figure out a way of calculating when a conflict would happen... just the mathematical basis of it so it's clearer to all...
DrMHC
Is this a customer-facing product? Does money depend on it? If so, then the point at which it happens is exactly equal to the point at which customers get angry and money is lost. If one is to be avoided, then so must the other. Honestly, I sympathize with you. I've worked for companies where if the sales team didn't request a change, we couldn't make it. Then they'd complain that the software doesn't work right and customers are angry. Dark times.
David
A: 

This will happen when someone has added something to the table between one insert and that query running. So to answer your question, two people using the system has the potential for things to go wrong.

At least using the LAST_INSERT_ID() will get the last ID for a particular resource so it won't matter how many new entries have been added in between.

gabe3886
Thanks gabe - I'm trying to get to a "system" of figuring out *when* that conflict would happen...
DrMHC
A: 

In addition to the risk of getting the wrong ID value returned, there's also the additional database query overhead of SELECT MAX(id), and it's more PHP code to actually execute than a simple mysql_insert_id(). Why deliberately code something to be slow?

Mark Baker
That's too weak.
Col. Shrapnel
+2  A: 

The point is not if potential bad situations are likely. The point is if they are possible. As long as there's a non-trivial probability of the issue occurring, if it's known it should be avoided.

It's not like we're talking about changing a one line function call into a 5000 line monster to deal with a remotely possible edge case. We're talking about actually shortening the call to a more readable, and more correct usage.

I kind of agree with @Mark Baker that there is some performance consideration, but since id is a primary key, the MAX query will be very quick. Sure, the LAST_INSERT_ID() will be faster (since it's just reading from a session variable), but only by a trivial amount.

And you don't need a lot of users for this to occur. All you need is a lot of concurrent requests (not even that many). If the time between the start of the insert and the start of the select is 50 milliseconds (assuming a transaction safe DB engine), then you only need 20 requests per second to start hitting an issue with this consistently. The point is that the window for error is non-trivial. If you say 20 requests per second (which in reality is not a lot), and assuming that the average person visits one page per minute, you're only talking 1200 users. And that's for it to happen regularly. It could happen once with only 2 users.

And right from the MySQL documentation on the subject:

You can generate sequences without calling LAST_INSERT_ID(), but the utility of 
using the function this way is that the ID value is maintained in the server as 
the last automatically generated value. It is multi-user safe because multiple 
clients can issue the UPDATE statement and get their own sequence value with the
SELECT statement (or mysql_insert_id()), without affecting or being affected by 
other clients that generate their own sequence values.
ircmaxell