A common way to make checking for nonexistence of a row and inserting it atomic?

views:

105

answers:

+3 Q:

A common way to make checking for nonexistence of a row and inserting it atomic?

I have a web-application. The flow of processing a form in it goes like this:

Validate
List errors or Insert/update the data

In this particular scenario I'm developing a user registration process but I'm trying to find a common solution for all types of forms bases on checking availability of unique value in a database table.

In this user registration a user's login must be unique. In the validation phase the application checks for its availability in the database table and if it is available inserts a row. There are other fields that must be validated too like password and password confirmation. All validation occurs once in one HTTP request.

The problem is that I can't be sure that after the application checked for its availability it is not taken by another user in a parallel process before the process of the first user inserts it. I understand that there is a very very small chance that two users enter the same login in the same millisecond, but someday this might be the case with another form where several thousands users enter data to some form at the same time.

If validation is already passed a user should not see some error message saying that his login is already registered.

What I am trying to solve is to ensure that the unique value is available after checking for its availability and before inserting it in one HTTP request. It's OK that another user registered the same unique login while the first one was messing with his password and password confirmation not being the same.

This problem is solved easily with an existing row because I can SELECT it FOR UPDATE and it will be locked during a transaction. But I can't do the same with nonexistent row. That is the problem. How do I solve this?

Here are some solutions known to me. I am not sure which one of them is the best. More, I am not sure that the best way is known to me, so please share the ways known to you.

Table locking

I had already solved this problem in the past with table locking but I am not sure that it was the best way to do this. The process went like this:

Lock the table for write
Check for availability
Return error or Insert the row
Unlock the table

Some people say that locking the whole table is the worst solution of all. Maybe it is but it's the only way I could come up with by myself that worked.

A lock stays only during one HTTP request and of course not between several of them.

Insert and catch an error

This way was suggested to me by some other guys. They suggested to make that column a unique index column and separate validation and checking for the uniqueness in two phases. The process goes like this:

Validate the data
If validation went OK insert the row
If inserting the row failed show the error of nonavailability of the unique value

Of course I've made the column a unique index column. But that doesn't mean that I want to use the database's abilities to throw an error on validation; it should be done on the application level.

I don't like this way because I don't like the try-and-catch-an-exception way in this scenario because there is nothing exceptional in the process of checking of the availability of a value and inserting it. I believe it should be in the check-and-reserve-and-insert way. I believe that validating a user input should not be based on exceptions, because there is nothing exceptional in a user entering something wrong.

I might be wrong but this is my current point of view. If you think that I'm plainly wrong please tell me why.

As you say, "Insert and catch an error" is easy to code and doesn't lock out users, and if you skip step 1 and just attempt to insert or fail, then you save a round trip which might be important for a heavily loaded server.

Another option is to keep a list of temporarily reserved id's in memory, which you release if not permanently claimed after a while. As you have to be aware of other users, you'll need to use a thread-safe collection and probably key it on the user session.

Will 2010-01-27 06:13:47

Thanks for your answer.There is already a validation before insertion, which checks such things like password and password confirmation being the same. I can't put such checks to database level. If I'll add a second validation with inserting it will make the flow too complex because there will be two validation processes and the second one will need to be integrated with the first one to display all validation errors at once.The option with a list of reserved IDs in memory is interesting. If I wouldn't find anything easier to implement I might give a shot to this solution.

elnur 2010-01-27 06:24:56

Make a new table AvailableUsers whose columns are ID,ClaimedUserName,TimeStamp,SessionID or store in session in the form of object.

After validation, if the new user first post with the already claimed user name you can check from your session or database.

Adeel 2010-01-27 06:18:22

Thanks for your option, but it's just too complex for me.

elnur 2010-01-27 08:27:30

+1 A:

First thing's first: table locking is far from an ideal solution. If you are anticipating extending this to several thousand concurrent users, locking the whole table is a surefire way to bring your database to a grinding halt. You need to get as far away from table locking as you can in order to have a suitably scalable application.

Try/catch is the way I perform my unique-key bound inserts. In my opinion, it's the best method. The thing you have to realize is that any transactional database that utilizes row-level locking is vulnerable to deadlocks at any time. Even on normal, no-funny-stuff plain-vanilla queries. With that in mind, any application that utilizes a transactional database should technically have every executed write query inside a try/catch block!

Not many people develop like that of course, because in normal everyday use, this doesn't happen that often. But database "errors" aren't always errors in the true sense that you've done something wrong. They are a normal way of communicating the state of your data.

The bottom line is that the more locks you can avoid, the more scalable your application will be. Even if you could use SELECT...FOR UPDATE on a non-existent value, doing so could increase the number of deadlocks on that table significantly. Since this is easily avoided using a try/catch, I've always gone with the try/catch. As well, it's pretty easy to whip up a generic error handler wrapper for your database driver to pick out common errors such as a unique key or a deadlock, and handle them appropriately.

zombat 2010-01-27 06:19:23

Thanks for your definite answer. As you can see from my shortened registration code — http://pastebin.com/m7634278a — everything is already in the try/catch block. And as you can see from that code validation occurs **before** anything is tried to be inserted to the database. How would you integrate the catching of unique-key insert errors with the current flow of my webapp? Should I rewrite the flow completely to accommodate the method suggested by you? By the way, thanks for the link to that page in MySQL docs — it was very helpful to me.

elnur 2010-01-27 06:51:54

Your code looks good. I'm not familiar with Doctrine, but I'm guessing there's a method in there somewhere to fetch the last database error and error code. If the `$user->save()` doesn't throw an exception, then you could throw one manually by utilizing such methods. Since you just want to redisplay the form on *any* problem, I think you're fine. You could try to detect the unique username problem to display a custom error message if you *wanted*, by identifiying the error code (e.g., MySQL's are here, it's 1062: http://dev.mysql.com/doc/refman/5.1/en/error-messages-server.html)

zombat 2010-01-27 07:07:42

Aha! Indeed, Doctrine has portable error codes, `-5` being the duplicate error, and it will throw an exception in `$user->save()` with this error code. So, as far as I understand, I just need to catch this exception and based on that error code just redisplay the form with revalidation so it looks like there was only single validation and that validation failed, right? Is this what you suggest?

elnur 2010-01-27 07:23:06

Yep, that's exactly what I'd suggest. If the exception has a code of -5, you could make a custom error message for redisplaying the form ("Username already taken" or something), otherwise you're good to go.

zombat 2010-01-27 07:29:02

Thank you very much! I just can't believe it would be **so** simple and easy to implement! The only thing I had to do was to add another `catch` block. Here is what my code looks like now: http://pastebin.com/f7a8f48ff.

elnur 2010-01-27 08:16:45

ansaurus

tags:

views:

answers:

A common way to make checking for nonexistence of a row and inserting it atomic?

related questions