views:

125

answers:

4

I'm writing a daemon to monitor creation of new objects, which adds rows to a database table when it detects new things. We'll call the objects widgets. The code flow is approximately this:

1: every so often:
2:   find newest N widgets (from external source)
3:   foreach widget
4:     if( widget not yet in database )
5:       add rows for widget

The last two lines present a race condition, since if two instances of this daemon are running at the same time, they may both create a row for widget X if the timing lines up.

The most obvious solution would be to use a unique constraint on the widget identifier column, but that isn't possible due to the database layout (it's actually allowed to have more than one row for a widget, but the daemon shouldn't ever do this automatically).

My next thought would be to use a transaction, since this is what they're intended for. In the ADO.NET world, I believe I would want an isolation level of Serializable, but I'm not positive. Can someone point me in the right direction?

Update: I did some experimentation, and the Serialized transaction doesn't appear to resolve the issue, or at least not very well. The interesting case is described below, and assumes that only one table is involved. Note that I'm not positive about the lock details, but I think I have it right:

Thread A: Executes line 4, acquiring a read lock on the table
Thread B: Executes line 4, acquiring a read lock on the table
Thread A: Tries to execute line 5, which requires upgrading to a write lock
   (this requires waiting until Thread B unlocks the table)
Thread B: Tries to execute line 5, again requiring a lock upgrade
   (this requires waiting until Thread A unlocks)

This leaves us in a classic deadlock condition. Other code paths are possible, but if threads A and B don't interleave, there's no synchronization issue anyway. The end result is that a SqlException is thrown on one of the threads, after SQL detects the deadlock and terminates one of the statements. I can catch this exception and detect the particular error code, but that doesn't feel very clean.

Another route I may take is to create a second table that tracks widgets seen by the daemon, where I can use a unique constraint. This still requires catching and detecting certain error codes (in this case, integrity constraint violations), so I'm still interested in a better solution, if somebody can think of one.

A: 

How do you check 'if (widget not yet in database)'. If this is written in sql in form of 'Select', you can use 'Select for update' to allow only one daemon instance to be able to do that at a time. By using transaction for last two line, and use 'select for update' to lock, you will avoid race.

I am sure there is an equivalent in ado.net.

Journeyman Programmer
Select for update does not work here. Because you cannot lock a row that isn't there. If he does not find the row, he needs to make sure that nobody is inserting at the same time he is doing this.
Stefan Steinegger
+2  A: 

Generally, you should always use transactions if you have more then one process or thread using the database at the same time.

Isolation level "serializable" should actually work. It does not allow that data read from one transaction is changed by another. But it locks a lot and should not generally be used, because it slows down the application and there is a higher risk of dead locks.

Alternatives:

  • you only can lock the whole table to make sure that nobody writes into it while you are checking if something is (not) there. The problem is, only one can write to the table at the same time, which means that this slows downs everything a lot. (You could search for the data, if it is not there, lock the table and search again before inserting. This is a common pattern.)
  • Honestly, you should think about the fact that two transactions try to insert the same row at the same time. This is probably where you should start solving it. Only one daemon should be responsible for the same data.
    • make every daemon handling its own data
    • make every daemon calling a service instead of the database. There you can put things together before inserting it.

By the way, you need a unique identifier for the data to clearly identify it anyway. How can you search for it in the database if you don't have a unique identifier?

Stefan Steinegger
Thanks for the ideas. Unfortunately, I think this still isn't going to solve the problem (I'll explain in the original question).
Charlie
In an consistency point of view, it works :-) but of course it is not good if it results in a dead locks. I'm afraid if I can't help much, but I add some alternatives to my answer.
Stefan Steinegger
Thanks again for giving so much thought to this. I had also considered the service idea, and I think that would be the ideal solution in my particular case (since I can then use more "standard" locking primitives like mutexes or monitors). I agree that having two daemons as I described is not a great idea.
Charlie
@Charlie: The service would give you much more control. But locking in the application is not always a better approach. It is more error prone. The DB always cares about concurrency, wile applications locks need to be implemented by you and it tends to get forgotten for some cases.
Stefan Steinegger
True enough. However, I'm pretty comfortable with threading issues and writing services, so I think it could be done without too much pain.
Charlie
A: 

If you've got SQL Server 2008 (or another DBMS that supports it), consider using a MERGE statement. That can be written to do an insert only if the row does not exist.

Jonathan Leffler
A: 

(it's actually allowed to have more than one row for a widget, but the daemon shouldn't ever do this automatically)

Then it's not the database that's the problem. it's the fact that you've got multiple daemons that don't know the've detected the same object. Seems to me that's where you need to focus your attention. From the point of view of the database, they're both submitting perfectly legitimate rows.

What happens if you change:

1: every so often:
2:   find newest N widgets (from external source)
3:   foreach widget
4:     if( widget not yet in database )
5:       add rows for widget

to

1: every so often:
2:   while there are unhandled widgets
2:       find first unhandled widget
4:       if( widget not yet in database )
5:           add one row for widget
le dorfier
Right, I agree with you. I didn't intend to say that the database was to blame in any sense, just was looking for a way to meet the stated goals. I do think I'll be able to use the database to help, by creating a table specifically for use by the daemons, where I *can* enforce uniqueness.
Charlie
I'm still not sure I see how you can detect a legitimate duplicate from an illegitimate one. But see edit to my answer.
le dorfier