Hi all,

In my application I have a couple of threads that execute some logic and, at the end, add a new row to a table.

Before adding the new row, each thread checks whether an entry with the same details already exists. If one is found, it updates that row instead of inserting a new one.

The problem is that thread A does the check and sees that no entity with the same details exists, but just before it adds the new row, thread B searches the DB for the same entity. Thread B also sees that no such entity exists, so it adds a new row too.

The result is two rows with the same data in the table.

Note: no table key is violated, because each thread gets the next sequence value just before adding the row, and the table key is an ID that is not related to the data.

Even if I change the table key to a combination of the data columns, that will prevent two rows with the same data, but it will cause a DB error when the second thread tries to add its row.

Thank you in advance for the help, Roy.

A: 

You need to wrap the calls to check and write the row in a critical section or mutex.

With a critical section, only one thread at a time can execute the check-and-write sequence, so both threads can't write at once.

With a mutex, the first thread would lock the mutex, perform its operations, then unlock the mutex. The second thread would attempt to do the same but the mutex lock would block until the first thread released the mutex.

Specific implementations of critical section or mutex functionality would depend on your platform.
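As an illustrative sketch (the question doesn't name a platform, so this uses Python, and an in-memory dict stands in for the real table), a mutex makes the check and the write one atomic step:

```python
import threading

table = {}        # simulated table: details -> row
inserts = 0       # how many times the "insert" branch ran
lock = threading.Lock()

def upsert(details, data):
    global inserts
    # The check and the write happen inside one critical section,
    # so no other thread can slip in between the check and the insert.
    with lock:
        if details in table:
            table[details].update(data)    # row exists: update it
        else:
            table[details] = dict(data)    # row missing: insert it
            inserts += 1

threads = [threading.Thread(target=upsert, args=("roy", {"n": i}))
           for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(table), inserts)   # 1 1 — exactly one insert, no duplicate rows
```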

Vicky
+3  A: 

You should use a queue, possibly a blocking queue. Threads A and B (the producers) would add objects to the queue, and another thread C (the consumer) would poll the queue, remove the oldest object, and persist it to the DB. This prevents the problem where A and B both want to persist equal objects at the same time.
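A sketch in Python (illustrative; the names are made up, and a dict stands in for the DB). The single consumer is the only writer, so the check-then-write can never race with itself:

```python
import queue
import threading

q = queue.Queue()     # producers enqueue, one consumer dequeues
table = {}            # simulated table: details -> row
SENTINEL = object()   # tells the consumer to stop

def producer(details, data):
    q.put((details, data))     # threads A and B only enqueue

def consumer():
    # The only thread that touches the table, so no lock is needed.
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        details, data = item
        if details in table:
            table[details].update(data)    # update existing row
        else:
            table[details] = dict(data)    # insert new row

c = threading.Thread(target=consumer)
c.start()
producers = [threading.Thread(target=producer, args=("roy", {"n": i}))
             for i in range(10)]
for t in producers:
    t.start()
for t in producers:
    t.join()
q.put(SENTINEL)
c.join()

print(len(table))   # 1 — equal objects are collapsed by the single consumer
```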

Boris Pavlović
+1  A: 

I think this is a job for SQL constraints, namely a "UNIQUE" constraint on the set of columns that hold the data, plus the appropriate error handling.
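For example, with SQLite (illustrative; any SQL database with unique constraints behaves the same way, and the table/column names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE entries (
                    id INTEGER PRIMARY KEY,   -- surrogate key from a sequence
                    name TEXT,
                    value INTEGER,
                    UNIQUE (name))            -- constraint on the data column
             """)

def upsert(name, value):
    try:
        conn.execute("INSERT INTO entries (name, value) VALUES (?, ?)",
                     (name, value))
    except sqlite3.IntegrityError:
        # A row with the same data already exists:
        # handle the error by updating instead of failing.
        conn.execute("UPDATE entries SET value = ? WHERE name = ?",
                     (value, name))

upsert("roy", 1)
upsert("roy", 2)   # violates UNIQUE, falls back to UPDATE

rows = conn.execute("SELECT name, value FROM entries").fetchall()
print(rows)        # [('roy', 2)]
```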

Andrew Y
Constraints will stop duplicate entries going in, but you can ensure that you don't get the behaviour above by using transactions, can't you?
Brian Agnew
Actually, yes. Upvoted finnw's answer.
Andrew Y
+5  A: 

You speak of "rows" so presumably this is a SQL database?

If so, why not just use transactions?

(Unless the threads are sharing a database connection, in which case a mutex might help, but I would prefer to give each thread a separate connection.)
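For example, a sketch in Python with SQLite (illustrative; the question doesn't name a database, and the table name is made up). Each thread opens its own connection, and `BEGIN IMMEDIATE` takes the write lock up front so the check-then-write is atomic with respect to other connections:

```python
import os
import sqlite3
import tempfile
import threading

# Shared on-disk database so every thread can open its own connection.
path = os.path.join(tempfile.mkdtemp(), "app.db")
init = sqlite3.connect(path)
init.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, "
             "name TEXT, value INTEGER)")
init.commit()
init.close()

def upsert(name, value):
    # Separate connection per thread; timeout waits for the write lock.
    conn = sqlite3.connect(path, timeout=10, isolation_level=None)
    conn.execute("BEGIN IMMEDIATE")   # lock before checking, not after
    row = conn.execute("SELECT id FROM entries WHERE name = ?",
                       (name,)).fetchone()
    if row is None:
        conn.execute("INSERT INTO entries (name, value) VALUES (?, ?)",
                     (name, value))
    else:
        conn.execute("UPDATE entries SET value = ? WHERE id = ?",
                     (value, row[0]))
    conn.execute("COMMIT")
    conn.close()

threads = [threading.Thread(target=upsert, args=("roy", i)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

check = sqlite3.connect(path)
count = check.execute("SELECT COUNT(*) FROM entries "
                      "WHERE name = 'roy'").fetchone()[0]
print(count)   # 1 — the transaction serialises the check and the write
check.close()
```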

finnw
A: 

Multithreading is always brain-fucking ^^.

The main thing to do is to identify the critical resources and critical operations.

  • Critical resource: your table.
  • Critical operation: not just the add, but the whole check-then-add procedure.

You need to lock access to your table from the beginning of the check until the end of the add. If a thread attempts to do the same while another is checking/adding, it waits until that thread finishes its operation. As simple as that.

Clement Herreman
-1 ? Can i know why, so I can correct myself =)
Clement Herreman
Can I guess it's the use of language?
Roee Adler
I hope it's not, that would be one of the silliest kind of -1
Clement Herreman
I didn't downvote you, but I find that kind of language offensive as do many people. It is inappropriate in a professional forum.
HLGEM
+3  A: 

I would recommend avoiding locking in the client layer. Synchronization only works within one process; later you may scale so that your threads run across several JVMs, or indeed several machines.

I would enforce uniqueness in the DB; as you suggest, this will cause an exception for the second inserter. Catch that exception and do an update if that's the business logic you need.

But consider this argument:

Sometimes either of the following sequences may occur:

A inserts values VA, then B updates to VB.

B inserts VB, then A updates to VA.

If the two threads are racing, either outcome, VA or VB, is equally valid, so the second case is indistinguishable from "A inserts VA and B simply fails".

So in fact there may be no need for the "fail and then update" case.

djna
+1  A: 

Most database frameworks (Hibernate in Java, ActiveRecord in Ruby, etc.) have a form of optimistic locking. What this means is that you execute each operation on the assumption that it will succeed without conflict. In the special case where there is a conflict, you detect it atomically at the point where you do the database operation, throw an exception (or return an error code), and retry the operation in your client code after re-querying.

This is usually implemented using a version number on each record. When a database operation is done, the row is read (including the version number), the client code updates the data, then saves it back to the database with a where clause specifying the primary key ID AND the version number being the same as it was when it was read. If it is different - this means another process has updated the row, and the operation should be retried. Usually this means re-reading the record, and doing that operation again on it with the new data from the other process.

In the case of adding, you would also want a unique index on the table, so the database refuses the duplicate insert, and you can handle that in the same retry code.

Pseudo code would look something like

do {
  read row from database
  if no row {
     result_code = insert new row with data
  } else {
     result_code = update row with data (where version matches)
  }
} while result_code == conflict_code   // retry while there was a conflict
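A concrete sketch of that loop, assuming SQLite and a `version` column (the table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE entries (
                    id INTEGER PRIMARY KEY,
                    name TEXT UNIQUE,            -- unique index on the data
                    value INTEGER,
                    version INTEGER NOT NULL DEFAULT 0)""")

def save(name, value):
    while True:
        row = conn.execute("SELECT id, version FROM entries WHERE name = ?",
                           (name,)).fetchone()
        if row is None:
            try:
                conn.execute("INSERT INTO entries (name, value) VALUES (?, ?)",
                             (name, value))
                return
            except sqlite3.IntegrityError:
                continue      # another writer inserted first: retry as update
        row_id, version = row
        cur = conn.execute(
            "UPDATE entries SET value = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",   # only wins if version unchanged
            (value, row_id, version))
        if cur.rowcount == 1:
            return            # our update won
        # rowcount == 0: someone bumped the version; re-read and retry

save("roy", 1)
save("roy", 2)
print(conn.execute("SELECT value, version FROM entries "
                   "WHERE name = 'roy'").fetchone())   # (2, 1)
```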

The benefit of this is that you don't need complicated synchronization/locking in your client code. Each thread just executes in isolation and uses the database as the consistency check, which it is very quick at, and very good at. Because you're not locking a shared resource for every operation, the code can run much faster.

It also means that you can run multiple separate operating system processes to split the load and/or scale the operation over multiple servers as well without any code changes to handle conflicts.

madlep
A: 

You need to perform the act of checking for existing rows and then updating / adding rows inside a single transaction.

When you perform your check you should also acquire an update lock on those records, to indicate that you are going to write to the database based on the information that you have just read, and that no-one else should be allowed to change it.

In pseudo T-SQL (for Microsoft SQL Server):

BEGIN TRANSACTION
SELECT id FROM MyTable WITH (UPDLOCK) WHERE SomeColumn = @SomeValue
-- Perform your insert or update here
COMMIT TRANSACTION

The update lock won't prevent people from reading those records, but it will prevent anyone from writing anything that might change the output of your SELECT.

Kragen
A: 

Thank you all for the answers!

Roy.