views:

172

answers:

2

I am currently working on a SSIS package that is extracting a table from one database to another. The tables in both of the databases use the same column as the primary key. My select statement to extract the data is a simple select statement. When I ran the package I was receiving a error that there where duplicate primary key values.

I reviewed my select statement and verified that my select statement was not returning duplicate rows. So to test this I removed the primary key from the table that I am inserting the data into and reran the SSIS package. After it ran I looked at the table to see what rows where being duplicated. What I found was that rows that where being edited while the extract was running where being duplicated, there was a record of it before the edit, and a record of it after the edit. I could easily tell this because the table has a last modified field that gets updated every time a record is updated.

I added a NOLOCK hint to my select statement, and it stopped returning duplicate rows.

So my question is why? I would have expected that a select statement with a NOLOCK table hint would have a higher chance of returning duplicate rows because it is not using locking, and that a select statement without the NOLOCK hint should use locking to make sure that it does not return duplicate rows.

Here is the select statement that I am using to select the data. I did verify that the joins are not causing it to duplicate rows:

SELECT pe.enc_id,
       pe.enc_nbr,
       pe.billable_ind,
       pe.clinical_ind AS clinical_ind,
       pe.budget_ind,
       pe.print_stmt_ind,
       pe.send_coll_letter_ind,
       pe.outsource_exempt_ind,
       cb.First_name + ' ' + cb.last_name AS CreatedBy,
       pe.create_timestamp AS create_timestamp,
       mb.first_name + ' ' + mb.last_name AS ModifiedBy,
       pe.modify_timestamp AS modify_timestamp
FROM   patient_encounter pe WITH(NOLOCK) 
       LEFT OUTER JOIN user_mstr cb WITH(NOLOCK) ON
           pe.created_by = cb.user_id
       LEFT OUTER JOIN user_mstr mb WITH(NOLOCK) ON
           pe.modified_by = mb.user_id
A: 

The WITH NOLOCK hint just tells the database server to ignore the locks, and just select the current values from the database - hence it simply selects all the current row value at the time it hits that row.

Note that you will NOT get the updates in the new table of the rows that were being updated.

Without seeing your SQL, I would guess that the way it was constructed, it grabbed the current row, waited for a lock to clear, then selected the new row as well.

Locking the entire table would prevent the changes/duplicates, but you risk locking everyone out of the table while you do your select.

EDIT: FYI ALTERNATIVES: use the READPAST—Rows locked by other processes are skipped and TABLOCK—Lock at the table level which of course will block other processes and might not be desired.

NOTE: UPDLOCK gets converted to XLOCK during the transaction write.

There are two categories for the hints: granularity and isolation-level. Granularity includes PAGLOCK, NOLOCK, ROWLOCK, and TABLOCK. Isolation-level hints include HOLDLOCK, NOLOCK, READCOMMITTED, REPEATABLEREAD, and SERIALIZABLE. A maximum of one from each group may be used.

EDIT2: Just for completeness: READCOMMITTED—Read only data from transactions that have been committed. This is the default behavior.

EDIT3: More info: The NOLOCK WILL read the rows, but you risk reading "dirty" data that will change or having data that will not exist if a ROLLBACK happens to a transaction which may impact accuracy of the selected set.

The other important information is to discover what kinds of locks the transactions are using so you can plan accordingly.

Mark Schultheiss
Thanks for the information. I added the select statement that I am using to the post.Why would it grab the row twice? After it read it the first time, why would a change to the record cause it to read it again?
Eric Maibach
Keep in mind that the "hints" CAN be overridden by the server, even if you do NOT use NOLOCK, it can use that internally, thus the duplicate rows.
Mark Schultheiss
Great point, thanks. Is there anyway to tell what is being used internally?
Eric Maibach
I think you are going to have to set up to find out what kinds of locks are in place during this event. It can get somewhat involved, there are some examples out there for that (books, google etc.). You can also view other code that is updating, and see what it is using, along with the service config/setup - questions then arise: is it using defaults, are the updates well scoped (small) during the lock etc. This often seen @ about 50-100 users standard setups(see above) that try to scale and is a bit of an art :). IF this is a "rolling" update/data move, you might consider the READPAST
Mark Schultheiss
+1  A: 

NOLOCK hint causes dirty read anomallies, and one such anomaly is a duplicate read. Such reads are frequent if an update changes the position of the row in the index scanned by the query:

  • say you have 2 rows in the table, with an ID key, rows with key values 1 and 2
  • one request (T1) runs UPDATE table SET key=3 WHERE key=1;
  • second request (T2) runs SELECT ... FROM table WITH(NOLOCK);
  • T1 locks the row with key value 1
  • T2 ignores the lock T1 has and reads the row with key value 1
  • T2 continue and reads row with key value 2
  • T1 update the row, and the row is moved in the index int he new position for key value 3
  • T2 continues to scan and reads the row with key value 3

So the SELECT has read a row twice, once while it had the key value 1 and once while it had a key value 3. This is just a trivial example of what can happen. In reality more complicated queries can run complex plans and use other indexes, all subject to such anomalies.

In short: NOLOCK hint is evil. If you want to avoid contention, use snapshot isolation.

Remus Rusanu
Thank you for the reply, and the great explanation of NOLOCK. What is confusing me is not that this is happening when I use NOLOCK, but that it is occuring when I do not use NOLOCK. The primary key column is not being changed, the record is having non id fields updated, and it is being read twice when I do not use NOLOCK.
Eric Maibach
Sorry for reading your question in diagonal and actually completely missing the point.
Remus Rusanu
Can you show an example of edit that occurs during the extract?
Remus Rusanu
I am working on duplicating the scenario in a test system. I want to run my extract against the test data, and while the extract is running update one of the records in the test database using the application. I want to run a trace on all of this so I can see how the record is being updated. It will probably take a little bit of time to setup, but hopefully I will have it done before the end of the day.
Eric Maibach