I am currently working on a SSIS package that is extracting a table from one database to another. The tables in both of the databases use the same column as the primary key. My select statement to extract the data is a simple select statement. When I ran the package I was receiving a error that there where duplicate primary key values.
I reviewed my select statement and verified that my select statement was not returning duplicate rows. So to test this I removed the primary key from the table that I am inserting the data into and reran the SSIS package. After it ran I looked at the table to see what rows where being duplicated. What I found was that rows that where being edited while the extract was running where being duplicated, there was a record of it before the edit, and a record of it after the edit. I could easily tell this because the table has a last modified field that gets updated every time a record is updated.
I added a NOLOCK hint to my select statement, and it stopped returning duplicate rows.
So my question is why? I would have expected that a select statement with a NOLOCK table hint would have a higher chance of returning duplicate rows because it is not using locking, and that a select statement without the NOLOCK hint should use locking to make sure that it does not return duplicate rows.
Here is the select statement that I am using to select the data. I did verify that the joins are not causing it to duplicate rows:
SELECT pe.enc_id,
pe.enc_nbr,
pe.billable_ind,
pe.clinical_ind AS clinical_ind,
pe.budget_ind,
pe.print_stmt_ind,
pe.send_coll_letter_ind,
pe.outsource_exempt_ind,
cb.First_name + ' ' + cb.last_name AS CreatedBy,
pe.create_timestamp AS create_timestamp,
mb.first_name + ' ' + mb.last_name AS ModifiedBy,
pe.modify_timestamp AS modify_timestamp
FROM patient_encounter pe WITH(NOLOCK)
LEFT OUTER JOIN user_mstr cb WITH(NOLOCK) ON
pe.created_by = cb.user_id
LEFT OUTER JOIN user_mstr mb WITH(NOLOCK) ON
pe.modified_by = mb.user_id