I know that snapshot isolation would fix this problem, but I'm wondering if NOLOCK is safe in this specific case so that I can avoid the overhead.
I have a table that looks something like this:
drop table Data
create table Data
(
Id BIGINT NOT NULL,
Date BIGINT NOT NULL,
Value BIGINT,
constraint Cx primary key (Date, Id)
)
create nonclustered index Ix on Data (Id, Date)
There are no updates to the table, ever. Deletes can occur but they should never contend with the SELECT because they affect the other, older end of the table. Inserts are regular and page splits to the (Id, Date) index are extremely common.
I have a deadlock situation between a standard INSERT and a SELECT that looks like this:
select top 1 Date, Value from Data where Id = @p0 order by Date desc
because the INSERT acquires a lock on Cx (Date, Id; Value) and then Ix (Id, Date), but the SELECT acquires a lock on Ix (Id, Date) and then Cx (Date, Id; Value). This is because the SELECT first seeks on Ix and then joins to a seek on Cx.
Swapping the clustered and non-clustered index would break this cycle, but it is not an acceptable solution because it would introduce cycles with other (more complex) SELECTs.
If I add NOLOCK to the SELECT, can it go wrong in this case? Can it return:
- More than one row, even though I asked for TOP 1?
- No rows, even though one exists and has been committed?
- Worst of all, a row that doesn't satisfy the WHERE clause?
I've done a lot of reading about this online, but the only reproductions of over- or under-count anomalies I've seen (one, two) involve a scan. This involves only seeks. Jeff Atwood has a post about using NOLOCK that generated a good discussion. I was particularly interested in a comment by Rick Townsend:
Secondly, if you read dirty data, the risk you run is of reading the entirely wrong row. For example, if your select reads an index to find your row, then the update changes the location of the rows (e.g.: due to a page split or an update to the clustered index), when your select goes to read the actual data row, it's either no longer there, or a different row altogether!
Is this possible with inserts only, and no updates? If so, then I guess even my seeks on an insert-only table could be dangerous.
Update:
I'm trying to figure out how snapshot isolation works. It seems to be row-based, where transactions read the table (with no shared lock!), find the row they are interested in, and then see if they need to get an old version of the row from the version store in tempdb.
But in my case, no row will have more than one version, so the version store seems rather pointless. And if the row was found with no shared lock, how is it different to just using NOLOCK?