I have a Kimball-style DW (facts and dimensions in star models - no late-arriving facts rows or columns, no columns changing in dimensions except expiry as part of Type 2 slowly changing dimensions) with heavy daily processing to insert and update rows (on new dates) and monthly and daily reporting processes. The fact tables are partitioned by the dates for easy rolloff of old data.

I understand that WITH(NOLOCK) can cause uncommitted data to be read; however, I also do not wish to take any locks which would cause the ETL processes to fail or block.

In all cases, when we are reading from the DW, we are reading from fact tables for a date which will not change (the fact tables are partitioned by date) and dimension tables which will not have attributes changing for the facts they are linked to.
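To make the scenario concrete, a closed-date query with the hint on every table referenced might look like this (table, column, and date values are illustrative, not from my actual schema):

```sql
SELECT d.Region,
       SUM(f.SalesAmount) AS TotalSales
FROM   dbo.FactSales   AS f WITH (NOLOCK)
JOIN   dbo.DimCustomer AS d WITH (NOLOCK)
       ON d.CustomerKey = f.CustomerKey
WHERE  f.DateKey = 20091031   -- a date/partition that is fully loaded and will not change
GROUP BY d.Region;
```

Note the hint must be repeated on every table reference; it is not inherited from the first one.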

So - are there any disadvantages? - perhaps in the execution plans or in the operation of such SELECT-only queries running in parallel off the same tables.

+1  A: 

As long as it's all no-update data there's no harm, but I'd be surprised if there's much benefit either. I'd say it's worth a try. The worst that will happen is that you'll get incomplete and/or inconsistent data if you are in the middle of a batch insert, but you can decide if that invalidates anything useful.

le dorfier
The fact rows we are reading will not change, and the dimension rows will always be valid; they may be expired, with a new dimension row created for new facts.
Cade Roux
Seems entirely straightforward to me. I have only two questions. 1. Is there a problem with the way it's running without making this change (i.e. is this possibly a premature optimization)? 2. These are all read-only queries and you're relaxing their isolation levels. What bad thing (other than the obvious crappiness of the results, which you are evidently mitigating by your emphasis on appends and fact versioning) are you imagining?
le dorfier
I'm not in control of the ETL, but I am responsible for all the reporting. I am not given access to sp_who, so I need to proactively make sure all my (significant) processing does not interfere with daily and monthly loads before the DBAs complain I'm blocking them.
Cade Roux
Then I think the simple answer to your specific question is that using WITH(NOLOCK) will decrease, and likely eliminate, the likelihood that you will create any locks which would cause the ETL processes to fail or block; and there are no disadvantages created by this strategy. You have a bunch of suggestions for more exotic ways to accomplish approximately the same thing, but they generally involve global server setting alterations that you probably don't have access to and your DBAs will be suspicious about.
le dorfier
+1  A: 

Yes. Your SQL will be far less readable, and you will inevitably miss some NOLOCK hints, because the hint has to be repeated on every table reference in every SELECT.

You can get the same thing by setting the isolation level

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

In the end you get roughly a 10% performance boost (sorry, I'm too lazy to look up the article for it, but it's out there).

I'd say a 10% gain isn't worth reducing readability.

MatthewMartin
+1 for setting the isolation level, but WITH(NOLOCK) is better than nothing. It's not so much a 10% performance question as a will-it-run-or-not question.
jms
I will consider it. There are other operations on the connection (i.e. my process configuration and status data, not the fact/dim star-modelled data) which I do not necessarily want to have behave that way. At this point I have the WITH(NOLOCK) centralized into the views I use which wrap and flatten the star models in the DW database.
Cade Roux
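The centralized-view approach Cade describes might look like the following sketch (view and table names are hypothetical); every table reference inside the view carries the hint, so report queries written against the view inherit the behavior without repeating it:

```sql
-- Illustrative view flattening a star model; the NOLOCK hints live in one
-- place, so reporting queries against the view need no hints of their own.
CREATE VIEW dbo.vFactSalesFlat
AS
SELECT f.DateKey,
       f.SalesAmount,
       c.Region,
       p.ProductName
FROM   dbo.FactSales   AS f WITH (NOLOCK)
JOIN   dbo.DimCustomer AS c WITH (NOLOCK) ON c.CustomerKey = f.CustomerKey
JOIN   dbo.DimProduct  AS p WITH (NOLOCK) ON p.ProductKey  = f.ProductKey;
```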
A: 

If making the whole database read-only is possible, then this is a better option. You'll get read-uncommitted performance without having to modify all your code.

ALTER DATABASE AdventureWorks SET READ_ONLY;
Nick Kavadias
The DW database is already read-only to my user, but has to be writable to the ETL processes loading new data into the facts (and dimensions as necessary). My database holds my process procedures and configuration.
Cade Roux
Maybe a read-only file group?
Nick Kavadias
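Since the fact tables are partitioned by date and old partitions never change, the read-only filegroup idea could be sketched like this (filegroup and database names are hypothetical; note that marking a filegroup read-only requires exclusive access to the database, so it would have to be coordinated with the ETL window):

```sql
-- Assumes closed-off historical partitions live on a filegroup named ARCHIVE_FG.
-- Once the filegroup is read-only, SQL Server does not take locks on reads
-- against it, while the ETL keeps writing to the current read-write filegroup.
ALTER DATABASE DW MODIFY FILEGROUP ARCHIVE_FG READ_ONLY;
```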
+2  A: 

This is what you probably need:

ALTER DATABASE AdventureWorks SET READ_COMMITTED_SNAPSHOT ON;

ALTER DATABASE AdventureWorks SET ALLOW_SNAPSHOT_ISOLATION ON;

Then go ahead and use

SET TRANSACTION ISOLATION LEVEL READ COMMITTED

in your queries. According to BOL:

The behavior of READ COMMITTED depends on the setting of the READ_COMMITTED_SNAPSHOT database option:

If READ_COMMITTED_SNAPSHOT is set to OFF (the default), the Database Engine uses shared locks to prevent other transactions from modifying rows while the current transaction is running a read operation. The shared locks also block the statement from reading rows modified by other transactions until the other transaction is completed. The shared lock type determines when it will be released. Row locks are released before the next row is processed. Page locks are released when the next page is read, and table locks are released when the statement finishes.

If READ_COMMITTED_SNAPSHOT is set to ON, the Database Engine uses row versioning to present each statement with a transactionally consistent snapshot of the data as it existed at the start of the statement. Locks are not used to protect the data from updates by other transactions.

Hope this helps.

Raj
I will consider it. There are other operations on the connection (i.e. my process configuration and status data, not the fact/dim star-modelled data) which I do not necessarily want to have behave that way.
Cade Roux
A: 

Have you considered creating a DATABASE SNAPSHOT of your DW and run your reports off it?

Remus Rusanu
No, that's not really possible, since we're talking about several TBs of data. The DW is designed for this purpose, which is why the fact tables are partitioned by date.
Cade Roux
But a database snapshot is a sparse file with copy-on-write semantics. All you need is the space on disk to be reserved, actual I/O operations will occur only when a write occurs on the original database.
Remus Rusanu
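Creating such a snapshot is a one-statement operation; since the snapshot file is sparse and copy-on-write, reserving space for a multi-TB source does not consume that much actual disk. A sketch (database and file names are hypothetical; the logical NAME must match the source database's data file):

```sql
-- The .ss file is a sparse file: it stores only pages that change in the
-- source database after the snapshot is created, so reads of untouched
-- pages are served from the original data files.
CREATE DATABASE DW_Reporting_Snapshot
ON ( NAME = DW_Data,                          -- logical name of the source data file
     FILENAME = 'D:\Snapshots\DW_Data.ss' )
AS SNAPSHOT OF DW;
```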
A: 

NOLOCK performs a 'dirty read' (incidentally, READ UNCOMMITTED does the same thing as NOLOCK). If the database is being updated as you read, there is a danger that you will get inconsistent data back. The only options are to either accept locking, and hence blocking, or to pick one of the two new snapshot-based isolation levels offered in SQL Server 2005 onwards, discussed here.

Ben Breen
There are no inserts or updates to the data possible. The only changes will be to future dates which we don't read until the processing is complete.
Cade Roux
Then NOLOCK is the correct solution.
Ben Breen