views:

149

answers:

2

When I am testing a query I will typically drop the following lines in front of whatever i am testing to make sure I'm starting from the same baseline every time i run a query.

CHECKPOINT
GO
DBCC FREEPROCCACHE
GO
DBCC DROPCLEANBUFFERS
GO
EXEC sp_MyProc 12345

On the Stored proc i was running today i noticed that when I ran it with these lines it took aproximately 18 minutes everytime. When I left these lines off it only took 3. Seeing the drastic difference caused by having a cleared cache vs a primed cache i decided to add the following to see if I could manually prime the cache before running my proc and see what that did to performance.

CHECKPOINT
GO
DBCC FREEPROCCACHE
GO
DBCC DROPCLEANBUFFERS
GO
SELECT top 1 '1' from Table1
EXEC sp_MyProc 12345

As you may have guessed sp_MyProc uses Table1 quite a bit. I was surprised to find that doing this took my run time down to about 6 minutes consistently. Although it does improve the performance it looks a little hackish, and I'm curious if there is something built into SQL Server that will accomplish this.

  • Is doing this to improve the performance of your queries unheard of?
  • Am i even right to assume that the improvement in time that I saw was a result of a "Primed" cache?

If my understanding of Caching is a bit off please feel free to share any links or info you think might be helpful.

UPDATE: Well I'm embarrassed to say that I tried to reproduce this behavior today, but was unable to. I spoke with some people at my work and it looks like some of the stuff they were doing on the DB yesterday may have made it appear as if my select before the proc was improving performance when in fact it wasn't. I'd still be interested to hear if anyone knows if "priming" the cache is possible through.

+1  A: 

Providing an "answer" in order to try to work this through as this is something I'm particularly interested in.

I came across this MSDN article on how to see what is in the SQL Server cache. There is a query there that will show you how many data pages are cached by object - I've tweaked it just to include the index name as below:

SELECT count(*) AS cached_pages_count, obj.name, index_id, i.name AS IndexName
FROM sys.dm_os_buffer_descriptors AS bd 
    INNER JOIN 
    (
        SELECT object_id, object_name(object_id) AS name 
            ,index_id ,allocation_unit_id
        FROM sys.allocation_units AS au
            INNER JOIN sys.partitions AS p 
                ON au.container_id = p.hobt_id 
                    AND (au.type = 1 OR au.type = 3)
        UNION ALL
        SELECT object_id, object_name(object_id) AS name   
            ,index_id, allocation_unit_id
        FROM sys.allocation_units AS au
            INNER JOIN sys.partitions AS p 
                ON au.container_id = p.partition_id 
                    AND au.type = 2
    ) AS obj 
        ON bd.allocation_unit_id = obj.allocation_unit_id
    LEFT JOIN sysindexes i ON obj.object_id = i.id AND obj.index_id = i.indid
WHERE database_id = db_id()
GROUP BY obj.name, index_id, i.name
ORDER BY cached_pages_count DESC;

If you try the following steps, you should be able to see what is going on with regard to caching. Do these within your database (as opposed to e.g. master):

1) checkpoint + clear the cache down
2) run the above query and you should get probably get 1 record returned (for sysobjvalues), but nothing for Table1
3) now run the SELECT TOP 1 '1' FROM MyTable statement
4) rerun the above query and see what appears in the results now - you'll probably see record(s) for MyTable showing cached pages - make a note of that number

This should give you an indication as to the level of data caching that is happening for that initial SELECT. If you repeat the process through again but instead of the SELECT TOP statement, execute your sproc, and then see how much ends up in the cache when that is run - maybe comparing these results will indicate the relative amount of caching that's being done by the SELECT TOP 1 in comparison to the sproc call - and that relative amount could indicate the performance improvement.

This is very much "thinking out loud" stuff. I wouldn't have thought the TOP 1 would have really primed the cache significantly for the sproc call, but that's why I'm interested in this question!

I would have initially thought it was more to do with other factors (e.g. server/disk load). You could alternate between to the 2 scenarios for 3 or 4 iterations, one after the other, to double check whether the SELECT TOP approach is in fact consistently better (help minimise the risk of it being a one-off blip)

Hope this helps/gets the ball rolling.

Update:
Now you know it's not the SELECT TOP that's priming the cache, a good way to prime the cache is as AdrianBanks said. At least now you can explain what was unexpected/confusing the performance difference! Keep the above script in your library, it is useful for checking the state of the cache.

AdaTheDev
Looks like I may have been wrong in assuming that my performance gains were do to the select. Take a look at the update to my post. Great post though. Very informative!
Abe Miessler
+1  A: 

Your update to your question tallies with what I would expect to happen. I can't see how running the SELECT 1... query could have any real performance benefit on the subsequent query.

As I understand it, SQL Server loads data pages (containing either table data or index data) into memory as it needs them when running queries. These are kept in memory unless they are explicitly cleared (using DBCC DROPCLEANBUFFERS - ie. remove any buffers (cached pages) in memory that have not been altered since loaded), or there is memory pressure (either low free memory on the machine or a maximum memory set on SQL Server). Because of this behaviour, it can be beneficial to warm-up a SQL Server database for use. When you subsequently run a query, the data needed to collect the query results may already be in memory. If it is, the query will execute faster as it will incur less IO.

The problem comes, however, in knowing what to pre-cache and therefore what queries to run. You could run a SQL trace on typical activity and then replay it to pre-cache data that gets used frequently. Without letting SQL Server hold a massive amount of allocated memory though, you are always going to have to read some things from disk (unless you have a small database). As you will never know what is cached and what isn't, relying on this behaviour for performance feels wrong.

I would concentrate my efforts into making the queries more efficient, by reading less data or using indices where possible. That will also give you general benefits as well as better performance from cold starts.

adrianbanks