views: 284
answers: 5

Back story: I have a table that stores cached items, and it currently has about 1M rows. When I update the table with new versions of the cached items I need to delete the old cached items (about 3K items). It's not critical that these items are deleted right away, but I would prefer it, since when clients fetch cached items I would like them to get the newest version.

But the deleting is still too slow, taking several seconds and making the end user wait. Is there any way to make this go faster? At the moment I'm running a simple SQL statement:

DELETE FROM cache where cache_event_id = X

My question becomes: can I make the query go faster? (I expect the cache table only to grow in size, so this problem will get worse.) Or should I run the delete SQL in its own thread, and live with the fact that users may get old items for a little while?

Per request, here is the rest of the info for the table.

CREATE TABLE [dbo].[cache](
    [cache_id] [int] IDENTITY(1,1) NOT NULL,
    [cache_name] [nchar](128) NOT NULL,
    [cache_event_id] [int] NOT NULL,
    [cache_encounter_id] [int] NOT NULL,
    [cache_type_id] [tinyint] NOT NULL,
    [cache_creation_date] [datetime] NOT NULL,
    [cache_data] [varbinary](max) NOT NULL
) ON [PRIMARY]

All indexes were created by SQL Server Profiler; it seems like I need to manually delete the old indexes myself. Index 1:

CREATE NONCLUSTERED INDEX [_dta_index_cache_6_366624349__K2_K3_K5_K4_7] ON [dbo].[cache]
(
    [cache_name] ASC,
    [cache_event_id] ASC,
    [cache_type_id] ASC,
    [cache_encounter_id] ASC
)
INCLUDE ( [cache_data]) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

Index 2 (might actually not be in use):

CREATE NONCLUSTERED INDEX [_dta_index_cache_6_366624349__K5_1_2_3_4_6_7] ON [dbo].[cache] 
(
    [cache_type_id] ASC
)
INCLUDE ( [cache_id],
[cache_name],
[cache_event_id],
[cache_encounter_id],
[cache_creation_date],
[cache_data]) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

Index 3 (I assume this one is used when deleting):

CREATE NONCLUSTERED INDEX [_dta_index_cache_6_366624349__K3] ON [dbo].[cache] 
(
    [cache_event_id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

Data is inserted into the table via the SqlBulkCopy class.

Data is fetched out like this (this is the most critical part):

SqlCommand cmd = new SqlCommand("GetPageCache", connection);
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.AddWithValue("@event_id", EventID); // int 
cmd.Parameters.AddWithValue("@encounter_id", EncounterID); // int 
cmd.Parameters.AddWithValue("@type_id", (int)CacheType); //int 
cmd.Parameters.AddWithValue("@cachename", CacheName); // Required in some cases, but 90% this is just a fallback
+1  A: 

Having end-users wait on a delete call which performs cache clean-up seems unnecessary. This should certainly be a background job / thread.
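As a rough illustration, this is the kind of statement such a background job or thread could run, batched so no single transaction holds locks for long (the batch size of 500 and the @event_id value are placeholders, not recommendations):

DECLARE @event_id int = 123;              -- placeholder: the event whose old cache rows are stale

WHILE 1 = 1
BEGIN
    -- Delete in small chunks so readers are only blocked briefly per batch.
    DELETE TOP (500) FROM dbo.cache
    WHERE cache_event_id = @event_id;

    IF @@ROWCOUNT = 0 BREAK;              -- nothing left to delete for this event
END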

Alternatively, you can use something like memcached which is designed to handle cache reads and expirations.

wsorenson
I already do memory caching; the SQL cache is a fallback for when an item is not in memory, and it's required because page generation takes up to 1 minute.
EKS
+1  A: 

Cleanup of this data should definitely be done asynchronously (via a scheduled SQL job, a service, the job that populates the data, etc.). If you are worried about old items coming back in queries before you have a chance to delete them, you could implement some type of versioning scheme that would only return the newest items.
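A minimal sketch of what such a versioning scheme could look like here (the cache_version column is hypothetical and not in the posted schema; @event_id etc. stand in for the stored procedure parameters):

-- Hypothetical version column, bumped for an event whenever its cache is refreshed.
ALTER TABLE dbo.cache ADD cache_version int NOT NULL DEFAULT (1);

-- Readers only see the newest version; older rows can be purged whenever convenient.
SELECT cache_data
FROM dbo.cache
WHERE cache_event_id = @event_id
  AND cache_encounter_id = @encounter_id
  AND cache_type_id = @type_id
  AND cache_version = (SELECT MAX(cache_version)
                       FROM dbo.cache
                       WHERE cache_event_id = @event_id);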

Jim Petkus
Versioning is a cool idea; every cache_event_id would have its own version. A great idea.
EKS
+2  A: 

If there are a lot of deletes, it may be writing a lot of stuff to the log file. And if there are any relationships involved it may be spending a long time determining if it is allowed to delete the records.

I had a similar issue (but in my case I needed to make sure the old records were not visible) and ended up adding a bit field called hidden. So the 'delete' routine was actually just an update statement to set hidden to true, and the lookups were amended to ignore hidden records.

Then I could delete the hidden records in the background without affecting the users.
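Applied to this table, that approach could look roughly like this (the hidden column is an assumption, not part of the posted schema):

-- Hypothetical soft-delete flag.
ALTER TABLE dbo.cache ADD hidden bit NOT NULL DEFAULT (0);

-- The user-facing 'delete' becomes a quick update...
UPDATE dbo.cache SET hidden = 1 WHERE cache_event_id = @event_id;

-- ...lookups add "AND hidden = 0", and a background job removes the rows for real later.
DELETE FROM dbo.cache WHERE hidden = 1;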

sgmoore
Thank you, another great idea on how to solve it.
EKS
+3  A: 

The good news is this: if the DELETE statement will always delete about 3000 rows, the situation may not get any worse as the table grows larger.

The structure of your table may have a big effect on how long your DELETE operation takes and on how directly it affects users due to locks.

The index "helps" by making it easy to identify the row locator of the ~3000 doomed rows. However, these rows must then be located in the "whole" table (and in every index on the table) and then deleted. A likely reason for this to be slow is that these 3000 rows are spread across the table (and indexes) on separate data pages.

There's no one-size-fits-all answer for you, but you should take a good look at the organization and indexing of your table. There may be a way to change the organization and indexing in such a way that the doomed rows will be on fewer data pages and that the query plan for the DELETE won't perform 3000 separate lookups to reach them.

If you post the CREATE TABLE and CREATE INDEX statements for [cache], I might have specific suggestions instead of generalizations.

Additional remarks:

Here are some more thoughts.

Do you have a PRIMARY KEY constraint? If not, you have no clustered index, and this means your table is stored as a heap. That's not good, especially for a table that undergoes a lot of activity. Though I don't have all the details, I also agree with Dems below. It should help to have the primary key (which should be clustered) on (cache_event_id,cache_id).
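If you go that way, the change might look roughly like this (the constraint name is arbitrary; building a clustered index on a 1M-row heap takes time, so do it in a quiet period):

-- Hypothetical clustered primary key: rows for one cache_event_id end up on
-- adjacent pages, so the DELETE touches far fewer pages.
ALTER TABLE dbo.cache
ADD CONSTRAINT PK_cache PRIMARY KEY CLUSTERED (cache_event_id, cache_id);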

Another bottleneck might be the cache data itself. You have INCLUDEd it in two of the indexes, so you are storing it in three places! I'm only guessing, but it seems very unlikely you have queries that return the cache_data column from many rows at once. As a result, you can get away with storing cache_data in the clustered index only (by default, a clustered index includes all columns). The Database Engine Tuning Advisor is good for giving you ideas, but it's not always a good idea to do exactly what it says.
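For example, Index 1 could be rebuilt without the LOB column (a sketch only; it keeps the same key columns and the DTA-generated name):

-- Rebuild Index 1 without INCLUDE (cache_data) so the LOB data is stored only once.
CREATE NONCLUSTERED INDEX [_dta_index_cache_6_366624349__K2_K3_K5_K4_7] ON dbo.cache
(
    [cache_name] ASC,
    [cache_event_id] ASC,
    [cache_type_id] ASC,
    [cache_encounter_id] ASC
)
WITH (DROP_EXISTING = ON);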

How big is the typical cache_data column? If it is almost always large (over 8K in size), it's going to cause a lot of activity with LOB overflow pages. I'm not an expert on workload tuning when there is a lot of LOB activity, but there are probably some good resources with advice. One thing to look at (though not until you have tried index improvements and actually looked at memory use, cache hits, etc.) is changes that will allow more table rows to fit on a page:

  1. Reconsider whether you need the type nchar(128) for cache_name. (You might, but think about it. Is it always nearly 128 characters of data? Is the use of Unicode necessary and worth the extra space? If not, maybe nvarchar(128) or varchar(128) is OK.)

  2. Consider whether it might be useful to set the 'large value types out of row' option to ON. The default is OFF, and this could cause you to have only one table row per data page on average, but no reduction in the need for LOB overflow pages. Look at the result of sp_spaceused or sys.dm_db_partition_stats to try to assess this. If you have only 1 or 2 rows per page, it might help to change the setting.
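A sketch of how to check the numbers and flip that option (note that existing values stay in-row until they are next updated):

-- Rows vs in-row data pages vs LOB pages for dbo.cache.
SELECT row_count, in_row_data_page_count, lob_used_page_count
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.cache');

-- Push varbinary(max) values off-row so more rows fit on each data page.
EXEC sp_tableoption 'dbo.cache', 'large value types out of row', 1;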

Steve Kass
Updated the original post with all indexes and the SQL queries being used.
EKS
A: 

I think part of the issue is design, but assuming we ONLY want to speed up the delete and change nothing else?

The index with only "cache_event_id" is indeed used in the delete, but not in the way you may expect. Run the delete with an execution plan and you'll see that after using the index it then also uses the primary key (assuming that's clustered). The index is basically just acting as a shortcut to find the primary keys that need deleting. The primary key (or any clustered index) lets the RDBMS know physically where the records are, so they can be deleted.

Also, when a record is deleted all indexes need updating. Depending on how many indexes you have, and how they're set up, this can be laborious.

So my two recommendations are:
1. Ensure the primary key or a clustered index has "cache_event_id" as the first field.
2. Rationalise the number of indexes if possible; this may involve rewriting some queries (see the sketch below).
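On point 2, a sketch of how to check whether an index (e.g. Index 2) is actually being used before dropping it (these DMV counters reset when the server restarts, so treat them as a guide only):

-- Reads vs writes per index on dbo.cache since the last restart.
SELECT i.name, s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
    ON s.object_id = i.object_id
   AND s.index_id = i.index_id
   AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID('dbo.cache');

-- If it really is unused, dropping it is one less structure every DELETE has to maintain.
DROP INDEX [_dta_index_cache_6_366624349__K5_1_2_3_4_6_7] ON dbo.cache;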

Dems