I have a very large (millions of rows) SQL table that represents name-value pairs (one column for the name of a property, the other for its value). In my ASP.NET web application I have to populate a control with the distinct values available in the name column. This set of values is usually no bigger than 100, and most likely around 20. Running the query

SELECT DISTINCT name FROM nameValueTable

can take a significant time on this large table (even with the proper indexing etc.). I especially don't want to pay this penalty every time I load this web control.

So caching this set of names should be the right answer. My question is how to promptly update the set when a new name appears in the table. I looked into the SQL Server 2005 Query Notifications feature, but the table gets updated frequently and only very seldom with an actual new distinct name. The notifications would flow in all the time, and the web server would probably waste more time handling them than it saved by caching.

I would like to find a way to balance the time spent querying the data against the delay until the name set is updated.

Any ideas on how to efficiently manage this cache?

+1  A: 

Add a unique increasing sequence MySeq to your table. You may want to try and cluster on MySeq instead of your current primary key so that the DB can build a small set then sort it.

SELECT DISTINCT name FROM nameValueTable WHERE MySeq >= ?

Set ? to the highest MySeq value your cache had seen at its last update.
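A minimal sketch of this incremental refresh, written in Python against an in-memory SQLite database purely so it can run anywhere. The `NameCache` class and its `refresh` method are hypothetical names, SQLite's `AUTOINCREMENT` column stands in for the MySeq sequence, and the real application would issue the same queries from ASP.NET against SQL Server. It uses a strict `>` on the watermark, which avoids re-reading the last row already seen.

```python
import sqlite3


class NameCache:
    """Caches distinct names and refreshes them incrementally (hypothetical helper)."""

    def __init__(self, conn):
        self.conn = conn
        self.names = set()
        self.last_seq = 0  # highest MySeq already folded into the cache

    def refresh(self):
        # Read the high-water mark first, so rows inserted while we scan
        # are picked up by the next refresh rather than skipped.
        (max_seq,) = self.conn.execute(
            "SELECT MAX(MySeq) FROM nameValueTable"
        ).fetchone()
        if max_seq is None or max_seq <= self.last_seq:
            return self.names  # nothing new since the last refresh
        rows = self.conn.execute(
            "SELECT DISTINCT name FROM nameValueTable "
            "WHERE MySeq > ? AND MySeq <= ?",
            (self.last_seq, max_seq),
        )
        self.names.update(name for (name,) in rows)
        self.last_seq = max_seq
        return self.names


# Demo data; SQLite is only a stand-in for the real SQL Server table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nameValueTable ("
    "MySeq INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO nameValueTable (name, value) VALUES (?, ?)",
    [("color", "red"), ("size", "L"), ("color", "blue")],
)

cache = NameCache(conn)
cache.refresh()  # first call scans everything inserted so far
conn.execute("INSERT INTO nameValueTable (name, value) VALUES ('weight', '2kg')")
cache.refresh()  # second call scans only the one new row
```

Note that this only ever adds names. If rows can be deleted, a name whose last row disappears stays in the cache until a full rebuild.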

You will always have a lag between your cache and the DB, so if this is a problem you need to rethink the flow of the application. If you manage the data yourself, you could try making all requests flow through your cache/application:

requests --> cache --> db

Justin
@KM I like your solution better. The question is how many distinct values are there in the table?
Justin
@Justin, OP says "This set of values is usually not bigger than 100. Most likely around 20."
KM
+2  A: 

A little normalization might help. Break out the property names into a new table, and FK back to it from the original table using an int ID. You can then query the new table to get the complete list, which will be really fast.
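As a rough illustration of that split, here is a sketch in Python using an in-memory SQLite database. The table name `propertyName`, the column `nameId`, and the helper `add_pair` are all assumptions made up for this example, and `INSERT OR IGNORE` is SQLite syntax; on SQL Server the equivalent would be an `IF NOT EXISTS` check or similar.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript(
    """
    -- Small lookup table: one row per distinct property name.
    CREATE TABLE propertyName (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    -- The large table now stores an int ID instead of the name text.
    CREATE TABLE nameValueTable (
        nameId INTEGER NOT NULL REFERENCES propertyName(id),
        value  TEXT
    );
    """
)


def add_pair(conn, name, value):
    """Insert a name-value pair, registering the name if it is new."""
    conn.execute("INSERT OR IGNORE INTO propertyName (name) VALUES (?)", (name,))
    (name_id,) = conn.execute(
        "SELECT id FROM propertyName WHERE name = ?", (name,)
    ).fetchone()
    conn.execute(
        "INSERT INTO nameValueTable (nameId, value) VALUES (?, ?)",
        (name_id, value),
    )


for name, value in [("color", "red"), ("size", "L"), ("color", "blue")]:
    add_pair(conn, name, value)

# Populating the control reads only the small table, not millions of rows.
names = [n for (n,) in conn.execute("SELECT name FROM propertyName ORDER BY name")]
```

With this layout the list of names is always current, so no separate cache invalidation is needed.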

KM
A: 

I don't know the specifics of .NET, but I would pass all the update requests through the cache. Are all the update requests done by your ASP.NET web application? Then you could make a Proxy object for your database and have all the requests directed to it. Taking into consideration that your database only has key-value pairs, it is easy to use a Map as a cache in the Proxy.

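Specifically, in pseudocode, the requests would look like the following:

// the client invokes cache.get(key)
if (cacheMap.has(key)) {
    return cacheMap.get(key);
} else {
    // cache miss: load from the database, remember the value, and return it
    value = database.retrieve(key);
    cacheMap.put(key, value);
    return value;
}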

// the client invokes cache.put(key, value)
cacheMap.put(key, value);
if(writeThrough) {
    database.put(key, value);
}

Also, in the background you could have an Evictor thread which ensures that the cache does not grow too big. In your scenario, where you have a set of values that is frequently accessed, I would set an eviction strategy based on Time To Idle: if an item is idle for more than a set amount of time, it is evicted. This ensures that frequently accessed values remain in the cache. Also, if your cache is not write-through, the evictor needs to write to the database on eviction.
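The proxy and evictor described above could be sketched roughly as follows in Python. Everything here is illustrative: `IdleEvictingCache` is a hypothetical class, a plain dict stands in for the database, and the clock is injectable so the idle timeout can be demonstrated without sleeping. A real evictor thread would call `evict_idle` periodically and would need a lock around the shared dictionary.

```python
import time


class IdleEvictingCache:
    """Read-through cache with optional write-through and time-to-idle eviction."""

    def __init__(self, store, max_idle_seconds, write_through=True,
                 clock=time.monotonic):
        self.store = store              # dict-like stand-in for the database
        self.max_idle = max_idle_seconds
        self.write_through = write_through
        self.clock = clock
        self.entries = {}               # key -> [value, last_access, dirty]

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            # Cache miss: load from the backing store and remember it.
            entry = [self.store[key], self.clock(), False]
            self.entries[key] = entry
        entry[1] = self.clock()         # every access resets the idle timer
        return entry[0]

    def put(self, key, value):
        # Without write-through the entry is dirty until it is flushed.
        self.entries[key] = [value, self.clock(), not self.write_through]
        if self.write_through:
            self.store[key] = value

    def evict_idle(self):
        """Drop entries idle longer than max_idle, flushing dirty ones first."""
        now = self.clock()
        idle = [k for k, e in self.entries.items() if now - e[1] > self.max_idle]
        for key in idle:
            value, _, dirty = self.entries.pop(key)
            if dirty:
                self.store[key] = value


# Demo with a fake clock so the timing is deterministic.
now = [0.0]
db = {"color": "red"}
cache = IdleEvictingCache(db, max_idle_seconds=60, write_through=False,
                          clock=lambda: now[0])
cache.get("color")       # loaded from db at t=0
cache.put("size", "L")   # cached at t=0, not yet written to db
now[0] = 50
cache.get("color")       # touched again at t=50
now[0] = 100
cache.evict_idle()       # "size" idle for 100s: flushed to db and evicted
```

In the demo, "color" survives the eviction pass because it was accessed 50 seconds earlier, while "size" is written back to the store before being dropped.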

Hope it helps :)

-- Flaviu Cipcigan

Flaviu Cipcigan
+2  A: 

Figuring out your pattern of usage will help you come up with the right balance.
How often are new values added? Are newly added values always unique? Is the table mostly updates? Do deletes occur?

One approach may be to have a SQL Server insert trigger that checks the cache table to see whether the new row's name is already there and, if it is not, adds it.

Nick Kavadias
A: 

If you're not allowed to change the actual structure of this huge table (for example, due to huge numbers of reports relying on it), you could create a holding table of these 20 values and query against that. Then, on the huge table, have a trigger that fires on an INSERT or UPDATE, checks to see if the new NAME value is in the holding table, and if not, adds it.
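A sketch of the trigger-maintained holding table, run here through Python's `sqlite3` module so it is self-contained. The names `distinctNames`, `trg_name_insert`, and `trg_name_update` are made up for the example, and the trigger bodies use SQLite syntax (`NEW.name`, `INSERT OR IGNORE`). A SQL Server trigger would instead read from the `inserted` pseudo-table, so treat this as the shape of the idea rather than T-SQL to paste in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE nameValueTable (name TEXT, value TEXT);

    -- Holding table: the primary key makes duplicate inserts a no-op.
    CREATE TABLE distinctNames (name TEXT PRIMARY KEY);

    CREATE TRIGGER trg_name_insert AFTER INSERT ON nameValueTable
    BEGIN
        INSERT OR IGNORE INTO distinctNames (name) VALUES (NEW.name);
    END;

    CREATE TRIGGER trg_name_update AFTER UPDATE OF name ON nameValueTable
    BEGIN
        INSERT OR IGNORE INTO distinctNames (name) VALUES (NEW.name);
    END;
    """
)

conn.executemany(
    "INSERT INTO nameValueTable (name, value) VALUES (?, ?)",
    [("color", "red"), ("size", "L"), ("color", "blue")],
)
# Renaming a property fires the UPDATE trigger and registers the new name.
conn.execute("UPDATE nameValueTable SET name = 'weight' WHERE value = 'L'")

# The control is populated from the small holding table only.
names = [n for (n,) in conn.execute("SELECT name FROM distinctNames ORDER BY name")]
```

After the update, "size" is still listed even though no row uses it any more. If stale names matter, deletes and renames need their own handling, for example a periodic rebuild of the holding table.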

CodeByMoonlight