ansaurus

Question

Efficiently maintaining a cache of distinct items in a huge DB table

Answer 1

+1 A:

Add a unique, increasing sequence column, MySeq, to your table. You may want to cluster on MySeq instead of your current primary key, so that the DB can build a small set of rows and then sort it.

SELECT DISTINCT name FROM nameValueTable WHERE MySeq >= ?;

Set ? to the highest MySeq value your cache had seen at its last update.
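A minimal sketch of that incremental refresh, assuming Python with an in-memory SQLite database standing in for the real one. The class name DistinctNameCache and its attributes are hypothetical. The sketch uses a strict MySeq > ? so the last row seen is not re-read, and as written it only notices newly inserted rows.

```python
import sqlite3


class DistinctNameCache:
    """Keeps a set of distinct names, refreshed incrementally via MySeq."""

    def __init__(self, conn):
        self.conn = conn
        self.names = set()
        self.last_seq = 0  # highest MySeq already folded into the cache

    def refresh(self):
        # Fix the upper bound first, so rows inserted while we are
        # reading are picked up by the next refresh instead of being lost.
        (high,) = self.conn.execute(
            "SELECT COALESCE(MAX(MySeq), 0) FROM nameValueTable"
        ).fetchone()
        rows = self.conn.execute(
            "SELECT DISTINCT name FROM nameValueTable "
            "WHERE MySeq > ? AND MySeq <= ?",
            (self.last_seq, high),
        )
        self.names.update(name for (name,) in rows)
        self.last_seq = high
        return self.names


# SQLite in-memory database as a stand-in for the real table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nameValueTable ("
    "MySeq INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL, value TEXT)"
)
cache = DistinctNameCache(conn)
```

Each call to cache.refresh() scans only the rows added since the previous call, so the cost depends on how much was inserted, not on the size of the table.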

You will always have a lag between your cache and the DB, so if this is a problem you need to rethink the flow of the application. If you manage the data yourself, you could try making all requests flow through your cache/application:

requests --> cache --> db

Justin 2009-08-05 15:37:48

@KM, I like your solution better. The question is: how many distinct values are there in the table?

Justin 2009-08-05 17:18:10

@Justin, OP says "This set of values is usually not bigger than 100. Most likely around 20."

KM 2009-08-05 17:34:41

Answer 2

+2 A:

A little normalization might help. Break out the property names into a new table, and have the original table reference it through an FK on an int ID. You can then query the new table to get the complete list, which will be really fast.
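As a rough sketch of that layout, assuming Python with SQLite in place of the real database: the table propertyName, the column name_id, and the helpers add_pair and distinct_names are hypothetical names, and INSERT OR IGNORE is SQLite-specific syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE propertyName (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE nameValueTable (
    id      INTEGER PRIMARY KEY,
    name_id INTEGER NOT NULL REFERENCES propertyName(id),
    value   TEXT
);
""")


def add_pair(conn, name, value):
    """Insert a name/value pair, registering the name on first use."""
    # A name that already exists is left untouched thanks to the UNIQUE constraint.
    conn.execute("INSERT OR IGNORE INTO propertyName (name) VALUES (?)", (name,))
    (name_id,) = conn.execute(
        "SELECT id FROM propertyName WHERE name = ?", (name,)
    ).fetchone()
    conn.execute(
        "INSERT INTO nameValueTable (name_id, value) VALUES (?, ?)",
        (name_id, value),
    )


def distinct_names(conn):
    """The complete list comes from the small table, not the huge one."""
    return [n for (n,) in conn.execute("SELECT name FROM propertyName ORDER BY name")]
```

With around 20 names, distinct_names reads around 20 rows regardless of how large nameValueTable grows.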

KM 2009-08-05 15:52:16

Answer 3

A:

I don't know the specifics of .NET, but I would pass all the update requests through the cache. Are all the update requests done by your ASP.NET web application? Then you could make a Proxy object for your database and have all the requests directed to it. Taking into consideration that your database only has key-value pairs, it is easy to use a Map as a cache in the Proxy.

Specifically, in pseudocode, the requests would be handled as follows:

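// the client invokes cache.get(key)
if (cacheMap.has(key)) {
    return cacheMap.get(key);
} else {
    // cache miss: load from the database, remember the value, and return it
    value = database.retrieve(key);
    cacheMap.put(key, value);
    return value;
}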

// the client invokes cache.put(key, value)
cacheMap.put(key, value);
if(writeThrough) {
    database.put(key, value);
}

Also, in the background you could have an Evictor thread which ensures that the cache does not grow too big. In your scenario, where you have a set of frequently accessed values, I would set an eviction strategy based on Time To Idle: if an item is idle for more than a set amount of time, it is evicted. This ensures that frequently accessed values remain in the cache. Also, if your cache is not write-through, the evictor needs to write to the database on eviction.
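To make the write-through and Time To Idle ideas concrete, here is a small sketch in Python rather than .NET. The class WriteThroughCache and all its parameters are hypothetical, and a plain dict stands in for the database. Instead of starting a real background thread, it exposes evict_idle(), which an evictor thread or timer would call periodically.

```python
import time


class WriteThroughCache:
    def __init__(self, backing_store, time_to_idle, clock=time.monotonic):
        self._store = backing_store   # stand-in for the database (dict-like)
        self._tti = time_to_idle      # seconds an entry may stay unused
        self._clock = clock           # injectable so tests can control time
        self._entries = {}            # key -> (value, last access time)

    def get(self, key):
        if key in self._entries:
            value, _ = self._entries[key]
        else:
            value = self._store[key]  # cache miss: load from the store
        # Every access, hit or miss, resets the idle timer for this key.
        self._entries[key] = (value, self._clock())
        return value

    def put(self, key, value):
        self._entries[key] = (value, self._clock())
        self._store[key] = value      # write-through: store updated at once

    def evict_idle(self):
        """Drop entries idle for longer than time_to_idle; return their keys."""
        now = self._clock()
        idle = [k for k, (_, t) in self._entries.items() if now - t > self._tti]
        for k in idle:
            del self._entries[k]
        return idle
```

Because every put goes straight to the store, evicting an entry never loses data. A cache that is not write-through would have to flush the entry inside evict_idle before deleting it.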

Hope it helps :)

-- Flaviu Cipcigan

Flaviu Cipcigan 2009-08-05 15:58:10

Answer 4

+2 A:

Figuring out your pattern of usage will help you come up with the right balance.
How often are new values added? Are new values always unique? Is the table mostly updates? Do deletes occur?

One approach may be to have a SQL Server insert trigger that checks the cache table to see whether the new row's key is already there and, if it is not, adds it.

Nick Kavadias 2009-08-05 16:07:58

Answer 5

A:

If you're not allowed to change the actual structure of this huge table (for example, due to huge numbers of reports relying on it), you could create a holding table of these 20 values and query against that. Then, on the huge table, have a trigger that fires on an INSERT or UPDATE, checks to see if the new NAME value is in the holding table, and if not, adds it.
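A sketch of the holding-table trigger, assuming Python driving SQLite rather than SQL Server, so the trigger syntax differs from T-SQL. The names distinctNames, trg_name_insert, trg_name_update, and held_names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nameValueTable (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    value TEXT
);

-- Holding table: one row per distinct name seen so far.
CREATE TABLE distinctNames (name TEXT PRIMARY KEY);

-- Fires for each inserted row; names already held are skipped.
CREATE TRIGGER trg_name_insert AFTER INSERT ON nameValueTable
BEGIN
    INSERT OR IGNORE INTO distinctNames (name) VALUES (NEW.name);
END;

-- Fires when an existing row's name is changed.
CREATE TRIGGER trg_name_update AFTER UPDATE OF name ON nameValueTable
BEGIN
    INSERT OR IGNORE INTO distinctNames (name) VALUES (NEW.name);
END;
""")


def held_names(conn):
    """Read the distinct names from the holding table only."""
    return [n for (n,) in conn.execute("SELECT name FROM distinctNames ORDER BY name")]
```

As in the answer, the triggers only ever add names. A name that disappears from the huge table through an update or delete stays in the holding table unless something else cleans it up.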

CodeByMoonlight 2009-08-05 16:16:40
