views: 208

answers: 4

Hey all,

EDIT: Purpose of this website: it's called Utopiapimp.com, a third-party utility for a game called utopia-game.com. The site currently has over 12k users and I run it. The game is fully text based and will always remain that way. Users copy full pages of text from the game and paste the copied information into my site. I run a series of regular expressions against the pasted data to break it down, and insert anywhere from 5 to over 30 values into the DB based on that one paste. I then take those values and run queries against them to display the information back in a VERY simple and easy-to-understand way. The game is team based and each team has 25 users, so each team is a group and each row is ONE user's information. The users can update all 25 rows or just one row at a time. I need to store things in cache because the site is very slow, doing over 1,000 queries almost every minute.

So here is the deal. Imagine I have a spreadsheet (EDIT: Excel is just an example of how to imagine it; I don't actually use Excel) with 100 columns and 5000 rows. Each row has two unique identifiers: one for the row itself and one to group together 25 rows apiece. There are about 10 columns in each row that will almost never change, and the other 90 columns will always be changing. Some will even change in a matter of seconds, depending on how fast the row is updated. Rows can also be added and deleted from the group, but not from the database. The rows are built from about 4 queries against the database, to show the most recent and updated data. So every time something in the database is updated, I would also like the row to be updated. If a row or a group has not been updated in 12 or so hours, it will be taken out of cache. Once a user requests the group again via the DB queries, it will be placed back into cache.

The above is what I would like. That is the wish.

In reality, I still have all the rows, but the way I store them in cache is currently broken. I store each row in a class, and the classes are stored in the server Cache in one HUGE list. When I go to update/delete/insert items in the list, it works most of the time, but sometimes it throws errors because the cache has changed underneath me. I want to be able to lock down the cache, more or less the way a database throws a lock on a row. I have DateTime stamps to remove things after 12 hours, but this almost always breaks, because other users are updating the same 25 rows in the group or the cache has simply changed.

This is an example of how I add items to cache; it only pulls the 10 or so columns that very rarely change. This example also removes rows not updated in the last 12 hours:

DateTime dt = DateTime.UtcNow;
if (HttpContext.Current.Cache["GetRows"] != null)
{
    List<RowIdentifiers> pis = (List<RowIdentifiers>)HttpContext.Current.Cache["GetRows"];
    var ch = (from xx in pis
              where xx.groupID == groupID
              where xx.rowID == rowID
              select xx).ToList();
    if (ch.Count == 0)
    {
        var ck = GetInGroupNotCached(rowID, groupID, dt); // Pull the group from the DB
        for (int i = 0; i < ck.Count(); i++)
            pis.Add(ck[i]);
        pis.RemoveAll(x => x.updateDateTime < dt.AddHours(-12)); // Evict rows older than 12 hours
        HttpContext.Current.Cache["GetRows"] = pis;
        return ck;
    }
    else
        return ch;
}
else
{
    var pis = GetInGroupNotCached(rowID, groupID, dt); // Pull the group from the DB
    HttpContext.Current.Cache["GetRows"] = pis;
    return pis;
}

On the last point, I remove items from the cache, so the cache doesn't actually get huge.

To restate the question: what's a better way of doing this? How can I put locks on the cache? Can I do better than this? I just want it to stop breaking when removing or adding rows.
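For reference, the simplest way to stop the "cache has changed" errors is to serialize every read-modify-write of the shared list behind one lock. This is a minimal sketch (names are illustrative; a plain static list stands in for the object stored under HttpContext.Current.Cache["GetRows"], so it runs outside ASP.NET):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Illustrative: one coarse-grained lock guards every mutation of the
// shared list, so concurrent updates cannot corrupt it mid-traversal.
public static class RowCache
{
    private static readonly object _cacheLock = new object();
    private static readonly List<int> _rows = new List<int>();

    public static void AddRow(int rowID)
    {
        lock (_cacheLock) // only one thread mutates the list at a time
        {
            if (!_rows.Contains(rowID))
                _rows.Add(rowID);
        }
    }

    public static int Count
    {
        get { lock (_cacheLock) return _rows.Count; }
    }
}
```

The trade-off is that one lock serializes all cache access; with heavy traffic that becomes its own bottleneck, which is why the answers below suggest splitting the one big list up instead.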

EDIT: SqlCacheDependency does NOT work for LINQ as posted in Remus's comments. It works for a full table select, but I want to select just certain columns from the rows, not entire rows, so I cannot use Remus's idea.

Neither of the following code samples works.

var ck = (from xx in db.GetInGroupNotCached
          where xx.rowID == rowID
          select new
          {
              xx.Item,
              xx.AnotherItem,
              xx.AnotherItemm
          }).CacheSql(db, "Item:" + rowID.ToString()).ToList();


var ck = (from xx in db.GetInGroupNotCached
          where xx.rowID == rowID
          select new ClassExample
          {
              Item = xx.Item,
              AnotherItem = xx.AnotherItem,
              AnotherItemm = xx.AnotherItemm
          }).CacheSql(db, "Item:" + rowID.ToString()).ToList();
+2  A: 

You are storing the entire database in memory as a list and re-querying it, as a list traversal, on every request. Frankly, I doubt this 'cache' is in fact faster than just running a SQL query. Traversing a list is never going to beat a database...

What you should do instead is cache specific query results, like the result set for a rowID and groupID, keyed by the two arguments. For refresh, rely on the built-in cache invalidation infrastructure around Query Notifications; see the article The Mysterious Notification to understand how that works. With an ASP.NET project, all you have to do is leverage SqlCacheDependency.
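A rough sketch of that keyed-per-query idea (the class and method names are mine, and a thread-safe dictionary stands in for the ASP.NET cache so the snippet runs anywhere; with the real Cache you would pass a SqlCacheDependency to Cache.Insert so entries are evicted automatically when the data changes):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Illustrative: one cache entry per (groupID, rowID) query result,
// instead of one giant list holding everything.
public class RowIdentifiers { public int groupID; public int rowID; }

public static class PerQueryCache
{
    private static readonly ConcurrentDictionary<string, List<RowIdentifiers>> _cache =
        new ConcurrentDictionary<string, List<RowIdentifiers>>();

    public static List<RowIdentifiers> GetRows(
        int groupID, int rowID,
        Func<int, int, List<RowIdentifiers>> fetchFromDb)
    {
        // The cache key is built from the query's arguments.
        string key = "GetRows:" + groupID + ":" + rowID;
        return _cache.GetOrAdd(key, _ => fetchFromDb(groupID, rowID));
    }

    // Invalidation hook: the job SqlCacheDependency would do automatically.
    public static void Invalidate(int groupID, int rowID)
    {
        List<RowIdentifiers> removed;
        _cache.TryRemove("GetRows:" + groupID + ":" + rowID, out removed);
    }
}
```

Because each result set lives under its own key, expiring one query's data no longer invalidates everyone else's rows.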

Remus Rusanu
You're saying that the SqlCacheDependency item will clear up the errors. It looks like it sends a notification of some sort. Can this keep up with over 1,000 row changes a minute?
Scott
No, 100 row changes a minute is not something you cache. With 100 changes a minute you query the database each time, since it has likely changed.
Remus Rusanu
Then I can't use SqlCacheDependency... If I remove the cached lists I have already implemented, execution and results slow down considerably. I need to use something else. Also, it seems that LINQ is not able to handle SqlCacheDependency, so there is another problem.
Scott
Execution and result slowdown means you have a bad schema design in the database. LinqToSql with SqlDependency works just fine; see http://code.msdn.microsoft.com/linqtosqlcache or http://dunnry.com/blog/UsingSQLDependencyObjectsWithLINQ.aspx. Right now you are down the path of implementing a database in memory using lists, and you are just discovering that it is not really that easy. It will get downright impossible when you scale to two WWW servers. You can fix your DB design now, or you'll have to do it later.
Remus Rusanu
I honestly don't think I have a bad DB design. I have designed it really well. I'll look more into this and get back to you on it. Let me see what I can find out.
Scott
The 'game' is 100 columns x 5000 rows accessed by 12k users? I.e. each HTTP hit traverses the in-memory 500k list end-to-end at least once?
Remus Rusanu
It traverses, yes... I have a LINQ query that pulls the correct data to be updated/used in the list.
Scott
It doesn't work. I made an edit above stating that the two links you posted won't work for anonymous types. I'm still stuck as to what to do. Think I'm gonna start a bounty on this item.
Scott
SqlCacheDependency is for Tables. SqlDependency is for arbitrary queries.
Remus Rusanu
So what you're saying is that if I use SqlDependency, it should work? Does it still cache the results and make the query faster?
Scott
SqlDependency still relies on the table and can't have anonymous types... Just checked.
Scott
@Scott - I think what Remus Rusanu is getting at is that your approach to caching rapidly changing data is flawed. Even if you come up with the perfect means to expire a cache element, if the data is changing as rapidly as you suggest, you will get almost no cache hits, because entries will be expiring almost as fast as they are put into cache.
Thomas
+3  A: 

I really doubt your caching solution is actually of any use. A List<T> has no index, so a lookup in your list is always an O(n) operation.
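To illustrate the cost difference: if you keep an in-memory copy at all, a dictionary keyed on the identifiers turns that O(n) scan into an O(1) lookup. A sketch under assumed names (the Row fields here are placeholders for your real columns):

```csharp
using System;
using System.Collections.Generic;

// Illustrative: replace the flat List<T> + LINQ scan with dictionaries
// keyed on (groupID, rowID) and on groupID, so lookups are O(1).
public class Row { public int GroupID; public int RowID; public string Data; }

public class IndexedRowStore
{
    private readonly Dictionary<(int, int), Row> _byKey =
        new Dictionary<(int, int), Row>();
    private readonly Dictionary<int, List<Row>> _byGroup =
        new Dictionary<int, List<Row>>();

    public void Add(Row row)
    {
        _byKey[(row.GroupID, row.RowID)] = row;
        List<Row> group;
        if (!_byGroup.TryGetValue(row.GroupID, out group))
            _byGroup[row.GroupID] = group = new List<Row>();
        group.Add(row);
    }

    // O(1) hash lookup instead of traversing the whole list.
    public Row Find(int groupID, int rowID)
    {
        Row row;
        return _byKey.TryGetValue((groupID, rowID), out row) ? row : null;
    }

    public IReadOnlyList<Row> FindGroup(int groupID)
    {
        List<Row> group;
        return _byGroup.TryGetValue(groupID, out group)
            ? (IReadOnlyList<Row>)group
            : Array.Empty<Row>();
    }
}
```

This only fixes the lookup cost; it does nothing for the concurrency and staleness problems discussed elsewhere in this thread.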

Assuming you have profiled your application and know the database is your bottleneck, this is what you can do:

In a database you can create indexes on your data; a lookup through them is typically O(log(n)). You should create covering indexes for queries over your static data. Leave the frequently changing data non-indexed, because indexing it would slow down inserts and updates due to the necessary index maintenance. You can read up on SQL Server indexing here. Get your hands on the SQL Server Profiler and check which queries are the slowest and why. Proper indexes can get you huge performance gains (e.g. an index on your GroupId will cut the lookup from a full table scan, O(n), to an index lookup of roughly O(n/25), assuming there are 25 people per group).

More often than not, people write suboptimal SQL (returning unnecessary columns, Select N+1, Cartesian joins). You should check that too.

Before implementing a cache, I would make sure your database really is the culprit for your performance problems. Premature optimization is the root of all evil, and caching is hard to do right. Frequently changing data is not what caching is intended for.

Johannes Rudolph
I will have to look more thoroughly at my queries to make sure everything is going well, I guess.
Scott
+3  A: 

In general, the reason for caching is that you feel you can pull the data out of memory (without it being stale) faster than you can pull it from the database. A situation where you can pull the right data from Cache is a Cache Hit. If your schema has a low Cache Hit rate, then Cache is probably hurting more than helping. If your data changes rapidly, you will have a low Cache Hit rate and it will be slower than simply querying for the data.

The trick is to split your data between infrequently changing and frequently changing elements: cache the infrequently changing elements and do not cache the frequently changing ones. This could even be done at the database level on a single entity by using a 1:1 relationship, where one table contains the infrequently changing data and the other the frequently changing information. You said that your source data contains 10 columns that almost never change and 90 that change frequently. Build your objects around that notion, so that you can cache the 10 that rarely change and query for the 90 that change frequently.
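A sketch of that split (all names are illustrative, and the fetch delegates stand in for the real database calls): the rarely-changing columns are cached per row, while the volatile columns always go to the database, and the two halves are combined on read.

```csharp
using System;
using System.Collections.Concurrent;

// Illustrative split of one entity into a cached static part and an
// always-queried volatile part.
public class StaticData { public int RowID; public string Name; }   // ~10 rarely-changing columns
public class VolatileData { public int RowID; public int Score; }   // ~90 frequently-changing columns
public class UserRow { public StaticData Static; public VolatileData Volatile; }

public class SplitReader
{
    private readonly ConcurrentDictionary<int, StaticData> _staticCache =
        new ConcurrentDictionary<int, StaticData>();
    private readonly Func<int, StaticData> _fetchStatic;
    private readonly Func<int, VolatileData> _fetchVolatile;

    public SplitReader(Func<int, StaticData> fetchStatic,
                       Func<int, VolatileData> fetchVolatile)
    {
        _fetchStatic = fetchStatic;
        _fetchVolatile = fetchVolatile;
    }

    public UserRow Get(int rowID)
    {
        return new UserRow
        {
            // Static columns: hit the cache after the first read.
            Static = _staticCache.GetOrAdd(rowID, _fetchStatic),
            // Volatile columns: always go to the database.
            Volatile = _fetchVolatile(rowID)
        };
    }
}
```

The cache then only ever holds data that is actually worth caching, so its hit rate stays high regardless of how fast the other 90 columns churn.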

I store each row in a class and the class is stored in the Server Cache via a HUGE list

From your original post, it sounds like you are not storing each instance in cache, but instead a list of instances as a single cache entry. The problem is that you can get multi-threading issues with this design. When multiple threads pull the one-list-to-rule-them-all, they are all accessing the same instance in memory (assuming they are on the same server). Furthermore, as you have discovered, a CacheDependency will not work in this design, because it would expire the entire list rather than a single item.

One obvious, but highly problematic, solution would be to change your design to store each instance in cache under a logical cache key of some sort and add a CacheDependency for each instance. The problem is that if the number of instances is large, this creates a lot of overhead verifying the currency of each instance and expiring it when necessary. If the cache items poll the database, that also creates a lot of traffic.

An approach I have used to solve the problem of having a large number of database-dependent CacheDependencies is to write a custom ICacheItemExpiration for the Caching Block in the Enterprise Library. This also meant I was using the Caching Block to cache my objects, not the ASP.NET cache directly. In this variant, I created a class called DatabaseExpirationManager which kept track of which items to expire from cache. I would still add each item to the cache individually, but with this modified expiration, which simply registered the item with the DatabaseExpirationManager. The DatabaseExpirationManager would be notified of the keys that needed to be expired and would expire those items from cache.

I will say, right from the start, that this solution will probably not work on rapidly changing data: the DatabaseExpirationManager would be running constantly, holding a lock on its list of items to expire and preventing new items from being added. You would have to do some serious multi-threading analysis to ensure that you reduced contention while not enabling a race condition.

ADDITION

Ok. First, fair warning that this will be a long post. Second, this is not even the entire library as that would be too long.

Taking the wayback machine: I wrote this code back in late 2005/early 2006, right as .NET 2.0 came out, and I haven't investigated whether the more recent libraries do this better (almost assuredly they do). I was using the January 2005/May 2005/January 2006 libraries. You can still get the 2006 library off CodePlex.

The way I came up with this solution was to look at the source of the caching system in the Enterprise Library. In short, everything feeds through the CacheManager class. That class has three primary components (all three in the Microsoft.Practices.EnterpriseLibrary.Caching namespace): Cache, BackgroundScheduler, and ExpirationPollTimer.

The Cache class is the EntLib's implementation of cache. The BackgroundScheduler was used to scavenge the cache on a separate thread. The ExpirationPollTimer was a wrapper around a Timer class.

So, first off, it should be noted that the Cache scavenges itself based on a timer; similarly, my solution polls the database on a timer. The EntLib cache and the ASP.NET cache both work by giving each item a delegate that checks when the item should be expired; my solution worked on the premise of an outside entity deciding when items should be expired. The second thing to note is that whenever you start playing around with a central cache, you have to be attentive to multi-threading issues.

First I replaced the BackgroundScheduler with two classes: DatabaseExpirationWorker and DatabaseExpirationManager. DatabaseExpirationManager contained the important method that queried the database for changes and passed the list of changes to an event:

private object _syncRoot = new object();
private List<Guid>  _objectChanges = new List<Guid>();
public event EventHandler<DatabaseExpirationEventArgs> ExpirationFired;
...
public void UpdateExpirations()
{
    lock ( _syncRoot )
    {
        DataTable dt = GetExpirationsFromDb();
        List<Guid> keys = new List<Guid>();
        foreach ( DataRow dr in dt.Rows )
        {
            Guid key = (Guid)dr[0];
            keys.Add(key);
            _objectChanges.Add(key);
        }

        if ( ExpirationFired != null )
            ExpirationFired(this, new DatabaseExpirationEventArgs(keys));
    }
}

The DatabaseExpirationEventArgs class looked like so:

public class DatabaseExpirationEventArgs : System.EventArgs
{
    public DatabaseExpirationEventArgs( List<Guid> expiredKeys )
    {
        _expiredKeys = expiredKeys;
    }

    private List<Guid> _expiredKeys;
    public List<Guid> ExpiredKeys
    {
        get  {  return _expiredKeys;  }
    }
}

In this database, all the primary keys were Guids, which made keeping track of changes substantially simpler. Each of the save methods in the middle tier would write its PK and the current datetime into a table. Each time the system polled the database, it stored the datetime (from the database, not from the middle tier) at which it initiated the polling, and GetExpirationsFromDb would return all items that had changed since that time. Another method would periodically remove rows that had long since been polled. This table of changes was very narrow: a guid and a datetime (with a PK on both columns and the clustered index on the datetime, IIRC), so it could be queried very quickly. Also note that I used the Guid as the key in the Cache.

The DatabaseExpirationWorker class was nearly identical to the BackgroundScheduler, except that its DoExpirationTimeoutExpired would call the DatabaseExpirationManager's UpdateExpirations method. Since none of the methods in BackgroundScheduler were virtual, I could not simply derive from BackgroundScheduler and override its methods.

The last thing I did was write my own version of the EntLib's CacheManager that used my DatabaseExpirationWorker instead of the BackgroundScheduler, and whose indexer checked the object expiration list:

private List<Guid> _objectExpirations;
private void OnExpirationFired( object sender, DatabaseExpirationEventArgs e )
{
    _objectExpirations = e.ExpiredKeys;
    lock(_objectExpirations)
    {
        foreach( Guid key in _objectExpirations )
            this.RealCache.Remove(key.ToString()); // EntLib cache keys are strings
    }
}

private Microsoft.Practices.EnterpriseLibrary.Caching.CacheManager _realCache;
private Microsoft.Practices.EnterpriseLibrary.Caching.CacheManager RealCache
{
    get
    {
        lock(_syncRoot)
        {
            if ( _realCache == null )
                _realCache = Microsoft.Practices.EnterpriseLibrary.Caching.CacheFactory.GetCacheManager();

            return _realCache;
        }
    }
}


public object this[string key]
{
    get
    {
        lock(_objectExpirations)
        {
            if (_objectExpirations.Contains(new Guid(key))) // the keys are Guids stored as strings
                return null;
            return this.RealCache.GetData(key);
        }
    }
}

Again, it's been many moons since I reviewed this code, but this gives you the gist of it. Even looking through my old code, I see many places that could be cleaned up and cleared up. I also have not looked at the Caching block in the most recent version of the EntLib, but I would imagine it has changed and improved. Keep in mind that in the system in which I built this, there were dozens of changes per second, not hundreds. So, if the data was stale for a minute or two, that was acceptable. If in your solution there are thousands of changes per second, then this solution may not be feasible.

Thomas
I think that you're right about the multithreading issue. What would you think about putting the data in an observable collection? The collection gets notified whenever something changes and could notify the DB of the change. So I could update the collection first and then the DB. There is a small problem: if the collection gets updated but, through some error, the DB is not, it could cause trouble, so maybe it's not the best idea.
Scott
@Scott: I would still think the simpler solution is to have the db direct changes to cache, rather than the application, or at least have a single controller of cache. On that custom-built controller, you could enable the ability to force an expiration, as if the controller had received instructions from the database that it had changed. The problem is the volume of queries to retrieve fresh data. If there are lots of changes, then you will have lots of requests to get fresh data, because you will have lots of cache expirations.
Thomas
So I can use SqlCacheDependency to cache the entire row and not the group? That way the DB expires items in the cache... Is that what you're saying? I am thinking that might be best.
Scott
@Scott - In my custom version where I created a DatabaseExpirationManager, it acted as a controller to manage expirations. If you try to create a SqlDependency on each row and you have a lot of items, it will create a storm of database traffic as the dependencies check for changes. Having a controller that polled for changes and then updated the cache allowed for significantly fewer database calls. However, if you have a low number of items that you are putting into cache, then having a SqlDependency on each item might be better, and it would certainly require less custom code.
Thomas
That expiration cache manager does make decent sense. It's something that I will have to look into and see what I can do. I'll let you know what I find out tonight while making the thing work. Thanks for the information.
Scott
Also, any chance you could share some code or point me in the direction of where you learned about creating your custom DatabaseExpirationManager?
Scott
@Scott - I'll see if I can dig it up tonight. That was on a project from five or six years ago.
Thomas
Thanks Thomas!!
Scott
Thanks Thomas, I'm going to look through it tonight and I'll get back to you. I really do appreciate it.
Scott
+1  A: 

I'm not so sure this is a good idea; you would probably have a better solution if you could manage to speed up communication with your database.

Hopefully I understood your requirements.
It quickly became a lot of code; here you have it...

This is just a sample, but it might be something to build on. I have not taken into consideration your need to remove rows after a certain amount of time.
I separated the cache into segments with groups, where the groups contain rows.
I designed the sample to only lock a row when the first set property is called; when only get operations are called, you should be safe.
The lock will be released when the row object is disposed, so you have to use using() or call Dispose() to make it work.

Here is a cache (group) class and a row class.
Add database reads after the comment // Add code to read from database...

public class GroupCache : SimpleCache<RowObject, int>
{
    private static readonly object GroupCacheObjectLock = new object();

    public GroupCache(int groupId)
    {
        GroupId = groupId;
    }
    public int GroupId { get; private set; }

    public static GroupCache GetGroupCache(int groupId)
    {
        lock (GroupCacheObjectLock)
        {
            if (HttpContext.Current.Cache["Group-" + groupId] == null)
            {
                HttpContext.Current.Cache["Group-" + groupId]
                    = new GroupCache(groupId);
            }
        }
        return (GroupCache)HttpContext.Current.Cache["Group-" + groupId]; // cast required: the Cache indexer returns object
    }

    public override RowObject CreateItem(int id, 
            SimpleCache<RowObject, int> cache)
    {
        return new RowObject(id, GroupId, this);
    }

}

public class RowObject : SimpleCacheItem<RowObject, int>
{
    private string _property1;

    public RowObject(int rowId, int groupId, SimpleCache<RowObject, int> cache)
        : base(rowId, cache)
    {
        // Add code to read from database...
    }

    public string Property1
    {
        get { return _property1; }
        set
        {
            if (!AcquireLock(-1)) return;
            _property1 = value;
#if DEBUG
            Trace.WriteLine(string.Format("Thread id: {0}, value = {1}", 
                Thread.CurrentThread.ManagedThreadId, value));
#endif
        }
    }
}

This is a unit test mostly to show how to use the classes.

[TestFixture]
public class GroupCacheTest
{
    private int _threadFinishedCount;
    private void MultiThreadTestWorker(object obj)
    {
        for (int n = 0; n < 10; n++)
        {
            for (int m = 0; m < 25; m++)
            {
                using (RowObject row 
                    = GroupCache.GetGroupCache(n).GetCachedItem(m))
                {
                    row.Property1 = string.Format("{0} {1} {2}", obj, n, m);
                    Thread.Sleep(3);
                }
            }
        }
        Interlocked.Increment(ref _threadFinishedCount);
    }
    [Test]
    public void MultiThreadTest()
    {
        _threadFinishedCount = 0;
        for (int i = 0; i < 20; i++)
        {
            ThreadPool.QueueUserWorkItem(MultiThreadTestWorker, "Test-" + i);
        }
        while (_threadFinishedCount < 20) // wait for all 20 worker threads
            Thread.Sleep(100);
    }
}

Here are the base classes.

public abstract class SimpleCacheItem<T, TKey> : IDisposable where T : class
{
    private readonly SimpleCache<T, TKey> _cache;

    protected SimpleCacheItem(TKey id, SimpleCache<T, TKey> cache)
    {
        Id = id;
        _cache = cache;
    }

    protected TKey Id { get; private set; }

    #region IDisposable Members

    public virtual void Dispose()
    {
        if (_cache == null) return;
        _cache.ReleaseLock(Id);
    }

    #endregion

    protected bool AcquireLock(int timeout)
    {
        return _cache.AcquireLock(Id, timeout); // pass the caller's timeout through
    }
}

public abstract class SimpleCache<T, TKey> where T : class
{
    private static readonly object CacheItemLockSyncLock = new object();
    private static readonly object CacheItemStoreSyncLock = new object();
    // These must be initialized, or every lookup throws NullReferenceException.
    private readonly Dictionary<TKey, int> _cacheItemLock = new Dictionary<TKey, int>();
    private readonly Dictionary<TKey, T> _cacheItemStore = new Dictionary<TKey, T>();

    public abstract T CreateItem(TKey id, SimpleCache<T, TKey> cache);

    public T GetCachedItem(TKey id)
    {
        T item;
        lock (CacheItemStoreSyncLock)
        {
            if (!_cacheItemStore.TryGetValue(id, out item))
            {
                item = CreateItem(id, this);
                _cacheItemStore.Add(id, item);
            }
        }
        return item;
    }

    public void ReleaseLock(TKey id)
    {
        lock (CacheItemLockSyncLock)
        {
            if (_cacheItemLock.ContainsKey(id))
            {
                _cacheItemLock.Remove(id);
            }
        }
#if DEBUG
        Trace.WriteLine(string.Format("Thread id: {0} lock released", 
        Thread.CurrentThread.ManagedThreadId));
#endif
    }

    public bool AcquireLock(TKey id, int timeOut)
    {
        var timer = new Stopwatch();
        timer.Start();
        while (timeOut < 0 || timer.ElapsedMilliseconds < timeOut) // loop until the timeout elapses
        {
            lock (CacheItemLockSyncLock)
            {
                int threadId;
                if (!_cacheItemLock.TryGetValue(id, out threadId))
                {
                    _cacheItemLock.Add(id, 
                        Thread.CurrentThread.ManagedThreadId);
#if DEBUG
                    Trace.WriteLine(string.Format(
                        "Thread id: {0}, lock acquired after {1} ms", 
                        Thread.CurrentThread.ManagedThreadId, 
                        timer.ElapsedMilliseconds));
#endif
                    return true;
                }
                if (threadId == Thread.CurrentThread.ManagedThreadId) 
                    return true;
            }
            Thread.Sleep(15);
        }
        return false;
    }
}

Jens Granlund