views: 208

answers: 4

Hey all,

EDIT: Purpose of this website: it's called Utopiapimp.com, a third-party utility for a game called utopia-game.com. The site currently has over 12k users and I run it. The game is fully text based and will always remain that way. Users copy full pages of text from the game and paste the copied information into my site. I run a series of regular expressions against the pasted data to break it down, and insert anywhere from 5 to over 30 values into the DB based on that one paste. I then take those values and run queries against them to display the information back in a VERY simple and easy-to-understand way. The game is team based and each team has 25 users, so each team is a group and each row is ONE user's information. The users can update all 25 rows or just one row at a time. I need to store things in cache because the site is very slow, doing over 1,000 queries almost every minute.

So here is the deal. Imagine I have a spreadsheet (EDIT: Excel is just an example of how to imagine it; I don't actually use Excel) with 100 columns and 5000 rows. Each row has two unique identifiers: one for the row itself and one to group together 25 rows apiece. There are about 10 columns in each row that will almost never change, and the other 90 columns will always be changing. Some will even change in a matter of seconds, depending on how fast the row is updated. Rows can also be added and deleted from the group, but not from the database. The rows are built from about 4 queries against the database, to show the most recent and updated data. So every time something in the database is updated, I would also like the row to be updated. If a row or a group has not been updated in 12 or so hours, it will be taken out of cache. Once a user requests the group again via the DB queries, it will be placed back into cache.

The above is what I would like. That is the wish.

In reality, I still have all the rows, but the way I store them in cache is currently broken. I store each row in a class, and the classes are stored in the server Cache in one HUGE list. When I go to update/delete/insert items in the list, it works most of the time, but sometimes it throws errors because the cache has changed underneath me. I want to be able to lock down the cache, more or less the way a database throws a lock on a row. I have DateTime stamps to remove things after 12 hours, but this almost always breaks, because other users are updating the same 25 rows in the group or the cache has simply changed.

This is an example of how I add items to cache; it only pulls the 10 or so columns that very rarely change. This example also removes rows not updated in the last 12 hours:

DateTime dt = DateTime.UtcNow;
if (HttpContext.Current.Cache["GetRows"] != null)
{
    List<RowIdentifiers> pis = (List<RowIdentifiers>)HttpContext.Current.Cache["GetRows"];
    var ch = (from xx in pis
              where xx.groupID == groupID
              where xx.rowID == rowID
              select xx).ToList();
    if (ch.Count == 0)
    {
        var ck = GetInGroupNotCached(rowID, groupID, dt); // Pull the group from the DB
        for (int i = 0; i < ck.Count(); i++)
            pis.Add(ck[i]);
        pis.RemoveAll(x => x.updateDateTime < dt.AddHours(-12)); // Evict rows older than 12 hours
        HttpContext.Current.Cache["GetRows"] = pis;
        return ck;
    }
    else
        return ch;
}
else
{
    var pis = GetInGroupNotCached(rowID, groupID, dt); // Pull the group from the DB
    HttpContext.Current.Cache["GetRows"] = pis;
    return pis;
}

On the last point, I remove items from the cache, so the cache doesn't actually get huge.

To restate the question: what's a better way of doing this? How can I put locks on the cache? Can I do better than this? I just want it to stop breaking when removing or adding rows.
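For reference, the simplest way to stop the "cache has changed" errors is to serialize every read-modify-write of the shared list behind one lock. This is a minimal sketch (names are illustrative; a plain static list stands in for the object stored under HttpContext.Current.Cache["GetRows"], so it runs outside ASP.NET):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Illustrative: one coarse-grained lock guards every mutation of the
// shared list, so concurrent updates cannot corrupt it mid-traversal.
public static class RowCache
{
    private static readonly object _cacheLock = new object();
    private static readonly List<int> _rows = new List<int>();

    public static void AddRow(int rowID)
    {
        lock (_cacheLock) // only one thread mutates the list at a time
        {
            if (!_rows.Contains(rowID))
                _rows.Add(rowID);
        }
    }

    public static int Count
    {
        get { lock (_cacheLock) return _rows.Count; }
    }
}
```

The trade-off is that one lock serializes all cache access; with heavy traffic that becomes its own bottleneck, which is why the answers below suggest splitting the one big list up instead.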

EDIT: SqlCacheDependency does NOT work for LINQ as posted in Remus's comments. It works for a full table select, but I want to select just certain columns from the rows, not entire rows, so I cannot use Remus's idea.

Neither of the following code samples works.

var ck = (from xx in db.GetInGroupNotCached
          where xx.rowID == rowID
          select new
          {
              xx.Item,
              xx.AnotherItem,
              xx.AnotherItemm
          }).CacheSql(db, "Item:" + rowID.ToString()).ToList();


var ck = (from xx in db.GetInGroupNotCached
          where xx.rowID == rowID
          select new ClassExample
          {
              Item = xx.Item,
              AnotherItem = xx.AnotherItem,
              AnotherItemm = xx.AnotherItemm
          }).CacheSql(db, "Item:" + rowID.ToString()).ToList();
+2  A: 

You are storing the entire database in memory as a list and re-querying it, as a list traversal, on every request. Frankly, I doubt this 'cache' is in fact faster than just running a SQL query. Traversing a list is never going to beat a database...

What you should do instead is cache specific query results, like the result set for a rowID and groupID, keyed by the two arguments. For refresh, rely on the built-in cache invalidation infrastructure around Query Notifications; see the article The Mysterious Notification to understand how that works. With an ASP.NET project, all you have to do is leverage SqlCacheDependency.
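A rough sketch of that keyed-per-query idea (the class and method names are mine, and a thread-safe dictionary stands in for the ASP.NET cache so the snippet runs anywhere; with the real Cache you would pass a SqlCacheDependency to Cache.Insert so entries are evicted automatically when the data changes):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Illustrative: one cache entry per (groupID, rowID) query result,
// instead of one giant list holding everything.
public class RowIdentifiers { public int groupID; public int rowID; }

public static class PerQueryCache
{
    private static readonly ConcurrentDictionary<string, List<RowIdentifiers>> _cache =
        new ConcurrentDictionary<string, List<RowIdentifiers>>();

    public static List<RowIdentifiers> GetRows(
        int groupID, int rowID,
        Func<int, int, List<RowIdentifiers>> fetchFromDb)
    {
        // The cache key is built from the query's arguments.
        string key = "GetRows:" + groupID + ":" + rowID;
        return _cache.GetOrAdd(key, _ => fetchFromDb(groupID, rowID));
    }

    // Invalidation hook: the job SqlCacheDependency would do automatically.
    public static void Invalidate(int groupID, int rowID)
    {
        List<RowIdentifiers> removed;
        _cache.TryRemove("GetRows:" + groupID + ":" + rowID, out removed);
    }
}
```

Because each result set lives under its own key, expiring one query's data no longer invalidates everyone else's rows.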

Remus Rusanu
You're saying that the SqlCacheDependency item will clear up the errors. It looks like it sends a notification of some sort. Can this keep up with over 1,000 row changes a minute?
Scott
No, 100 row changes a minute is not something you cache. With 100 changes a minute you query the database each time, since it has likely changed.
Remus Rusanu
Then I can't use SqlCacheDependency... If I remove the cached lists I have already implemented, execution and results slow down considerably. I need to use something else. Also, it seems that LINQ is not able to handle SqlCacheDependency, so there is another problem.
Scott
Execution and result slowdown means you have a bad schema design in the database. LinqToSql with SqlDependency works just fine; see http://code.msdn.microsoft.com/linqtosqlcache or http://dunnry.com/blog/UsingSQLDependencyObjectsWithLINQ.aspx. Right now you are down the path of implementing a database in memory using lists, and you are just discovering that it is not really that easy. It will get downright impossible when you scale to two WWW servers. You can fix your DB design now, or you'll have to do it later.
Remus Rusanu
I honestly don't think I have a bad DB design. I have designed it really well. I'll look more into this and get back to you on it. Let me see what I can find out.
Scott
The 'game' is 100 columns x 5000 rows accessed by 12k users? I.e. each HTTP hit traverses the in-memory 500k list end-to-end at least once?
Remus Rusanu
It traverses, yes... I have a LINQ query that pulls the correct data to be updated/used in the list.
Scott
It doesn't work. I made an edit above stating that the two links you posted won't work for anonymous types. I'm still stuck as to what to do. Think I'm gonna start a bounty on this item.
Scott
SqlCacheDependency is for Tables. SqlDependency is for arbitrary queries.
Remus Rusanu
So what you're saying is that if I use SqlDependency, it should work? Does it still cache the results and make the query faster?
Scott
SqlDependency still relies on the table and can't have anonymous types... Just checked.
Scott
@Scott - I think what Remus Rusanu is getting at is that your approach to caching rapidly changing data is flawed. Even if you come up with the perfect means to expire a cache element, if the data is changing as rapidly as you suggest, you will get almost no cache hits, because entries will be expiring almost as fast as they are put into cache.
Thomas
+3  A: 

I really doubt your caching solution is actually of any use. A List<T> has no index, so a lookup in your list is always an O(n) operation.
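To illustrate the cost difference: if you keep an in-memory copy at all, a dictionary keyed on the identifiers turns that O(n) scan into an O(1) lookup. A sketch under assumed names (the Row fields here are placeholders for your real columns):

```csharp
using System;
using System.Collections.Generic;

// Illustrative: replace the flat List<T> + LINQ scan with dictionaries
// keyed on (groupID, rowID) and on groupID, so lookups are O(1).
public class Row { public int GroupID; public int RowID; public string Data; }

public class IndexedRowStore
{
    private readonly Dictionary<(int, int), Row> _byKey =
        new Dictionary<(int, int), Row>();
    private readonly Dictionary<int, List<Row>> _byGroup =
        new Dictionary<int, List<Row>>();

    public void Add(Row row)
    {
        _byKey[(row.GroupID, row.RowID)] = row;
        List<Row> group;
        if (!_byGroup.TryGetValue(row.GroupID, out group))
            _byGroup[row.GroupID] = group = new List<Row>();
        group.Add(row);
    }

    // O(1) hash lookup instead of traversing the whole list.
    public Row Find(int groupID, int rowID)
    {
        Row row;
        return _byKey.TryGetValue((groupID, rowID), out row) ? row : null;
    }

    public IReadOnlyList<Row> FindGroup(int groupID)
    {
        List<Row> group;
        return _byGroup.TryGetValue(groupID, out group)
            ? (IReadOnlyList<Row>)group
            : Array.Empty<Row>();
    }
}
```

This only fixes the lookup cost; it does nothing for the concurrency and staleness problems discussed elsewhere in this thread.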

Assuming you have profiled your application and know the database is your bottleneck, this is what you can do:

In a database you can create indexes on your data; a lookup through them is typically O(log(n)). You should create covering indexes for queries over your static data. Leave the frequently changing data non-indexed, because indexing it would slow down inserts and updates due to the necessary index maintenance. You can read up on SQL Server indexing here. Get your hands on the SQL Server Profiler and check which queries are the slowest and why. Proper indexes can get you huge performance gains (e.g. an index on your GroupId will cut the lookup from a full table scan, O(n), to an index lookup of roughly O(n/25), assuming there are 25 people per group).

More often than not, people write suboptimal SQL (returning unnecessary columns, Select N+1, Cartesian joins). You should check that too.

Before implementing a cache, I would make sure your database really is the culprit for your performance problems. Premature optimization is the root of all evil, and caching is hard to do right. Frequently changing data is not what caching is intended for.

Johannes Rudolph
I will have to look more thoroughly at my queries to make sure everything is going well, I guess.
Scott
+3  A: 

In general, the reason for caching is that you feel you can pull the data out of memory (without it being stale) faster than you can pull it from the database. A situation where you can pull the right data from Cache is a Cache Hit. If your schema has a low Cache Hit rate, then Cache is probably hurting more than helping. If your data changes rapidly, you will have a low Cache Hit rate and it will be slower than simply querying for the data.

The trick is to split your data between infrequently changing and frequently changing elements: cache the infrequently changing elements and do not cache the frequently changing ones. This could even be done at the database level on a single entity by using a 1:1 relationship, where one table contains the infrequently changing data and the other the frequently changing information. You said that your source data contains 10 columns that almost never change and 90 that change frequently. Build your objects around that notion, so that you can cache the 10 that rarely change and query for the 90 that change frequently.
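A sketch of that split (all names are illustrative, and the fetch delegates stand in for the real database calls): the rarely-changing columns are cached per row, while the volatile columns always go to the database, and the two halves are combined on read.

```csharp
using System;
using System.Collections.Concurrent;

// Illustrative split of one entity into a cached static part and an
// always-queried volatile part.
public class StaticData { public int RowID; public string Name; }   // ~10 rarely-changing columns
public class VolatileData { public int RowID; public int Score; }   // ~90 frequently-changing columns
public class UserRow { public StaticData Static; public VolatileData Volatile; }

public class SplitReader
{
    private readonly ConcurrentDictionary<int, StaticData> _staticCache =
        new ConcurrentDictionary<int, StaticData>();
    private readonly Func<int, StaticData> _fetchStatic;
    private readonly Func<int, VolatileData> _fetchVolatile;

    public SplitReader(Func<int, StaticData> fetchStatic,
                       Func<int, VolatileData> fetchVolatile)
    {
        _fetchStatic = fetchStatic;
        _fetchVolatile = fetchVolatile;
    }

    public UserRow Get(int rowID)
    {
        return new UserRow
        {
            // Static columns: hit the cache after the first read.
            Static = _staticCache.GetOrAdd(rowID, _fetchStatic),
            // Volatile columns: always go to the database.
            Volatile = _fetchVolatile(rowID)
        };
    }
}
```

The cache then only ever holds data that is actually worth caching, so its hit rate stays high regardless of how fast the other 90 columns churn.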

I store each row in a class and the class is stored in the Server Cache via a HUGE list

From your original post, it sounds like you are not storing each instance in cache, but instead a list of instances as a single cache entry. The problem is that you can get multi-threading issues with this design. When multiple threads pull the one-list-to-rule-them-all, they are all accessing the same instance in memory (assuming they are on the same server). Furthermore, as you have discovered, a CacheDependency will not work in this design, because it would expire the entire list rather than a single item.

One obvious, but highly problematic, solution would be to change your design to store each instance in cache under a logical cache key of some sort and add a CacheDependency for each instance. The problem is that if the number of instances is large, this creates a lot of overhead verifying the currency of each instance and expiring it when necessary. If the cache items poll the database, that also creates a lot of traffic.

An approach I have used to solve the problem of having a large number of database-dependent CacheDependencies is to write a custom ICacheItemExpiration for the Caching Block in the Enterprise Library. This also meant I was using the Caching Block to cache my objects, not the ASP.NET cache directly. In this variant, I created a class called DatabaseExpirationManager which kept track of which items to expire from cache. I would still add each item to the cache individually, but with this modified expiration, which simply registered the item with the DatabaseExpirationManager. The DatabaseExpirationManager would be notified of the keys that needed to be expired and would expire those items from cache.

I will say, right from the start, that this solution will probably not work on rapidly changing data: the DatabaseExpirationManager would be running constantly, holding a lock on its list of items to expire and preventing new items from being added. You would have to do some serious multi-threading analysis to ensure that you reduced contention while not enabling a race condition.

ADDITION

Ok. First, fair warning that this will be a long post. Second, this is not even the entire library as that would be too long.

Taking the wayback machine: I wrote this code back in late 2005/early 2006, right as .NET 2.0 came out, and I haven't investigated whether the more recent libraries do this better (almost assuredly they do). I was using the January 2005/May 2005/January 2006 libraries. You can still get the 2006 library off CodePlex.

The way I came up with this solution was to look at the source of the caching system in the Enterprise Library. In short, everything feeds through the CacheManager class. That class has three primary components (all three in the Microsoft.Practices.EnterpriseLibrary.Caching namespace): Cache, BackgroundScheduler, and ExpirationPollTimer.

The Cache class is the EntLib's implementation of cache. The BackgroundScheduler was used to scavenge the cache on a separate thread. The ExpirationPollTimer was a wrapper around a Timer class.

So, first off, it should be noted that the Cache scavenges itself based on a timer; similarly, my solution polls the database on a timer. The EntLib cache and the ASP.NET cache both work by giving each item a delegate that checks when the item should be expired; my solution worked on the premise of an outside entity deciding when items should be expired. The second thing to note is that whenever you start playing around with a central cache, you have to be attentive to multi-threading issues.

First I replaced the BackgroundScheduler with two classes: DatabaseExpirationWorker and DatabaseExpirationManager. DatabaseExpirationManager contained the important method that queried the database for changes and passed the list of changes to an event:

private object _syncRoot = new object();
private List<Guid>  _objectChanges = new List<Guid>();
public event EventHandler<DatabaseExpirationEventArgs> ExpirationFired;
...
public void UpdateExpirations()
{
    lock ( _syncRoot )
    {
        DataTable dt = GetExpirationsFromDb();
        List<Guid> keys = new List<Guid>();
        foreach ( DataRow dr in dt.Rows )
        {
            Guid key = (Guid)dr[0];
            keys.Add(key);
            _objectChanges.Add(key);
        }

        if ( ExpirationFired != null )
            ExpirationFired(this, new DatabaseExpirationEventArgs(keys));
    }
}

The DatabaseExpirationEventArgs class looked like so:

public class DatabaseExpirationEventArgs : System.EventArgs
{
    public DatabaseExpirationEventArgs( List<Guid> expiredKeys )
    {
        _expiredKeys = expiredKeys;
    }

    private List<Guid> _expiredKeys;
    public List<Guid> ExpiredKeys
    {
        get  {  return _expiredKeys;  }
    }
}

In this database, all the primary keys were Guids, which made keeping track of changes substantially simpler. Each of the save methods in the middle tier would write its PK and the current datetime into a table. Each time the system polled the database, it stored the datetime (from the database, not from the middle tier) at which it initiated the polling, and GetExpirationsFromDb would return all items that had changed since that time. Another method would periodically remove rows that had long since been polled. This table of changes was very narrow: a guid and a datetime (with a PK on both columns and the clustered index on the datetime, IIRC), so it could be queried very quickly. Also note that I used the Guid as the key in the Cache.

The DatabaseExpirationWorker class was nearly identical to the BackgroundScheduler, except that its DoExpirationTimeoutExpired would call the DatabaseExpirationManager's UpdateExpirations method. Since none of the methods in BackgroundScheduler were virtual, I could not simply derive from BackgroundScheduler and override its methods.

The last thing I did was write my own version of the EntLib's CacheManager that used my DatabaseExpirationWorker instead of the BackgroundScheduler, and whose indexer checked the object expiration list:

private List<Guid> _objectExpirations;
private void OnExpirationFired( object sender, DatabaseExpirationEventArgs e )
{
    _objectExpirations = e.ExpiredKeys;
    lock(_objectExpirations)
    {
        foreach( Guid key in _objectExpirations )
            this.RealCache.Remove(key.ToString()); // EntLib cache keys are strings
    }
}

private Microsoft.Practices.EnterpriseLibrary.Caching.CacheManager _realCache;
private Microsoft.Practices.EnterpriseLibrary.Caching.CacheManager RealCache
{
    get
    {
        lock(_syncRoot)
        {
            if ( _realCache == null )
                _realCache = Microsoft.Practices.EnterpriseLibrary.Caching.CacheFactory.GetCacheManager();

            return _realCache;
        }
    }
}


public object this[string key]
{
    get
    {
        lock(_objectExpirations)
        {
            if (_objectExpirations.Contains(new Guid(key))) // the keys are Guids stored as strings
                return null;
            return this.RealCache.GetData(key);
        }
    }
}

Again, it's been many moons since I reviewed this code, but this gives you the gist of it. Even looking through my old code, I see many places that could be cleaned up and cleared up. I also have not looked at the Caching block in the most recent version of the EntLib, but I would imagine it has changed and improved. Keep in mind that in the system in which I built this, there were dozens of changes per second, not hundreds. So, if the data was stale for a minute or two, that was acceptable. If in your solution there are thousands of changes per second, then this solution may not be feasible.

Thomas
I think that you're right about the multithreading issue. What would you think about putting the data in an observable collection? The collection gets notified whenever something changes and could notify the DB of the change. So I could update the collection first and then the DB. There is a small problem: if the collection gets updated but, through some error, the DB is not, it could cause trouble, so maybe it's not the best idea.
Scott
@Scott: I would still think the simpler solution is to have the db direct changes to cache, rather than the application, or at least have a single controller of cache. On that custom-built controller, you could enable the ability to force an expiration, as if the controller had received instructions from the database that it had changed. The problem is the volume of queries to retrieve fresh data. If there are lots of changes, then you will have lots of requests to get fresh data, because you will have lots of cache expirations.
Thomas
So I can use SqlCacheDependency to cache the entire row and not the group? That way the DB expires items in the cache... Is that what you're saying? I am thinking that might be best.
Scott
@Scott - In my custom version where I created a DatabaseExpirationManager, it acted as a controller to manage expirations. If you try to create a SqlDependency on each row and you have a lot of items, it will create a storm of database traffic as the dependencies check for changes. Having a controller that polled for changes and then updated the cache allowed for significantly fewer database calls. However, if you have a low number of items that you are putting into cache, then having a SqlDependency on each item might be better, and it would certainly require less custom code.
Thomas
That expiration cache manager does make decent sense. It's something that I will have to look into and see what I can do. I'll let you know what I find out tonight while making the thing work. Thanks for the information.
Scott
Also, any chance you could share some code or point me in the direction of where you learned about creating your custom DatabaseExpirationManager?
Scott
@Scott - I'll see if I can dig it up tonight. That was on a project from five or six years ago.
Thomas
Thanks Thomas!!
Scott
Thanks Thomas, I'm going to look through it tonight and I'll get back to you. I really do appreciate it.
Scott
+1  A: 

I'm not so sure this is a good idea; you would probably have a better solution if you could manage to speed up communication with your database.

Hopefully I understood your requirements.
It quickly became a lot of code; here you have it...

This is just a sample, but it might be something to build on. I have not taken into consideration your need to remove rows after a certain amount of time.
I separated the cache into segments with groups, where the groups contain rows.
I designed the sample to only lock a row when the first set property is called; when only get operations are called, you should be safe.
The lock will be released when the row object is disposed, so you have to use using() or call Dispose() to make it work.

Here is a cache (group) class and a row class.
Add database reads after the comment // Add code to read from database...

public class GroupCache : SimpleCache<RowObject, int>
{
    private static readonly object GroupCacheObjectLock = new object();

    public GroupCache(int groupId)
    {
        GroupId = groupId;
    }
    public int GroupId { get; private set; }

    public static GroupCache GetGroupCache(int groupId)
    {
        lock (GroupCacheObjectLock)
        {
            if (HttpContext.Current.Cache["Group-" + groupId] == null)
            {
                HttpContext.Current.Cache["Group-" + groupId]
                    = new GroupCache(groupId);
            }
        }
        return (GroupCache)HttpContext.Current.Cache["Group-" + groupId]; // cast required: the Cache indexer returns object
    }

    public override RowObject CreateItem(int id, 
            SimpleCache<RowObject, int> cache)
    {
        return new RowObject(id, GroupId, this);
    }

}

public class RowObject : SimpleCacheItem<RowObject, int>
{
    private string _property1;

    public RowObject(int rowId, int groupId, SimpleCache<RowObject, int> cache)
        : base(rowId, cache)
    {
        // Add code to read from database...
    }

    public string Property1
    {
        get { return _property1; }
        set
        {
            if (!AcquireLock(-1)) return;
            _property1 = value;
#if DEBUG
            Trace.WriteLine(string.Format("Thread id: {0}, value = {1}", 
                Thread.CurrentThread.ManagedThreadId, value));
#endif
        }
    }
}

This is a unit test mostly to show how to use the classes.

[TestFixture]
public class GroupCacheTest
{
    private int _threadFinishedCount;
    private void MultiThreadTestWorker(object obj)
    {
        for (int n = 0; n < 10; n++)
        {
            for (int m = 0; m < 25; m++)
            {
                using (RowObject row 
                    = GroupCache.GetGroupCache(n).GetCachedItem(m))
                {
                    row.Property1 = string.Format("{0} {1} {2}", obj, n, m);
                    Thread.Sleep(3);
                }
            }
        }
        Interlocked.Increment(ref _threadFinishedCount);
    }
    [Test]
    public void MultiThreadTest()
    {
        _threadFinishedCount = 0;
        for (int i = 0; i < 20; i++)
        {
            ThreadPool.QueueUserWorkItem(MultiThreadTestWorker, "Test-" + i);
        }
        while (_threadFinishedCount < 20) // wait for all 20 worker threads
            Thread.Sleep(100);
    }
}

Here are the base classes.

public abstract class SimpleCacheItem<T, TKey> : IDisposable where T : class
{
    private readonly SimpleCache<T, TKey> _cache;

    protected SimpleCacheItem(TKey id, SimpleCache<T, TKey> cache)
    {
        Id = id;
        _cache = cache;
    }

    protected TKey Id { get; private set; }

    #region IDisposable Members

    public virtual void Dispose()
    {
        if (_cache == null) return;
        _cache.ReleaseLock(Id);
    }

    #endregion

    protected bool AcquireLock(int timeout)
    {
        return _cache.AcquireLock(Id, timeout); // pass the caller's timeout through
    }
}

public abstract class SimpleCache<T, TKey> where T : class
{
    private static readonly object CacheItemLockSyncLock = new object();
    private static readonly object CacheItemStoreSyncLock = new object();
    // These must be initialized, or every lookup throws NullReferenceException.
    private readonly Dictionary<TKey, int> _cacheItemLock = new Dictionary<TKey, int>();
    private readonly Dictionary<TKey, T> _cacheItemStore = new Dictionary<TKey, T>();

    public abstract T CreateItem(TKey id, SimpleCache<T, TKey> cache);

    public T GetCachedItem(TKey id)
    {
        T item;
        lock (CacheItemStoreSyncLock)
        {
            if (!_cacheItemStore.TryGetValue(id, out item))
            {
                item = CreateItem(id, this);
                _cacheItemStore.Add(id, item);
            }
        }
        return item;
    }

    public void ReleaseLock(TKey id)
    {
        lock (CacheItemLockSyncLock)
        {
            if (_cacheItemLock.ContainsKey(id))
            {
                _cacheItemLock.Remove(id);
            }
        }
#if DEBUG
        Trace.WriteLine(string.Format("Thread id: {0} lock released", 
        Thread.CurrentThread.ManagedThreadId));
#endif
    }

    public bool AcquireLock(TKey id, int timeOut)
    {
        var timer = new Stopwatch();
        timer.Start();
        while (timeOut < 0 || timer.ElapsedMilliseconds < timeOut) // loop until the timeout elapses
        {
            lock (CacheItemLockSyncLock)
            {
                int threadId;
                if (!_cacheItemLock.TryGetValue(id, out threadId))
                {
                    _cacheItemLock.Add(id, 
                        Thread.CurrentThread.ManagedThreadId);
#if DEBUG
                    Trace.WriteLine(string.Format(
                        "Thread id: {0}, lock acquired after {1} ms", 
                        Thread.CurrentThread.ManagedThreadId, 
                        timer.ElapsedMilliseconds));
#endif
                    return true;
                }
                if (threadId == Thread.CurrentThread.ManagedThreadId) 
                    return true;
            }
            Thread.Sleep(15);
        }
        return false;
    }
}

Jens Granlund