Hi,

I'd like to ask your expert advice on a workable architecture in C#.

I have a C# service which responds to a request from a local user on the LAN, fetches packets of data from the internet, and crunches that data to produce arrays of data in a structure. Each data request takes about 2 seconds, and returns 4000 bytes. There could be tens of thousands of requests per day.

To speed everything up, and reduce bandwidth, I need to cache the results of the data crunching so that 2nd and subsequent accesses are served instantly to any other users on the LAN (there could be >50 users).

Constraints:

  1. The underlying data never changes, i.e. I don't have to worry about "dirty" data (great!).
  2. The data I want to cache is a rather complex structure, containing nested arrays of DateTime, doubles, etc. The data is crunched, using a lot of math, from the data served from the internet.
  3. I can't use more than 100MB of memory no matter how much data is cached (i.e. the cache must be size limited).
  4. I can't index the data in the cache by a numerical index, I have to index it with a combination of date ("YYYY-MM-DD") and a unique ID string ("XXXXXXXX").
  5. It has to be fast, i.e. it has to serve most of its responses from RAM.
  6. The data in the cache must be persisted to disk every 24 hours.

Here are my options at the moment:

  1. Cache the data in the server class, using private variables (i.e. private List or Dictionary), then serialize it to disk occasionally;
  2. Use a database;

I'm interested in your expert opinion.

A: 

What about using the internal methods that IIS provides?

TomTom
Sorry, my question was not clear enough: I have to cache local data, contained in a structure in a service class written in C#. I've edited the question to make this clearer.
Gravitas
+1  A: 

Perhaps something like Index4Objects?

(http://www.codeplex.com/i4o) http://staxmanade.blogspot.com/2008/12/i4o-indexspecification-for.html

Also, maybe read this response to another SO question http://stackoverflow.com/questions/601136/i4o-vs-plinq.

MattC
+1  A: 

By far the easiest solution is to use a Dictionary<string, ComplexDataStructure> for this.

Concerning your requirements:

  1. The lifetime of the cache is easiest to manage with a background thread that scans the cache every 10 minutes, or every hour or so. With the ComplexDataStructure, you store a DateTime recording when the cache item was created, and you remove the key from the dictionary once its lifetime has expired;

  2. Because you are storing the actual data structure, complexity is not an issue;

  3. Limiting the size may be difficult. http://stackoverflow.com/questions/26570/sizeof-equivalent-for-reference-types may help you to calculate the size of the object structure. This calculation is not trivial, but you can store the result with the ComplexDataStructure. Then the same thread as the one used for 1. can remove entries when you run out of space. An easier solution would probably be to call GC.GetTotalMemory(false) and check whether the managed memory usage of your process is over a specific limit. If it is, remove one cache item, and on the next run, if you are still using too much memory, remove another one (memory from removed items is only reclaimed once the garbage collector has run);

  4. Just use a string;

  5. Using the Dictionary<,> is probably by far the fastest way;

  6. Again, use the thread from 1. and implement such logic.
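
To make points 1, 3 and 4 a bit more concrete, here is a minimal sketch of what one pass of that background scan could look like. The names Entry, BoundedCache, MakeKey and Sweep are made up for illustration, double[] stands in for the real ComplexDataStructure, and removing the oldest entry first is just one assumption about which item to evict. The memory reading is passed in as a delegate so the logic can be tried without a real 100MB of data.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical cache entry: the crunched data plus the time it was cached.
public sealed class Entry
{
    public DateTime CreatedUtc;
    public double[] Data;
}

public sealed class BoundedCache
{
    private readonly object _gate = new object();
    private readonly Dictionary<string, Entry> _items = new Dictionary<string, Entry>();

    // Point 4: build one string key from the date and the unique ID.
    public static string MakeKey(DateTime date, string id)
    {
        return date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture) + "|" + id;
    }

    public void Add(string key, double[] data, DateTime nowUtc)
    {
        lock (_gate) { _items[key] = new Entry { CreatedUtc = nowUtc, Data = data }; }
    }

    public bool TryGet(string key, out double[] data)
    {
        lock (_gate)
        {
            Entry entry;
            bool found = _items.TryGetValue(key, out entry);
            data = found ? entry.Data : null;
            return found;
        }
    }

    // One pass of the background scan. Returns the number of removed entries.
    public int Sweep(DateTime nowUtc, TimeSpan maxAge, Func<long> usedBytes, long limitBytes)
    {
        lock (_gate)
        {
            // Point 1: drop entries whose lifetime has expired.
            List<string> expired = _items.Where(p => nowUtc - p.Value.CreatedUtc > maxAge)
                                         .Select(p => p.Key).ToList();
            foreach (string key in expired) _items.Remove(key);
            int removed = expired.Count;

            // Point 3: if still over the limit, remove one entry per pass
            // (the oldest one here, which is an assumption, not a requirement).
            if (_items.Count > 0 && usedBytes() > limitBytes)
            {
                string oldest = _items.OrderBy(p => p.Value.CreatedUtc).First().Key;
                _items.Remove(oldest);
                removed++;
            }
            return removed;
        }
    }
}
```

In the service you would probably call Sweep from a System.Threading.Timer every 10 minutes or so, passing `() => GC.GetTotalMemory(false)` as usedBytes and something like `100L * 1024 * 1024` as limitBytes. Since the underlying data never changes, maxAge can be as long as you like.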

Make sure you handle your locking strategy correctly. The largest issue here is that you don't want to start crunching a piece of data when a different thread is already crunching that same data. A solution to this could be the following strategy:

  1. Lock the dictionary;

  2. Verify whether the cache item exists;

  3. When the cache item does not exist:

    1. Create an empty cache item;

    2. Add that to the dictionary;

    3. Put a lock on the cache item;

    4. Release the lock on the dictionary;

    5. Do the data crunching;

    6. Add the crunched data to the cache item;

    7. Release the lock on the cache item;

  4. When the cache item already exists:

    1. When the cache item actually does have the crunched data, return that;

    2. When the cache item does not have the crunched data, put a lock on the cache item;

    3. Inside the lock, the crunched data will have appeared (because the lock forces you to wait on the other thread).
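
The steps above could be sketched roughly as follows. This is only an illustration under a few assumptions: CacheItem and CrunchCache are made-up names, double[] stands in for the crunched structure, and the crunching itself is passed in as a delegate. Monitor.Enter/Exit is used instead of nested lock statements because the item lock has to be taken before the dictionary lock is released, and it is held for longer. For simplicity the sketch always takes the item lock when the item already exists, rather than first checking for the data without a lock.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical cache item; Data stays null until crunching has finished.
public sealed class CacheItem
{
    public double[] Data;
}

public sealed class CrunchCache
{
    private readonly Dictionary<string, CacheItem> _items = new Dictionary<string, CacheItem>();
    private readonly Func<string, double[]> _crunch;

    public CrunchCache(Func<string, double[]> crunch) { _crunch = crunch; }

    public double[] Get(string date, string id)
    {
        string key = date + "|" + id;
        CacheItem item;
        bool isNew = false;

        Monitor.Enter(_items);                     // 1. lock the dictionary
        try
        {
            if (!_items.TryGetValue(key, out item)) // 2. does the item exist?
            {
                item = new CacheItem();            // 3.1 create an empty item
                _items.Add(key, item);             // 3.2 add it to the dictionary
                Monitor.Enter(item);               // 3.3 lock the item
                isNew = true;
            }
        }
        finally { Monitor.Exit(_items); }          // 3.4 release the dictionary

        if (isNew)
        {
            try
            {
                item.Data = _crunch(key);          // 3.5 and 3.6 crunch and store
                return item.Data;
            }
            catch
            {
                // Remove the empty item so that a later request can retry.
                lock (_items) { _items.Remove(key); }
                throw;
            }
            finally { Monitor.Exit(item); }        // 3.7 release the item
        }

        // 4. The item exists: wait for any crunching in progress, then read.
        lock (item) { return item.Data; }
    }
}
```

Note that a thread that was already waiting on an item whose crunching failed will get null back here; handling that properly is one of the other issues left open.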

There are other issues that will have to be resolved, but I think the basics are described here.

Pieter
Great answer, thanks. Rather than implement this myself, I ended up just using an off-the-shelf solution, the .NET caching library from Kellerman.
Gravitas
You're welcome.
Pieter