I need some advice on what kind of pattern(s) I should use for pushing/pulling data into my application.

I'm writing a rule engine that needs to hold quite a large amount of data in memory in order to be efficient enough. I have some rather conflicting requirements:

  1. It is not acceptable for the engine to always have to wait for a full pre-load of all data before it is functional.
  2. Only fetching and caching data on-demand will lead to the engine taking too long before it is running quickly enough.
  3. An external event can trigger the need for specific parts of the data to be reloaded.

Basically, I think I need a combination of pushing and pulling data into the application.

A simplified version of my current "pattern" looks like this (in pseudo-C# written in notepad):

// This interface is implemented by all classes that need the data
interface IDataSubscriber 
{
    void RegisterData(Entity data);
}

// This interface is implemented by the data access class
interface IDataProvider
{
    void EnsureLoaded(Key dataKey);
    void RegisterSubscriber(IDataSubscriber subscriber);
}


class MyClassThatNeedsData : IDataSubscriber
{
    IDataProvider _provider;

    public MyClassThatNeedsData(IDataProvider provider)
    {
        _provider = provider;
        _provider.RegisterSubscriber(this);
    }

    public void RegisterData(Entity data) 
    {
        // Save data for later
        StoreDataInCache(data);
    }

    void UseData(Key key)
    {
        // Make sure that the data has been stored in cache
        _provider.EnsureLoaded(key);

        Entity data = GetDataFromCache(key);
    }
}

class MyDataProvider : IDataProvider
{
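    readonly List<IDataSubscriber> _subscribers = new List<IDataSubscriber>();

    public void RegisterSubscriber(IDataSubscriber subscriber)
    {
        _subscribers.Add(subscriber);
    }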

    // Make sure that the data for key has been loaded to all subscribers
    public void EnsureLoaded(Key key)
    {
        if (HasKeyBeenMarkedAsLoaded(key))
            return;

        PublishDataToSubscribers(key);

        MarkKeyAsLoaded(key);
    }

    // Force all subscribers to get a new version of the data for key
    public void ForceReload(Key key)
    {
        PublishDataToSubscribers(key);

        MarkKeyAsLoaded(key);
    }

    void PublishDataToSubscribers(Key key)
    {
        Entity data = FetchDataFromStore(key);

        foreach(var subscriber in _subscribers)
        {
            subscriber.RegisterData(data);
        }
    }
}

// This class will be spun off on startup and should make sure that all data is 
// preloaded as quickly as possible
class MyPreloadingThread 
{
    IDataProvider _provider;

    public MyPreloadingThread(IDataProvider provider)
    {
        _provider = provider;
    }

    void RunInBackground()
    {
        IEnumerable<Key> allKeys = GetAllKeys();

        foreach(var key in allKeys) 
        {
            _provider.EnsureLoaded(key);
        }
    }
}

I have a feeling, though, that this is not necessarily the best way of doing this. Just the fact that explaining it seems to take two pages feels like an indication.

Any ideas? Any patterns out there I should have a look at?

A: 

You can start with something simple, like a solution based on the Gateway pattern. Then you can try to improve performance by adding a cache.
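
As a rough illustration of that idea, here is a minimal sketch of a gateway wrapped by a caching decorator. The names `IEntityGateway`, `CachingGateway` and `Invalidate` are made up for this example, and entities are plain strings just to keep it short. It only covers the pull side (load on first use, drop a key when it must be reloaded).

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical gateway: hides how an entity is fetched from the backing store.
public interface IEntityGateway
{
    string Load(string key);
}

// Decorator that adds an in-memory cache in front of any gateway.
public sealed class CachingGateway : IEntityGateway
{
    private readonly IEntityGateway _inner;
    private readonly ConcurrentDictionary<string, Lazy<string>> _cache =
        new ConcurrentDictionary<string, Lazy<string>>();

    public CachingGateway(IEntityGateway inner)
    {
        _inner = inner;
    }

    public string Load(string key)
    {
        // Lazy<T> ensures that concurrent callers asking for the same key
        // share one fetch instead of each hitting the store.
        return _cache.GetOrAdd(key, k => new Lazy<string>(() => _inner.Load(k))).Value;
    }

    // Called when an external event says the data for this key is stale;
    // the next Load fetches a fresh copy.
    public void Invalidate(string key)
    {
        _cache.TryRemove(key, out _);
    }
}
```

The rest of the engine only sees `IEntityGateway`, so the cache can be added or removed without touching the rules themselves.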

Roman
Regarding the Gateway pattern, I'd have to agree with Fowler (quoting the link you gave): "The answer is so common that it's hardly worth stating." And as I said in the question, just caching on demand won't do the trick here, as that would make the engine too slow for too long.
CodingInsomnia
The problem "I have a huge amount of data. How do I process it fast?" is so common that you can hardly expect detailed answers.
Roman
+1  A: 

It should clearly be a combination of:

  • one of the concurrency patterns (Active Object, for example)
  • the producer-consumer pattern (queue)
  • lazy load (data on demand)
  • the lazy unload pattern
  • the strategy pattern (to implement the data access algorithms)
  • thread-safe access to a protected shared resource (the cache)

My vote: Active Object with a shared queue (bus) + lazy patterns + cache
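
To make that combination a bit more concrete, here is a minimal sketch, not a definitive implementation. The class `ActiveDataProvider` and its methods are invented for this example, entities are plain strings, and the `fetch` delegate stands in for whatever slow store lookup the engine really uses. One private worker thread owns all loading. On-demand requests and forced reloads go into an urgent queue, preload work goes into a background queue, and the worker always drains the urgent queue first.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Active Object sketch: callers only enqueue requests; a single worker thread
// executes them, so all writes to the cache happen on one thread.
public sealed class ActiveDataProvider : IDisposable
{
    private readonly Func<string, string> _fetch; // hypothetical slow store lookup
    private readonly ConcurrentDictionary<string, string> _cache =
        new ConcurrentDictionary<string, string>();
    private readonly ConcurrentQueue<Action> _urgent = new ConcurrentQueue<Action>();
    private readonly ConcurrentQueue<Action> _background = new ConcurrentQueue<Action>();
    private readonly SemaphoreSlim _pending = new SemaphoreSlim(0);
    private readonly CancellationTokenSource _stop = new CancellationTokenSource();
    private readonly Thread _worker;

    public ActiveDataProvider(Func<string, string> fetch)
    {
        _fetch = fetch;
        _worker = new Thread(Run) { IsBackground = true };
        _worker.Start();
    }

    // Pull: served from the cache if possible, otherwise queued ahead of preloading.
    public Task<string> GetAsync(string key)
    {
        if (_cache.TryGetValue(key, out var cached)) return Task.FromResult(cached);
        return EnqueueUrgent(() => _cache.GetOrAdd(key, _fetch));
    }

    // Push: an external event forces a fresh copy of one key.
    public Task<string> ReloadAsync(string key)
    {
        return EnqueueUrgent(() => _cache[key] = _fetch(key));
    }

    // Push: background warm-up; keys already pulled on demand are not fetched again.
    public void Preload(IEnumerable<string> keys)
    {
        foreach (var key in keys)
        {
            _background.Enqueue(() => _cache.GetOrAdd(key, _fetch));
            _pending.Release();
        }
    }

    private Task<string> EnqueueUrgent(Func<string> work)
    {
        var tcs = new TaskCompletionSource<string>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        _urgent.Enqueue(() =>
        {
            try { tcs.SetResult(work()); }
            catch (Exception ex) { tcs.SetException(ex); }
        });
        _pending.Release(); // one permit per queued item
        return tcs.Task;
    }

    private void Run()
    {
        try
        {
            while (true)
            {
                _pending.Wait(_stop.Token);
                // Urgent requests always win over preload work.
                if (_urgent.TryDequeue(out var work) || _background.TryDequeue(out work))
                {
                    try { work(); }
                    catch { /* a failed preload must not kill the worker */ }
                }
            }
        }
        catch (OperationCanceledException) { /* Dispose was called */ }
    }

    // Stops the worker; requests still in the queues are never completed.
    public void Dispose()
    {
        _stop.Cancel();
        _worker.Join();
    }
}
```

With something like this, the engine is usable as soon as it starts (requirement 1), `Preload` warms the cache in the background (requirement 2), and `ReloadAsync` handles the external reload events (requirement 3). The obvious limitation is that a single worker thread fetches one key at a time.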

igor
Interesting, I like the way the Active Object pattern deals with concurrency.
CodingInsomnia
So do I. I think it is great for a lot of tasks: multithreaded applications are difficult to debug, and many components don't support concurrent access. So AO is a good way to add controlled access to shared resources and to manage tasks by priority.
igor
Well, thanks for pointing me in an interesting direction!
CodingInsomnia