
Hi All,

I'm working on an application that does processing at what I'd call fairly high throughput (current peaks around 400 Mbps, with an eventual design goal of 10 Gbps).

I run multiple instances of a loop that basically just cycles through reading and processing information, and uses a dictionary to hold state. However, I also need to scan the entire dictionary periodically to check for timeouts, and I'd like to solicit some ideas on what to do if this scan becomes a performance hotspot. What I'm looking for is whether there are any standard techniques for interleaving the timeout checks on the dictionary with the main processing code in the loop, so that, say, on loop 1 I check the first dictionary item, on loop 2 the second, and so on. The catch is that the dictionary keys change: entries are added and deleted by the main processing code, so it's not as simple as taking a copy of all the dictionary keys and then checking them one by one in the main loop.

I'll reiterate: this is not a current performance problem, so please, no comments about premature optimization. I realize it's premature; I am consciously choosing to treat this as a potential problem.

Edit for clarity: This is a curiosity I'm thinking about on my weekend: what a best-practice approach might be for something like this. This isn't the only problem I have, and not the only area of performance I'm looking at. However, it is one area where I'm not aware of a clean, concise approach.

I'm already exploiting parallelism and hardware on this (the next level of hardware is a 5x increase in cost, but more significantly it would require a redesign of the parallelism). The parallelism is also working the way I want it to, so again, there's no need to comment on it. The dictionary is instantiated per thread, so any additional threads for running the checks would require synchronization between the threads, which is too costly.

Some pseudocode of the logic, if it helps:

Dictionary hashdb;
last_scan = now();
while(true) {
  record = grab_record_from_buffer(); // There is a buffer in place, so some delays are tolerable
  process(record);        // do the main processing
  update_hashdb(record);  // add, remove, update entries in the dictionary
  if(now() - last_scan > 15 seconds) {
    foreach(entry in hashdb)
      periodic_check(entry);  // check for timeouts or any other periodic checks on every db entry
    last_scan = now();        // reset the timer so the full scan runs once per interval
  }
}

I do realize I may never run into an actual problem with the way I have it, so there's a good chance I won't need whatever comes up. However, what I'm really looking for is whether there is any standard approach or algorithm, that I'm just not aware of, for interleaving a dictionary scan with the main processing logic while the dictionary is changing. Or any suggestions on an approach to this (I already have an idea of how I would approach it, but it's not as clean as I'd like).

Thank You,

+2  A: 

Are you able to use .NET 4.0 (or at least plan to do so)? If so, ConcurrentDictionary may help you - it allows you to iterate over a dictionary while still modifying it (either in the same thread or a different one).

You need to be aware that the results may be surprising - you may see some changes but not others, for example - but if that's acceptable, it may be a useful approach.
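To make the difference concrete, here is a small sketch (a hypothetical demo class written in modern C#; the out _ discard needs a C# 7+ compiler) that enumerates a ConcurrentDictionary while adding and removing entries, next to a plain Dictionary doing the same. The concurrent version completes, but whether its loop ever visits the key added mid-enumeration is not guaranteed, which is exactly the "surprising results" caveat. The plain Dictionary throws InvalidOperationException on the MoveNext after the add.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical demo: modifying a dictionary while enumerating it.
static class EnumerateWhileModifyingDemo
{
    // ConcurrentDictionary: adding and removing during foreach does not throw.
    // Whether the loop visits key 999 (added mid-enumeration) is unspecified,
    // but the final contents are deterministic: the odd keys 1..9 plus 999.
    public static ConcurrentDictionary<int, string> Concurrent()
    {
        var dict = new ConcurrentDictionary<int, string>();
        for (int i = 0; i < 10; i++) dict[i] = "v" + i;

        bool added = false;
        foreach (var pair in dict)
        {
            if (!added)
            {
                dict.TryAdd(999, "late"); // may or may not show up later in this loop
                added = true;
            }
            if (pair.Key % 2 == 0)
                dict.TryRemove(pair.Key, out _); // remove the entry just visited
        }
        return dict;
    }

    // Plain Dictionary: adding a key invalidates the enumerator, so the next
    // MoveNext throws InvalidOperationException.
    public static bool PlainDictionaryThrows()
    {
        var dict = new Dictionary<int, string>();
        for (int i = 0; i < 10; i++) dict[i] = "v" + i;
        try
        {
            foreach (var pair in dict)
                dict[pair.Key + 1000] = "added";
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```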

You could then have one thread doing periodic checks for all the other dictionaries. I know you'd previously ruled this out due to synchronization requirements, but the beauty of ConcurrentDictionary is that it doesn't require synchronization [1]. Does that change the feasibility of using a separate checking thread?
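A rough sketch of that separate checking thread, assuming each worker stores a last-seen UTC timestamp per key in its own ConcurrentDictionary<TKey, DateTime> (the TimeoutSweeper class and the timestamp layout are illustrative assumptions, not part of the answer). Removal goes through ICollection<KeyValuePair<TKey, TValue>>.Remove, which on ConcurrentDictionary removes an entry only if both key and value still match, so a timestamp that a worker refreshed after the sweeper read it is not thrown away.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical sweeper: one background thread checks every worker's dictionary
// for entries whose last-seen timestamp (UTC) is older than the timeout.
sealed class TimeoutSweeper<TKey>
{
    private readonly IReadOnlyList<ConcurrentDictionary<TKey, DateTime>> _tables;
    private readonly TimeSpan _timeout;

    public TimeoutSweeper(IReadOnlyList<ConcurrentDictionary<TKey, DateTime>> tables, TimeSpan timeout)
    {
        _tables = tables;
        _timeout = timeout;
    }

    // One pass over all dictionaries; returns how many entries were removed.
    public int SweepOnce(DateTime nowUtc)
    {
        int removed = 0;
        foreach (var table in _tables)
        {
            // Safe while workers keep modifying the table, but not a snapshot:
            // entries added or refreshed during the pass may be missed until next time.
            foreach (var pair in table)
            {
                if (nowUtc - pair.Value < _timeout) continue;

                // Conditional remove: succeeds only if the key still maps to the
                // timestamp we read, so a concurrent refresh by the worker wins.
                if (((ICollection<KeyValuePair<TKey, DateTime>>)table).Remove(pair))
                    removed++;
            }
        }
        return removed;
    }

    // Calls SweepOnce every 'interval' on a background thread until the token is cancelled.
    public Thread Start(TimeSpan interval, CancellationToken token)
    {
        var thread = new Thread(() =>
        {
            while (!token.WaitHandle.WaitOne(interval))
                SweepOnce(DateTime.UtcNow);
        });
        thread.IsBackground = true;
        thread.Start();
        return thread;
    }
}
```

Usage would be along the lines of building one sweeper over all the worker tables and calling Start(TimeSpan.FromSeconds(15), token); cancelling the token ends the loop.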

If you don't want to use a separate thread, you could use an iterator explicitly: each time you go through the loop, check another entry, and start again if you've reached the end. Again, this wouldn't work with a standard dictionary, but it should work for a ConcurrentDictionary, so long as you're willing to work with the possibility of seeing a mixture of updated and stale data.
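A minimal sketch of that explicit-iterator approach, assuming the same thread both updates the dictionary and calls the scanner once per loop iteration, and that the values are last-seen UTC timestamps (IncrementalTimeoutScanner and CheckNext are hypothetical names). Each call advances the enumerator by one entry and starts a new pass once the end is reached; because only one thread touches the dictionary, a plain TryRemove cannot race with a refresh of the same entry.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical helper: examines one dictionary entry per call, so the timeout
// scan can be interleaved with the main processing loop. Assumes the calling
// thread is the only one modifying the dictionary.
sealed class IncrementalTimeoutScanner<TKey>
{
    private readonly ConcurrentDictionary<TKey, DateTime> _table;
    private readonly TimeSpan _timeout;
    private IEnumerator<KeyValuePair<TKey, DateTime>> _cursor;

    public IncrementalTimeoutScanner(ConcurrentDictionary<TKey, DateTime> table, TimeSpan timeout)
    {
        _table = table;
        _timeout = timeout;
    }

    // Examines at most one entry; returns true if it had expired and was removed.
    public bool CheckNext(DateTime nowUtc)
    {
        if (_cursor == null)
            _cursor = _table.GetEnumerator(); // start a new pass

        if (!_cursor.MoveNext())
        {
            // End of this pass; the next call starts over from the beginning.
            _cursor.Dispose();
            _cursor = null;
            return false;
        }

        var pair = _cursor.Current;
        if (nowUtc - pair.Value < _timeout)
            return false;

        // Single-threaded use: nothing can refresh this entry between the
        // check and the removal, so an unconditional TryRemove is enough.
        return _table.TryRemove(pair.Key, out _);
    }
}
```

In the question's pseudocode, the 15-second full scan would become a single scanner.CheckNext(DateTime.UtcNow) call per record. With N entries a full pass takes roughly N records plus one wrap-around call, so the worst-case detection delay depends on the record rate rather than on a fixed interval.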


[1] ... by which I mean it doesn't require any explicit synchronization, and that the internal synchronization is significantly lighter-weight than having to take out a lock around every access.

From Stephen Toub's post on ConcurrentDictionary:

For modifications / writes to the dictionary, ConcurrentDictionary employs fine-grained locking to ensure thread-safety (reads on the dictionary are performed in a lock-free manner)

The other big reduction in locking is the ability mentioned above: you can iterate over the dictionary in one thread while modifying it in another, so long as you can cope with seeing some changes applied since the iterator was created but not others. Compare this with normal Dictionary<,> where for safe concurrent access you'd have to lock the dictionary for the entire time you were iterating over it.

Jon Skeet
-1, we went through this before. "doesn't require synchronization" is totally misleading as well; of course it uses locking. It just isn't visible. Invisible locks are not faster than visible ones.
Hans Passant
@nobugs: Yes, we went through this before. You claimed that MSDN was silent on iteration, and I showed you exactly where it describes the behaviour. Then, as now, you completely ignored the fact that I explicitly said it's only appropriate if you can handle that behaviour. While ConcurrentDictionary has some internal locking, it's *greatly reduced* compared to the explicit locking you'd have to perform to work with a non-concurrent dictionary. I'll edit my answer to make that clearer, although I very much doubt it'll satisfy you. Are you going to -1 every post mentioning ConcurrentDictionary?
Jon Skeet
@jon - I can't see your comments if you don't get my nick right.
Hans Passant
@nobugz: Apologies for that. Now, how about the actual points that I made?
Jon Skeet
@jon - sorry, I lost track of the other thread and can't find it again. The discussion was about how to handle an iteration randomly returning duplicate objects or missing objects without being able to find out about it. Toub doesn't touch on the subject; I'm curious what a proper way to handle this might look like.
Hans Passant
@nobugz: The other thread is at http://stackoverflow.com/questions/2347269 - Stephen's blog post doesn't cover that (at least that particular post doesn't) but how you would handle it would depend on the situation. I think so long as you're aware of it, you can just decide how you want to go on - for instance, you *might* want to check that the key/value pair is still there once you've found a potential match. In the case of this question, you'd probably be happy enough to miss out "new" additions - you'll catch them next time through anyway.
Jon Skeet
@jon You are correct; my concern with using the second thread is mainly about locking. Missing an update is OK, as it should be caught the next time around. 99% of operations on the dictionary are reads, so if those are done with lightweight techniques it might work.
Kevin Nisbet
accepted: I was trying to hold out for another option, but this seems to show the most promise. Thanks.
Kevin Nisbet
+1  A: 

One idea here might be to add a level of indirection (some kind of proxy / wrapper object). This would allow you to update objects, and remove objects (by setting the wrapped value to null), without breaking iterators (Add still remains a pain, though).

Obviously you'd want to add an occasional "proper" remove that sweeps for nulls, but this might work quite nicely with a ReaderWriterLockSlim, since "get", "iterate", "update" and "remove via set-to-null" now only require a read lock, and only "add" and "clean up removed keys" require a write lock.

Note that I'm using the where T : class constraint here to avoid having to add my own code to ensure atomicity (reads and writes of a reference are atomic in .NET):

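using System;
using System.Collections.Generic;

static class Program
{
    static void Main()
    {
        try
        {
            var direct = new Dictionary<string, string>();
            direct.Add("abc", "abc");
            direct.Add("def", "def");
            using (var iter = direct.GetEnumerator())
            {
                iter.MoveNext();
                Console.WriteLine(iter.Current.Value);
                // Changes the dictionary itself; on the .NET Framework this invalidates
                // the enumerator, so the next MoveNext throws InvalidOperationException.
                direct["def"] = "DEF";
                iter.MoveNext();
                Console.WriteLine(iter.Current.Value);
            }
        }
        catch { Console.WriteLine("direct: BOOM"); }
        try
        {
            var indirect = new Dictionary<string, Wrapper<string>>();
            indirect.Add("abc", "abc"); // implicit string -> Wrapper<string> conversion
            indirect.Add("def", "def");
            using (var iter = indirect.GetEnumerator())
            {
                iter.MoveNext();
                Console.WriteLine(iter.Current.Value);
                // Changes only the wrapped value; the dictionary itself is untouched,
                // so the enumerator stays valid.
                indirect["def"].Value = "DEF";
                iter.MoveNext();
                Console.WriteLine(iter.Current.Value);
            }
        }
        catch { Console.WriteLine("indirect: BOOM"); }
    }
}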
class Wrapper<T> where T : class {
    public T Value { get; set; }
    public static implicit operator Wrapper<T>(T value) {
        return new Wrapper<T> { Value = value};
    }
    public static implicit operator T (Wrapper<T> value) {
        return value.Value;
    }
    public override string ToString() {
        return Convert.ToString(Value);
    }
}
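The answer describes putting a ReaderWriterLockSlim around the wrapped dictionary, while the code above only shows the wrapper. A minimal sketch of that combination, assuming a private Box class plays the role of Wrapper<T> (IndirectStore, SweepWhere and Compact are hypothetical names): updates and set-to-null removals only touch box contents and run under the read lock, while adding a brand-new key and compacting away null boxes take the write lock.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical store combining the wrapper idea with ReaderWriterLockSlim.
sealed class IndirectStore<TKey, T> : IDisposable where T : class
{
    // Plays the role of Wrapper<T>; a null Value marks a removed entry.
    private sealed class Box { public T Value; }

    private readonly Dictionary<TKey, Box> _map = new Dictionary<TKey, Box>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    public bool TryGet(TKey key, out T value)
    {
        _lock.EnterReadLock();
        try
        {
            value = _map.TryGetValue(key, out Box box) ? box.Value : null;
            return value != null;
        }
        finally { _lock.ExitReadLock(); }
    }

    // Updating an existing key (even a removed one) only needs the read lock;
    // a brand-new key changes the dictionary and needs the write lock.
    public void Set(TKey key, T value)
    {
        _lock.EnterReadLock();
        try
        {
            if (_map.TryGetValue(key, out Box existing)) { existing.Value = value; return; }
        }
        finally { _lock.ExitReadLock(); }

        _lock.EnterWriteLock();
        try
        {
            if (_map.TryGetValue(key, out Box box)) box.Value = value; // added by another thread meanwhile
            else _map.Add(key, new Box { Value = value });
        }
        finally { _lock.ExitWriteLock(); }
    }

    // "Remove" by clearing the box: the dictionary structure does not change.
    public void Remove(TKey key)
    {
        _lock.EnterReadLock();
        try { if (_map.TryGetValue(key, out Box box)) box.Value = null; }
        finally { _lock.ExitReadLock(); }
    }

    // Periodic check: iterate under the read lock and clear entries matching the predicate.
    public int SweepWhere(Func<TKey, T, bool> shouldRemove)
    {
        _lock.EnterReadLock();
        try
        {
            int cleared = 0;
            foreach (var pair in _map)
            {
                T current = pair.Value.Value;
                if (current != null && shouldRemove(pair.Key, current))
                {
                    pair.Value.Value = null;
                    cleared++;
                }
            }
            return cleared;
        }
        finally { _lock.ExitReadLock(); }
    }

    // The occasional "proper" remove: drop keys whose box is null (write lock).
    public int Compact()
    {
        _lock.EnterWriteLock();
        try
        {
            var dead = new List<TKey>();
            foreach (var pair in _map)
                if (pair.Value.Value == null) dead.Add(pair.Key);
            foreach (var key in dead) _map.Remove(key);
            return dead.Count;
        }
        finally { _lock.ExitWriteLock(); }
    }

    public void Dispose() => _lock.Dispose();
}
```

SweepWhere takes a predicate rather than exposing the iteration directly because a ReaderWriterLockSlim created with the default NoRecursion policy throws LockRecursionException if code already holding the read lock tries to enter the lock again, so the callback must not call back into the store.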
Marc Gravell