views:

208

answers:

4
+3  Q: 

Timer Efficiency

I am planning to develop a system with tens of thousands of objects in it, which will each have up to 42(but more likely down around 4 or 5) separate actions they will potentially be performing at regular intervals. I also plan to write code that will deactivate the timers until the object comes into use. When idle, the objects will only need 1 timer each, but when active, the other timers will all start at once. At first the number of objects will be small, maybe a few hundred, but I expect it to grow exponentially, and within a few months, start to reach up in the tens of thousands.

So, I am very worried about efficiency of the code I will be writing for the timers and for these objects. There are three levels in which I could write this application on that would all successfully perform the tasks required. Also, I plan to run this system on a Quad Core server, so I would like to make use of multi-threading wherever possible.

To this end, I've decided to use the System.Timers.Timer class which fires a new thread for each elapse event.

These are the 3 levels I am considering:

  1. One single timer operates the entire application, it iterates through each object, checks to see if any other actions need to be fired, and if so, runs them, then moves on to the next.

  2. Multi-tier timer where each object has a master timer that checks all of the functions the object could need to perform, runs any that are ready, and then sets the next timer interval to the next required action time.

  3. Recursive-tier timer where each action in each object has it's own timer that will be triggered, and then set to run the next time it will be available.

The problem with option 1 is that with so many objects and actions, one singular timer elapse in this manner could run for maybe 20+ seconds (while it executed a few million lines of looped code), where this should probably be ticking every 1 second. If the objects aren't kept in synch, the system would likely not work well.

The problem with option 2 is that it would be a little harder to write than option 3, but not by much, it would also mean perhaps 10,000+ maybe timers running on the system (one for each object), creating and destroying threads with each elapse like its nobody's business (which I'm not sure if this is a problem or not). Each timer would have to fire at least once per second in this situation, with perhaps a few hundred lines of code running (up to perhaps a thousand in an extreme case).

The problem with option 3 is the sheer amount of timers that could potentially be introduced into the system. I'm talking about an average of 10,000+ timers with the potential for near 100,000+ timers to be run at the same time. Each elapse event may only have to run 50 or less lines of code though, making them very short. The elapse events would have delays between a hundredth of a second on one extreme, and five minutes on the other, with the average likely being around 1 second.

I am proficient in Visual Basic .NET, and was planning to write it in that, but I could also revert to my high-school days and try to write this in C++ for efficiency if it would make that much of a difference (please let me know if you have any sources on code efficiency between languages). Also toying with the notion of running this on a clustered Linux server instead of my Quad Core Windows server, but I'm not sure if I could get any of my .NET apps to run on a linux cluster like that (would love any info on that as well).

The main question to answer for this topic is:

Do I use option 1, 2, or 3, and why?

~Edit after considering comments~

So the 4th option involving the timer wheel with a spinlock. Here is a job class:

Public Class Job
Private dFireTime As DateTime
Private objF As CrossAppDomainDelegate
Private objParams() As Object

Public Sub New(ByVal Func As CrossAppDomainDelegate, ByVal Params() As Object, ByVal FireTime As DateTime)
    objF = Func
    dFireTime = FireTime
    objParams = Params
End Sub

Public ReadOnly Property FireTime()
    Get
        Return dFireTime
    End Get
End Property

Public ReadOnly Property Func() As CrossAppDomainDelegate
    Get
        Return objF
    End Get
End Property

Public ReadOnly Property Params() As Object()
    Get
        Return objParams
    End Get
End Property
End Class

And then the main loop implementation:

Private Tasks As LinkedList(Of Job)

Private Sub RunTasks()
    While True
        Dim CurrentTime as DateTime = Datetime.Now            

        If Not Tasks.Count = 0 AndAlso Tasks(0).FireTime > CurrentTime Then
            Dim T As Job = Tasks(0)
            Tasks.RemoveFirst()
            T.Func.Invoke()
        Else
            Dim MillisecondDif As Double

            MillisecondDif = Tasks(0).FireTime.Subtract(CurrentTime).Milliseconds
            If MillisecondDif > 30 Then
                Threading.Thread.Sleep(MillisecondDif)
            End If
        End If

    End While
End Sub

Do I have it right?

EpicClanWars.com

~Edit 2~

Switched the word "Task" out for "Job" so ppl could stop complaining about it ;)

~Edit 3~

Added variables for tracking time & ensuring spinloops happen when needed

+3  A: 

None of the above. The standard solution is to keep a list of events, such that each one points to the next one to occur. You then use a single timer and have it wake up only in time for the next event.

edit

Looks like this is called a timer wheel.

edit

As Sentinel pointed out, events should be dispatched to a thread pool. The handler for these events should do a small bit of work as quickly as possible, and without blocking. If it needs to do I/O, it should fire off an async task and terminate. Otherwise, a thread pool would overflow.

The .NET 4.0 Task class might be helpful here, particularly for its continuation methods.

Steven Sudit
This does sound like a standard way this problem might be solved, do you have an example of a design pattern which shows this?
Jrud
It's my standard way too. No design pattern, it follows from simple logic. The only thing you have to time is the duration to the next event. That takes one timer. And a list of events sorted by time. The list can be arbitrarily large.
Hans Passant
Well, how do you guys think system timers work? They do exactly this. Determine when the next quickest event is due to expire, sleep for that time, wakeup and service the event, and then repeat.
Ziffusion
So what happens when over 1000 events have to be performed within the second? The timer class has a minimum elapse time of 1 millisecond. I can't do multiple queues easily either because events may have to be stuck in the order dynamically.
Jrud
@Hans: You're right that it wouldn't be called a design pattern. It's simply an algorithm, and a rather old and obvious one at that.
Steven Sudit
@Jrud: It's not even accurate to 1ms. On most systems, the resolution is to the nearest 16ms. There is such a thing as a high-resolution timer, but I believe it's in some way system-specific. See http://msdn.microsoft.com/en-us/library/ms644900(VS.85).aspx#high_resolution
Steven Sudit
Avoid to create many System.Timers.Timer / System.Threading.Timer, they are wrappers around Win32 waitable timer objects (look at http://msdn.microsoft.com/en-us/magazine/cc164015.aspx). Timer wheel is best in this situation. Also I would recommend to avoid span thread each time you need processing, instead use ThreadPool. Check http://msdn.microsoft.com/en-us/magazine/cc164139.aspx for details.
Nick Martyshchenko
@Sentinel: Yes, you definitely want to dispatch events to a thread pool, both to avoid the overhead of thread creation/teardown, and to allow simultaneous events.
Steven Sudit
@Ziff: As Sentinel suggests, OS timers are a bit heavy-weight. Not good for thousands.
Steven Sudit
@Steven Sudit The 16 ms resolution further lowers my confidence in using this "algorithm" for my project. The timer wheel sounds great for multiple timers that you can either queue sequentially (where multiple timers can be used easily), or in situations where the events do not have to fire several thousand times per second. I am actually a little sad that it looks like I cannot easily implement it, it looks interesting.
Jrud
@Jrud: That doesn't follow. You can use a timer wheel with a spinlock instead of using an OS timer object.
Steven Sudit
Nick Martyshchenko
@Sentinel: More good advice. Yes, the whole point of being able to schedule so many pieces is to break up the task into chunks that do their business and pass the flow to something else, as with async I/O. If you have many events and they last a long time, a thread pool would be swamped. As for .NET 4.0 Tasks, they're great, but they don't seem *directly* relevant here.
Steven Sudit
@Sentinel: If you break out your advice into its own answer, I'd be happy to upvote it.
Steven Sudit
I am not thrilled about the extra CPU cycles a spinlock would present at first, but it may be a solution to the long-term for this. As there are more items in the queue, there shouldn't be as many cycles wasted on the spin lock, they will all be used.
Jrud
@Steven: Thank you :) I think better you will make final edit of your post when we gather different opinions from comments. I started to describe timer wheel when notice your answer and decided do not post my own.
Nick Martyshchenko
@Jrud: If the next event is in 10 minutes, you would sleep, not spin. But you'd wake up a little early and make sure to be exactly on time.
Steven Sudit
@Sentinel: It's up to you. I'll certainly integrate the best answers into a single one, but that doesn't stop you from getting full credit by having your own answer.
Steven Sudit
Jrud
A: 

The tradeoff in your three options is between memory and CPU. More timers mean more timer nodes (memory), and aggregating these timers into fewer timers means more CPU, as you check for events that need servicing at run time. The CPU overhead in starting too many timers (and expiring them) is not too great an issue with a decent timer implementation.

SO, in my opinion, if you have a good timer implementation, choose to start as many timers as you need (be as granular as possible). But if any of these timers per object are mutually exclusive, consider reusing a timer node.

Ziffusion
Definitely a solid answer
Jrud
He has the Windows OS timer, not a "good implementation". :-)
Steven Sudit
A: 

This reminds me of the old airline ticketing systems, where you had queues. Ticketing requests were put in different queues depending on what kind of attention they needed.

So maybe you could have the queue of objects requiring frequent attention, and the queue of objects requiring infrequent attention. When necessary, you move them from one to the other.

You could have a timer for the frequent queue, and a timer for the infrequent queue. For the frequent queue, you could split it into multiple queues, one for each thread.

For crunching the frequent queue(s), you should not have more threads than you have cores. If you have two cores, what you want to do is get both of them cranking. Any more threads than that will not make things any faster. In fact, if processing the objects requires disk I/O or getting in line for some other shared hardware, it may not even help to get both cores running.

Mike Dunlavey
You can make this as complicated as you like, but I don't see much benefit in doing so.
Steven Sudit
@Steven: Complicated?
Mike Dunlavey
I'm referring to the suggestion that jobs could be split up into multiple queues. While this approach is certainly applicable in the real-world example of airline, I'm confident that the solution to the OP's problem is best served by a simple, provably correct solution with just one queue. Having a per-thread queue, for example, could easily lead to more affinity than we'd want, allowing jobs to be delayed even though another thread is available.
Steven Sudit
+4  A: 

EDIT: I remember interesting interview definetely worth to view: Arun Kishan: Inside Windows 7 - Farewell to the Windows Kernel Dispatcher Lock

As @Steven Sudit stated I warn again: use it only as demo on how timer wheel works and some tasks you have to care about while implement it. Not as reference implementation. In real world you have to write far more complex logic to take into account available resources, scheduling logic and etc.


Here good points stated by Steven Sudit (read post comments for details):

1) Choose right structure to keep your jobs list (it depends as usually):

  • SortedList<> (or SortedDictionary<>) good on memory consumption and indexing but have to implement synchronized access

  • ConcurrentQueue<> will help you avoid locking but you have to implement ordering. It also very memory efficient

  • LinkedList<> is good on insert and retrieve (anyway we need head only) but requires synchronized access (thru it easily implemented via lock-free) and not so memory efficient as it stores two references (prev/next). But it become an issue when you have millions of jobs so all of them take significant amount of memory.

But:

I totally agree with @Steven:

It doesn't matter: neither one of these is a good fit. The right answer would be to use a regular queue and maintain its order yourself, since we most often need to access it only from the head or tail.

Generally, I would recommend using the most feature-full collection from the library, but that doesn't apply here because this is system-level code. We'd need to roll our own, either from scratch or on top of a less feature-rich collection

2) To simplify processing logic of simultaneous jobs you can add delegate list (e.g. via ConcurrentQueue to make it lock-free) into original Job class so when you need another job at same time you just add another delegate to start.

@Steven:

If two tasks are actually scheduled for the same time (whether actually or effectively), this is a normal case that does not require complicating our data structure. In other words, we don't need to group up simultaneous jobs so that we have to traverse two different collections; we can just make them adjacent

3) Start/stoping dispatcher not so straightful as it can be and so can lead to errors. Instead you can wait on an event while using a timeout.

@Steven:

This way, it will either wake up when the next job is ready or when a new job is inserted before the head. In the latter case, it could need to run it now or set a different wait. If presented with, say, 100 jobs all scheduled for the same instant, the best thing we can do is queue them all up.

If we needed to provide prioritization, that's a job for a priority dispatch queue and multiple pools in a producer/consumer relationship, but it still doesn't justify a start/stop dispatcher. The dispatcher should always be on, running in a single loop that sometimes cedes the core

4) About using ticks:

@Steven:

Sticking to one type of tick is fine, but mixing and matching gets ugly, particularly since it's hardware-dependent. I'm sure that ticks would be slightly faster than milliseconds, because it stores the former and has to divide by a constant to get the latter. Whether this operation winds up being costly is another matter, but I'm fine with using ticks to avoid the risk.

My thoughts:

Another good point, I agree with you. But sometimes division by constant becomes costly and it not so fast as it maybe seems to be. But when we talk about 100 000 of DateTimes it does not matter, you are right, thank you to point.

5) "Managing resources":

@Steven:

The problem I'm trying to highlight is that the call to GetAvailableThreads is expensive and naive; the answer is obsolete before you can even use it. If we really wanted to keep track, we could get initial values and keep a running count by calling the job from a wrapper that uses Interlocked.Increment/Decrement. Even then, it presumes that the rest of the program is not using the thread pool. If we really wanted fine control, then the right answer here is to roll our own thread pool

I absolutely agree that calling to GetAvailableThreads is naive method to monitor available resources thru CorGetAvailableThreads not so expensive. I want to demontrate there are needs to manage resources and seems to choose bad example.

By any means provided in source code example is must not be treated as right way to monitor available resources. I just want to demonstrate you have to think about it. Thru maybe coded no so good piece of code as example.

6) Using Interlocked.CompareExchange:

@Steven:

No, it's not a common pattern. The most common pattern is to briefly lock. Less common is to flag the variable as volatile. Much less common would be to use VolatileRead or MemoryBarrier. Using Interlocked.CompareExchange this way is obscure, even if Richter does it. using it without an explanatory comment is absolutely guaranteed to confuse, as the word "Compare" implies that we're doing a comparison, when in fact we're not.

You are right I have to point about its usage.


using System;
using System.Threading;

// Job.cs

// WARNING! Your jobs (tasks) have to be ASYNCHRONOUS or at least really short-living
// else it will ruin whole design and ThreadPool usage due to potentially run out of available worker threads in heavy concurrency

// BTW, amount of worker threads != amount of jobs scheduled via ThreadPool
// job may waits for any IO (via async call to Begin/End) at some point 
// and so free its worker thread to another waiting runner

// If you can't achieve this requirements then just use usual Thread class
// but you will lose all ThreadPool's advantages and will get noticeable overhead

// Read http://msdn.microsoft.com/en-us/magazine/cc164139.aspx for some details

// I named class "Job" instead of "Task" to avoid confusion with .NET 4 Task 
public class Job
{
    public DateTime FireTime { get; private set; }

    public WaitCallback DoAction { get; private set; }
    public object Param { get; private set; }

    // Please use UTC datetimes to avoid different timezones problem
    // Also consider to _never_ use DateTime.Now in repeat tasks because it significantly slower 
    // than DateTime.UtcNow (due to using TimeZone and converting time according to it)

    // Here we always work with with UTC
    // It will save you a lot of time when your project will get jobs (tasks) posted from different timezones
    public static Job At(DateTime fireTime, WaitCallback doAction, object param = null)
    {
        return new Job {FireTime = fireTime.ToUniversalTime(), DoAction = doAction, Param = param};
    }

    public override string ToString()
    {
        return string.Format("{0}({1}) at {2}", DoAction != null ? DoAction.Method.Name : string.Empty, Param,
                             FireTime.ToLocalTime().ToString("o"));
    }
}

 using System;
 using System.Collections.Generic;
 using System.Diagnostics;
 using System.Linq;
 using System.Threading;

// Dispatcher.cs

// Take a look at System.Runtime IOThreadTimer.cs and IOThreadScheduler.cs
// in Microsoft Reference Source, its interesting reading

public class Dispatcher
{
    // You need sorted tasks by fire time. I use Ticks as a key to gain some speed improvements during checks
    // There are maybe more than one task in same time
    private readonly SortedList<long, List<Job>> _jobs;

    // Synchronization object to access _jobs (and _timer) and make it thread-safe
    // See comment in ScheduleJob about locking
    private readonly object _syncRoot;

    // Queue (RunJobs method) is running flag
    private int _queueRun;

    // Flag to prevent pollute ThreadPool with many times scheduled JobsRun
    private int _jobsRunQueuedInThreadPool;

    // I'll use Stopwatch to measure elapsed interval. It is wrapper around QueryPerformanceCounter
    // It does not consume any additional resources from OS to count

    // Used to check how many OS ticks (not DateTime.Ticks!) elapsed already
    private readonly Stopwatch _curTime;

    // Scheduler start time. It used to build time delta for job
    private readonly long _startTime;

    // System.Threading.Timer to schedule next active time
    // You have to implement syncronized access as it not thread-safe
    // http://msdn.microsoft.com/en-us/magazine/cc164015.aspx
    private readonly Timer _timer;

    // Minimum timer increment to schedule next call via timer instead ThreadPool
    // Read http://www.microsoft.com/whdc/system/pnppwr/powermgmt/Timer-Resolution.mspx
    // By default it around 15 ms
    // If you want to know it exactly use GetSystemTimeAdjustment via Interop ( http://msdn.microsoft.com/en-us/library/ms724394(VS.85).aspx )
    // You want TimeIncrement parameter from there
    private const long MinIncrement = 15 * TimeSpan.TicksPerMillisecond;

    // Maximum scheduled jobs allowed per queue run (specify your own suitable value!)
    // Scheduler will add to ThreadPool queue (and hence count them as processed) no more than this constant

    // This is balance between how quick job will be scheduled after it time elapsed in one side, and 
    // how long JobsList will be blocked and RunJobs owns its thread from ThreadPool
    private const int MaxJobsToSchedulePerCheck = 10;

    // Queue length
    public int Length
    {
        get
        {
            lock (_syncRoot)
            {
                return _jobs.Count;
            }
        }
    }

    public Dispatcher()
    {
        _syncRoot = new object();

        _timer = new Timer(RunJobs);

        _startTime = DateTime.UtcNow.Ticks;
        _curTime = Stopwatch.StartNew();

        _jobs = new SortedList<long, List<Job>>();
    }


    // Is dispatcher still working
    // Warning! Queue ends its work when no more jobs to schedule but started jobs can be still working
    public bool IsWorking()
    {
        return Interlocked.CompareExchange(ref _queueRun, 0, 0) == 1;
    }

    // Just handy method to get current jobs list
    public IEnumerable<Job> GetJobs()
    {
        lock (_syncRoot)
        {
            // We copy original values and return as read-only collection (thread-safety reasons)
            return _jobs.Values.SelectMany(list => list).ToList().AsReadOnly();
        }
    }

    // Add job to scheduler queue (schedule it)
    public void ScheduleJob(Job job)
    {
        // WARNING! This will introduce bottleneck if you have heavy concurrency. 
        // You have to implement lock-free solution to avoid botleneck but this is another complex topic.
        // Also you can avoid lock by using Jeffrey Richter's ReaderWriterGateLock (http://msdn.microsoft.com/en-us/magazine/cc163532.aspx)
        // But it can introduce significant delay under heavy load (due to nature of ThreadPool)
        // I recommend to implement or reuse suitable lock-free algorithm. 
        // It will be best solution in heavy concurrency (if you have to schedule large enough job count per second)
        // otherwise lock or maybe ReaderWriterLockSlim is cheap enough
        lock (_syncRoot)
        {
            // We'll shift start time to quick check when it pasts our _curTime
            var shiftedTime = job.FireTime.Ticks - _startTime;

            List<Job> jobs;
            if (!_jobs.TryGetValue(shiftedTime, out jobs))
            {
                jobs = new List<Job> {job};
                _jobs.Add(shiftedTime, jobs);
            }
            else jobs.Add(job);


            if (Interlocked.CompareExchange(ref _queueRun, 1, 0) == 0)
            {
                // Queue not run, schedule start
                Interlocked.CompareExchange(ref _jobsRunQueuedInThreadPool, 1, 0);
                ThreadPool.QueueUserWorkItem(RunJobs);
            }
            else 
            {
                // else queue already up and running but maybe we need to ajust start time
                // See detailed comment in RunJobs

                long firetime = _jobs.Keys[0];
                long delta = firetime - _curTime.Elapsed.Ticks;

                if (delta < MinIncrement)
                {
                    if (Interlocked.CompareExchange(ref _jobsRunQueuedInThreadPool, 1, 0) == 0)
                    {
                        _timer.Change(Timeout.Infinite, Timeout.Infinite);
                        ThreadPool.QueueUserWorkItem(RunJobs);
                    }
                }
                else 
                {
                    Console.WriteLine("DEBUG: Wake up time changed. Next event in {0}", TimeSpan.FromTicks(delta));
                    _timer.Change(delta/TimeSpan.TicksPerMillisecond, Timeout.Infinite);
                }
            }

        }
    }


    // Job runner
    private void RunJobs(object state)
    {
        // Warning! Here I block list until entire process done, 
        // maybe better will use ReadWriterLockSlim or somewhat (e.g. lock-free)
        // as usually "it depends..."

        // Here processing is really fast (a few operation only) so until you have to schedule many jobs per seconds it does not matter
        lock (_syncRoot)
        {
            // We ready to rerun RunJobs if needed
            Interlocked.CompareExchange(ref _jobsRunQueuedInThreadPool, 0, 1);

            int availWorkerThreads;
            int availCompletionPortThreads;

            // Current thread stats
            ThreadPool.GetAvailableThreads(out availWorkerThreads, out availCompletionPortThreads);

            // You can check max thread limits by
            // ThreadPool.GetMaxThreads(out maxWorkerThreads, out maxCompletionPortThreads);

            int jobsAdded = 0;

            while (jobsAdded < MaxJobsToSchedulePerCheck && availWorkerThreads > MaxJobsToSchedulePerCheck + 1 && _jobs.Count > 0)
            {
                // SortedList<> implemented as two arrays for keys and values so indexing on key/value will be fast
                // First element
                List<Job> curJobs = _jobs.Values[0];
                long firetime = _jobs.Keys[0];

                // WARNING! Stopwatch ticks are different from DateTime.Ticks
                // so we use _curTime.Elapsed.Ticks instead of _curTime.ElapsedTicks

                // Each tick in the DateTime.Ticks value represents one 100-nanosecond interval. 
                // Each tick in the ElapsedTicks value represents the time interval equal to 1 second divided by the Frequency.
                if (_curTime.Elapsed.Ticks <= firetime) break;

                while (curJobs.Count > 0 &&  jobsAdded < MaxJobsToSchedulePerCheck && availWorkerThreads > MaxJobsToSchedulePerCheck + 1)
                {
                    var job = curJobs[0];

                    // Time elapsed and we ready to start job
                    if (job.DoAction != null)
                    {
                        // Schedule new run

                        // I strongly recommend to look at new .NET 4 Task class because it give superior solution for managing Tasks
                        // e.g. cancel run, exception handling, continuation, etc
                        ThreadPool.QueueUserWorkItem(job.DoAction, job);
                        ++jobsAdded;

                        // It may seems that we can just decrease availWorkerThreads by 1 
                        // but don't forget about started jobs they can also consume ThreadPool's threads
                        ThreadPool.GetAvailableThreads(out availWorkerThreads, out availCompletionPortThreads);
                    }

                    // Remove job from list of simultaneous jobs
                    curJobs.Remove(job);
                }

                // Remove whole list if its empty
                if (curJobs.Count < 1) _jobs.RemoveAt(0);
            }

            if (_jobs.Count > 0)
            {
                long firetime = _jobs.Keys[0];

                // Time to next event
                long delta = firetime - _curTime.Elapsed.Ticks; 

                if (delta < MinIncrement) 
                {
                    // Schedule next queue check via ThreadPool (immediately)
                    // It may seems we start to consume all resouces when we run out of available threads (due to "infinite" reschdule)
                    // because we pass thru our while loop and just reschedule RunJobs
                    // but this is not right because before RunJobs will be started again
                    // all other thread will advance a bit and maybe even complete its task
                    // so it safe just reschedule RunJobs and hence wait when we get some resources
                    if (Interlocked.CompareExchange(ref _jobsRunQueuedInThreadPool, 1, 0) == 0)
                    {
                        _timer.Change(Timeout.Infinite, Timeout.Infinite);
                        ThreadPool.QueueUserWorkItem(RunJobs);
                    }
                }
                else // Schedule next check via timer callback
                {
                    Console.WriteLine("DEBUG: Next event in {0}", TimeSpan.FromTicks(delta)); // just some debug output
                    _timer.Change(delta / TimeSpan.TicksPerMillisecond, Timeout.Infinite);
                }
            }
            else // Shutdown the queue, no more jobs
            {
                Console.WriteLine("DEBUG: Queue ends");
                Interlocked.CompareExchange(ref _queueRun, 0, 1); 
            }
        }
    }
}

Quick example of usage:

    // Test job worker
    static void SomeJob(object param)
    {
        var job = param as Job;
        if (job == null) return;

        Console.WriteLine("Job started: {0}, [scheduled to: {1}, param: {2}]", DateTime.Now.ToString("o"),
                          job.FireTime.ToLocalTime().ToString("o"), job.Param);
    }

    static void Main(string[] args)
    {
        var curTime = DateTime.UtcNow;
        Console.WriteLine("Current time: {0}", curTime.ToLocalTime().ToString("o"));
        Console.WriteLine();

        var dispatcher = new Dispatcher();

        // Schedule +10 seconds to future
        dispatcher.ScheduleJob(Job.At(curTime + TimeSpan.FromSeconds(10), SomeJob, "+10 sec:1"));
        dispatcher.ScheduleJob(Job.At(curTime + TimeSpan.FromSeconds(10), SomeJob, "+10 sec:2"));

        // Starts almost immediately
        dispatcher.ScheduleJob(Job.At(curTime - TimeSpan.FromMinutes(1), SomeJob, "past"));

        // And last job to test
        dispatcher.ScheduleJob(Job.At(curTime + TimeSpan.FromSeconds(25), SomeJob, "+25 sec"));

        Console.WriteLine("Queue length: {0}, {1}", dispatcher.Length, dispatcher.IsWorking()? "working": "done");
        Console.WriteLine();

        foreach (var job in dispatcher.GetJobs()) Console.WriteLine(job);
        Console.WriteLine();

        Console.ReadLine();

        Console.WriteLine(dispatcher.IsWorking()?"Dispatcher still working": "No more jobs in queue");

        Console.WriteLine();
        foreach (var job in dispatcher.GetJobs()) Console.WriteLine(job);

        Console.ReadLine();
    }

Hope it will be helpful.


@Steven Sudit point me some issues, so here I try to give my vision.

1) I wouldn't recommend using SortedList here, or anywhere else, as it's an obsolete .NET 1.1 class

SortedList<> by any means is not obsolete. It still exists in .NET 4.0 and introduced in .NET 2.0 when generics was introduced into language. I can't see any point to remove it from .NET.

But real question here I trying to answer: What data structure can store values in sorted order and will be efficient in storing and indexing. There are two suitable ready to use data structures: SortedDictionary<> and SortedList<>. Here some info about how to choose. I just don't want waste implementation with my own code and hide main algorithm. Here I can implement priority-array or something other but it takes more lines to code. I don't see any reason do not use SortedList<> here...

BTW, I can't understand why you not recommend it? What are reasons?

2) In general, there's no need to complicate the code with special cases for simultaneous events.

When @Jrud says he probably will have numerous task to schedule I think it they may have heavy concurrency, so I demonstrate how to solve it. But my point: even if you have low concurrency you stil have chance to get events in same time. Also this is easy possible in multithreaded evironment or when there are many sources want to schedule jobs.

Interlocked functions not so complicated, cheap and since .NET 4.0 inlined so there are no problem to add guard in such situation.

3) The IsWorking method should just use a memory barrier and then read the value directly.

Im not so sure here that you are right. I would recommend to read two nice articles: Part 4: Advanced Threading of Threading in C# by Joseph Albahari and How Do Locks Lock? by Jeff Moser. And of cause Chapter 28 (Primitive Thread Synchronization Constructs) of CLR via C# (3rd edition) by Jeffrey Richter.

Here some qoute:

The MemoryBarrier method doesn’t access memory but it forces any earlier programorder loads and stores to be completed before the call to MemoryBarrier. And it also forces any later program-order loads and stores to be completed after the call to MemoryBarrier. MemoryBarrier is much less useful than the other two methods

Important I know that this can be very confusing, so let me summarize it as a simple rule: When threads are communicating with each other via shared memory, write the last value by calling VolatileWrite and read the first value by calling VolatileRead.

I would also recommend: Intel® 64 and IA-32 Architectures Software Developer's Manuals if you care about it seriously.

So I don't use VolatileRead/VolatileWrite in my code neither volatile keyword, I don't think Thread.MemoryBarrier will be better here. Maybe you can point me what I miss? Some articles or in-depth discussion?

4) The GetJobs method looks like it could lock for an extended period. Is it necessary?

First of all its just handy method, sometime it is necessary to get all tasks in queue at least for debugging.

But you are not right. As I mentioned in code comments SortedList<> implemented as two arrays you can check this by Reference Source or just by viewing in Reflector. Here some comments from reference source:

// A sorted list internally maintains two arrays that store the keys and
// values of the entries.  

I got from .NET 4.0 but it not changed much since 2-3.5

So my code:

_jobs.Values.SelectMany(list => list).ToList().AsReadOnly();

involve following:

  • iterate thru values in array of references to List. Indexing array is very fast.
  • iterate thru each List (which is implemented internally as array too). It very fast too.
  • build new List of references (via ToList()) which is very fast too (just dynamic array) (.NET has very solid and fast implementation)
  • build read-only wrapper (no copy, just iterator wrapper)

so consequently we have just flatten read-only list of references to Job's objects. It very fast even you have millions of task. Try to measure yourself.

Any way I added it to show what happens during execution cycle (for debug purposes) but I think it can be useful.

5) A lock-free queue is available in .NET 4.0.

I would recommend to read Patterns of parallel programming by Stephen Toub and Thread-safe Collections in .NET Framework 4 and Their Performance Characteristics, also here many interesting articles.

So I quote:

ConcurrentQueue(T) is a data structure in .NET Framework 4 that provides thread-safe access to FIFO (First-In First-Out) ordered elements. Under the hood, ConcurrentQueue(T) is implemented using a list of small arrays and lock-free operations on the head and tail arrays, hence it is quite different than Queue(T) which is backed by an array and relies on the external use of monitors to provide synchronization. ConcurrentQueue(T) is certainly more safe and convenient than manual locking of a Queue(T) but some experimentation is required to determine the relative performance of the two schemes. In the remainder of this section, we will refer to a manually locked Queue(T) as a self-contained type called SynchronizedQueue(T).

It don't have any methods to maintain ordered queue. Neither any of new thread-safe collection, they all maintain unordered collection. But reading original @Jrud description I think we have to maintain ordered list of time when task need to be fired. Am I wrong?

6) I wouldn't bother starting and stopping the dispatcher; just let it sleep until the next job

Do you know good way to make sleep ThreadPool's thread? How you will implement it?

I think dispatcher goes "sleep" when he does not process any task and schedule job wake-up it. Anyway there are no special processing to put it sleep or wake up so in my thoughts this process equals "sleep".

If you told that I should just reschedule RunJobs via ThreadPool when no jobs available when you are wrong it will eat too many resources and can impact started jobs. Try yourself. Why to do unnecessary job when we can easily avoid it.

7) Rather than worrying about different kinds of ticks, you could just stick to milliseconds.

You are not right. Either you stick to ticks or you don't care about it entirely. Check DateTime implementation, each access to milliseconds property involve converting internal representaion (in ticks) to ms including division. This can hurt performance on old (Pentium class) compulters (I measure it myself and you can too).

In general I will agree with you. We don't care about representation here because it does not give us noticeable performance boost.

It just my habbit. I process billions of DateTime in recent project so coded accordingly to it. In my project there are noticeable difference between processing by ticks and by other components of DateTime.

8) The attempt to keep track of available threads does not seem likely to be effective

I just want to demonstrate you have to care about it. In real world you have to implement far from my straightful logic of scheduling and monitoring resources.

I want to demonstrate timer wheel algorithm and point to some problem that author have to think when implement it.

You are absolutely right I have to warn about it. I thought "quickly ptototype" would be enough. My solution in any means can't be used in production.

Nick Martyshchenko
Wow, you REALLY care about my question lol +1, and I do understand C# code... I learned it a few semesters after VB in college. It uses same API as VB so every things pretty much same, just different syntax. +1 for taking serious interest.
Jrud
There are some good ideas in here, but also some issues, including many over-complications. 1) I wouldn't recommend using SortedList here, or anywhere else, as it's an obsolete .NET 1.1 class. 2) In general, there's no need to complicate the code with special cases for simultaneous events. 3) The IsWorking method should just use a memory barrier and then read the value directly. 4) The GetJobs method looks like it could lock for an extended period. Is it necessary? 5) A lock-free queue is available in .NET 4.0.
Steven Sudit
5) I wouldn't bother starting and stopping the dispatcher; just let it sleep until the next job. 6) Rather than worrying about different kinds of ticks, you could just stick to milliseconds. 7) The attempt to keep track of available threads does not seem likely to be effective.
Steven Sudit
@Steven Sudit, thank you for comments. I try to answer in my post rather than here.
Nick Martyshchenko
Let me say in advance that I'm very familiar with the resources you reference, so I'm going to try to keep the scope narrow in my replies. As I said before, there are many good things about your solution, but I think we agree that there is much that would need to be changed before we're ready for production. My comments are suggestions for how that could be done. Ultimately, this would need to be benchmarked with real-world data, but first we need something that is simple and correct. Otherwise, when we tweak it for speed, we are likely to either uncover latent breaks or insert new ones.
Steven Sudit
1) You're right that the type of `SortedList` you used is 2.0, not 1.1; I was mistaken. The reason I would not recommend a non-generic collection for this sort of work is because the need to box and unbox introduces complexity and impacts performance. From what I understand, `SortedDictionary<>` would be better for this task than `SortedList`, but neither one is quite right. In particular, we want a list that doesn't need reallocation (linked list, not array), that is kept sorted by key but retrieved only sequentially, and that allows multiple values for each key.
Steven Sudit
Generally, I would recommend using the most feature-full collection from the library, but that doesn't apply here because this is system-level code. We'd need to roll our own, either from scratch or on top of a less feature-rich collection.
Steven Sudit
2) In practice, you are unlikely to get simultaneous events unless they're being scheduled on round numbers (such as to the nearest second). That's why I'm suggesting a queue that does not try to combine multiple events, instead leaving them adjacent. This simplifies the dispatch logic, as it removes the special case.3) As written, the IsWorking method calls `Interlocked.Exchange` without exchanging anything. The only thing you could possibly get out of this is volatility, which is better provided by any of the methods you mention. This is as much a matter of code readability as performance.
Steven Sudit
4) Let me put this more concretely: no matter how you slice it, it's O(n). While I would recommend keeping the queue small by inserting only one instance of a recurring event at a time, it's still possible for it to grow to non-trivial size, which means the copy operation can be expensive. If we reuse job instances when they recur, then the copy would need to be deep, adding to the cost. Ultimately, there does not appear to be any actual use case for this method, but calling it could cause jobs to be delayed. On this basis, I would recommend removing it.
Steven Sudit
5) You are correct that `ConcurrentQueue` does not provide sorting, but we don't actually need that. The most common operation is removing the head of the queue, which can be done without locks. Probably the next most common operation is adding a job to the tail. There remains the case where we need to iterate through the queue to find an insertion point, but even this can be done without locks; it's cheaper to retry on losing a race than to lock. It would probably help to check whether the new job was closer to the head or tail, to decide which direction to search from.
Steven Sudit
6) I make this suggestion because the logic to start/stop the dispatcher is very complicated and therefore prone to error. To make the dispatcher sleep, I would recommend waiting on an event while using a timeout. This way, it will either wake up when the next job is ready or when a new job is inserted before the head. In the latter case, it could need to run it now or set a different wait. If presented with, say, 100 jobs all scheduled for the same instant, the best thing we can do is queue them all up.
Steven Sudit
If we needed to provide prioritization, that's a job for a priority dispatch queue and multiple pools in a producer/consumer relationship, but it still doesn't justify a start/stop dispatcher. The dispatcher should always be on, running in a single loop that sometimes cedes the core.
Steven Sudit
7) Sticking to one type of tick is fine, but mixing and matching gets ugly, particularly since it's hardware-dependent. I'm sure that ticks would be slightly faster than milliseconds, because it stores the former and has to divide by a constant to get the latter. Whether this operation winds up being costly is another matter, but I'm fine with using ticks to avoid the risk.
Steven Sudit
8) The problem I'm trying to highlight is that the call to `GetAvailableThreads` is expensive and naive; the answer is obsolete before you can even use it. If we really wanted to keep track, we could get initial values and keep a running count by calling the job from a wrapper that uses `Interlocked.Increment/Decrement`. Even then, it presumes that the rest of the program is not using the thread pool. If we really wanted fine control, then the right answer here is to roll our own thread pool.
Steven Sudit
@Steven Sudit lets continue in comments :)1) I declare my variable as private readonly SortedList<long, List<Job>> _jobs; so its generic SortedList variant why you talk about its non-generic counterpart? SortedList efficient in memory consumtion and iteration, SortedDictionary better when we need faster insert. I measure them myself and use in different project and found SortedList<> often effiently than SortedDictionary<> but as usually "it depends".
Nick Martyshchenko
@2) I misundertand. You are right if we really can't get simultaneous event. But what if we can add similar tasks at same time as I do in demo and don't want to combine them into one?
Nick Martyshchenko
@3 This is common pattern to get value in thread-safe way by using Interlocked methods, check Jeffrey Richter for example. It rely on fact Interlocked.CompareExchange (not Interlocked.Exchange as you mention) not only change value but also always return old value
Nick Martyshchenko
@4 You are right of cause. I take as start @Jrud original class. Logic you are suggest have to be implemented but I think its far from aim of my original code.
Nick Martyshchenko
@5 I don't understand how it helps. You need to maintain order, so you have to find right position to insert element (SortedList make it via ordinary BinarySearch). Also "ConcurrentQueue(T) is implemented using a list of small arrays and lock-free operations on the head and tail arrays, hence it is quite different than Queue(T) which is backed by an array and relies on the external use of monitors to provide synchronization". Take a look at TryDequeue implementation and Thread-safe Collections in .NET Framework 4 and Their Performance Characteristics paper it not always best choice.
Nick Martyshchenko
@5 But it helps to avoid lock when get head you are right of cause but cost needs to be verify.
Nick Martyshchenko
@6 it good point, thank you
Nick Martyshchenko
@7 Another good point, I agree with you. But sometimes division by constant becomes costly and it not so fast as it maybe seems to be. But when we talk about 100 000 of DateTimes it does not matter, you are right, thank you to point.
Nick Martyshchenko
@8 I absolutely agree that calling to GetAvailableThreads is naive method to monitor available resources thru CorGetAvailableThreads not so expensive. I want to demontrate there are needs to manage resources and seems to choose bad example
Nick Martyshchenko
@Steven Sudit thank you very much for commenting and point me to issues
Nick Martyshchenko
I looks like we're closer to being on the same page, so I'm only going to reply on the items where there seems to be more that needs to be said. 1) It doesn't matter: neither one of these is a good fit. 2) If two tasks are actually scheduled for the same time (whether actually or effectively), this is a normal case that does not require complicating our data structure. In other words, we don't need to group up simultaneous jobs so that we have to traverse two different collections; we can just make them adjacent.
Steven Sudit
3) No, it's not a common pattern. The most common pattern is to briefly lock. Less common is to flag the variable as volatile. Much less common would be to use `VolatileRead` or `MemoryBarrier`. Using `Interlocked.CompareExchange` this way is *obscure*, even if Richter does it. using it without an explanatory comment is absolutely guaranteed to confuse, as the word "Compare" implies that we're doing a comparison, when in fact we're not.
Steven Sudit
4) It's true that sometimes a quick lock is cheaper than complicated non-locking mechanisms. Even if this were the case here, which I'm not convinced, the right answer would be to use a regular queue and maintain its order yourself, since we most often need to access it only from the head or tail. Contrary to what you said, SortedList is O(n) for insertions, which means it's not using a binary search.
Steven Sudit
Yes, this is an interesting problem, and it's led to an interesting discussion.
Steven Sudit
@Steven (4): I agree, I usually implemented in similar situation data structure you describes but I think it go beyond the scope example. But SortedList<> use BinarySearch. Check youself in source or via Reflector in Add it useint i = Array.BinarySearch<TKey>(keys, 0, _size, key, comparer); to determine position when it needs to insert to.
Nick Martyshchenko
I recheck: http://msdn.microsoft.com/en-us/library/ms132330.aspxIt is an O(log n) operation if the new element is added at the end of the list. If insertion causes a resize, the operation is O(n).So you are not right
Nick Martyshchenko
It says "This method is an O(n) operation for unsorted data, where n is Count. It is an O(log n) operation if the new element is added at the end of the list. If insertion causes a resize, the operation is O(n)." Now, we can ignore the last case, because any realloc of a fixed-sized block has to be O(n). Normally, a linked list has a tail pointer, so inserting at the end costs O(1), while inserting in sorted order is O(n), since it has to traverse the list. (continued)
Steven Sudit
The way I figure, since it's using an array instead of a linked list, it can find the point in O(log n) by using a binary search, but still has to pay O(n) to move the current elements over to make room for the new one. As such, insertion becomes O(n) because the slower part dominates. This explains why inserting to the end is so slow: it's a binary search to the last element, but not followed by a need to make room. So you're right that it's using a binary search, but would be wrong to suggest that its performance is akin to a linked binary tree (which is to say, O(log n) for most ops).
Steven Sudit
Frankly, this is not a great set of characteristics, unless the constants involved are tiny in comparison to the linked-list alternative.
Steven Sudit
@Steven, we started to discuss SortedList<>, it always do BinarySearch before insert. Check Reflector or refence source. But you switched in discussion (when?) to LinkedList<> of cause its behaviour is different. Just look how Microsoft team implemented it and your questions will get answers
Nick Martyshchenko
@Steven and you are right of cause, linked list structures outperformance array based on insert/remove since they dont have to resize (or at least move) source array. But as usually here we have to choice we want to be fast on insert (SortedDictionary<> implemented as binary search tree) or we want to be memory efficient.
Nick Martyshchenko
I switched when I said that SortedList and SortedDictionary are both inappropriate, so we'd do better with LinkedList or ConcurrentQueue. I'm not convinced that an array-based collection would be significantly more memory-efficient, either in principle or practice. First, GC makes allocation cheap. Second, if you implement your own single-linked list, it actually takes up less memory. And, finally, even if there were such a trade-off, we can afford to burn some bytes but not to be slow. Having said that, .NET is definitely not a suitable environment for critical timing.
Steven Sudit
I see. But you are not right assuming array based solutions are not memory efficient. Each class reference costs 24 byte overhead in 32-bit. So when you have 100 000 000 class refence, you get 2.4 GB overhead to keep them in memory. Array of structures does'nt have such overhead. But in my code since Job is class there are almost no difference (you are right) since they both contains class reference. I feel we go wrong direction start discussing abstract optimization ideas without real technical requirements on system behaviour. We both right. "It depends..." rule :)
Nick Martyshchenko
Yes, it always depends on the details. The biggest detail is that .NET is not at all ideal for this task. GC can kick in at any time, timer precision is low and there's no guarantee whatsoever that the timer will be on time. Doing this in C++ would help, but the deeper issue is Windows itself, as it's just not designed for this sort of thing.
Steven Sudit
Thru .NET head to right direction :) Windows itself was'nt hard-realtime OS in any time. GC becomes more controllable in .NET it starts to announce about full GC and there are some options to almost avoid full collection if app designed for that, there are option to create own CLR Host but all this things far from our original topic. @Steven thank you for great comments and discussion, I read and learn from them with great pleasure.
Nick Martyshchenko
And I'll take that as a polite hint to shut up already. :-)
Steven Sudit
If you want we can continue :) I will commment your comments as usually before that :) I just thought we take away our readers from right direction and they lost in details :)
Nick Martyshchenko