
I need help designing a message distribution system. So far I have two processes: one listens for remote clients to deliver messages, which are then written to a database table.

I then have a second process that reads from this same table every [n] seconds, getting up to 100 messages in a single read. If there are any new records, it queues each one to be sent on its own ThreadPool-issued background thread.

If more messages arrive than threads are available, the ThreadPool queues up all those beyond its max thread count. If there are no messages, the process goes back to sleep and waits for the next Timer event to wake it up for another check of the db table.
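Here's a stripped-down sketch of that current design, to make it concrete. fetchBatch and send are placeholders for my real database read and web-service delivery call:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Timer-driven poll: every interval, read up to 100 rows and queue each
// message as its own ThreadPool work item.
class PollingDispatcher
{
    private readonly Timer _timer;
    private readonly Func<int, IReadOnlyList<string>> _fetchBatch; // read up to N unsent rows
    private readonly Action<string> _send;                         // deliver one message

    public PollingDispatcher(TimeSpan interval,
                             Func<int, IReadOnlyList<string>> fetchBatch,
                             Action<string> send)
    {
        _fetchBatch = fetchBatch;
        _send = send;
        _timer = new Timer(_ => Poll(), null, TimeSpan.Zero, interval);
    }

    private void Poll()
    {
        foreach (string msg in _fetchBatch(100))
        {
            string m = msg; // capture a fresh local for the closure
            ThreadPool.QueueUserWorkItem(_ => _send(m));
        }
        // An empty batch just means we sleep until the next timer tick.
    }
}
```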

The problem is that I could have a lot of messages arrive during one interval: it would be far better to leave them in the db until needed, rather than in memory, queued up in the ThreadPool, waiting.

In other words, I'm looking for an elegant way of knowing when it's OK to queue more work, rather than simply waiting until the next timer interval...

One idea I had was to count how many worker threads I had queued (e.g. 500, equal to the max thread count I set up first), and count them down as they complete. If the count falls below half (e.g. 250), trigger another db check. If records are found, great: fetch 100 at a time until the db table is fully read, or until the 500 max is reached again.

In other words, the threads themselves would become the main driver of dequeuing, continuously relaunching the fetch, rather than the timer (the timer interval would remain only as a mechanism to re-kickstart the process in case the pipe dries up). A sketch of what I mean follows below.
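Roughly like this (a sketch only; fetchBatch and send are again placeholders, and the 500/250 figures are the numbers from above):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class SelfRefillingDispatcher
{
    private const int MaxInFlight = 500;                 // max queued work items
    private const int RefillThreshold = MaxInFlight / 2; // refill when we drop to 250

    private readonly Func<int, IReadOnlyList<string>> _fetchBatch;
    private readonly Action<string> _send;
    private int _inFlight;   // work items queued but not yet completed
    private int _refilling;  // 0/1 gate so only one refill runs at a time

    public SelfRefillingDispatcher(Func<int, IReadOnlyList<string>> fetchBatch,
                                   Action<string> send)
    {
        _fetchBatch = fetchBatch;
        _send = send;
    }

    // Called by the timer (the re-kickstart) and by completing work items.
    public void Refill()
    {
        if (Interlocked.CompareExchange(ref _refilling, 1, 0) != 0)
            return; // another refill is already running
        try
        {
            // Top up 100 at a time until the table is drained or the cap is hit.
            while (Volatile.Read(ref _inFlight) < MaxInFlight)
            {
                IReadOnlyList<string> batch = _fetchBatch(100);
                if (batch.Count == 0)
                    return; // pipe is dry; the timer will retry later

                foreach (string msg in batch)
                {
                    string m = msg;
                    Interlocked.Increment(ref _inFlight);
                    ThreadPool.QueueUserWorkItem(_ =>
                    {
                        try { _send(m); }
                        finally
                        {
                            // Once in-flight work drops to half, go back for more.
                            if (Interlocked.Decrement(ref _inFlight) <= RefillThreshold)
                                Refill();
                        }
                    });
                }
            }
        }
        finally
        {
            Volatile.Write(ref _refilling, 0);
        }
    }
}
```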

Does anybody have advice, comments, or experience with such a system? Is the approach solid, or seriously flawed?

+1  A: 

I've found that the overhead involved in threading, such as context switching, can quickly make using threads detrimental to performance. Also, unless your threads spend a lot of time waiting on IO etc., there's no real point in having more threads than you have CPUs (or cores).

So assuming that you actually need threads to process your data, perhaps you can create a handful of threads. Each thread queries the database to grab a chunk of data (perhaps limited to 100 rows at a time) and processes it. When it finishes processing, it tries to get another chunk. You will need to synchronise the data access (i.e. synchronise the last row ID retrieved) and will still need a timer in case the threads process all available data and sleep. This approach assumes that the data processing takes significantly longer than the database access. See the sketch below for what I mean.
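Something along these lines (just a sketch; fetchAfter and process are placeholders for your database read and your message processing, and the lock is what synchronises the last row ID):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class ChunkWorkerPool
{
    private readonly object _claimLock = new object();
    private readonly Func<long, int, IReadOnlyList<(long Id, string Body)>> _fetchAfter;
    private readonly Action<string> _process;
    private long _lastRowId;

    public ChunkWorkerPool(Func<long, int, IReadOnlyList<(long Id, string Body)>> fetchAfter,
                           Action<string> process)
    {
        _fetchAfter = fetchAfter;
        _process = process;
    }

    // Launch a handful of workers; your timer can call this again later
    // if they all exit because the table was drained.
    public void Start(int workerCount)
    {
        for (int i = 0; i < workerCount; i++)
            new Thread(WorkLoop) { IsBackground = true }.Start();
    }

    private void WorkLoop()
    {
        while (true)
        {
            IReadOnlyList<(long Id, string Body)> chunk;
            lock (_claimLock) // one thread at a time advances the row cursor
            {
                chunk = _fetchAfter(_lastRowId, 100);
                if (chunk.Count == 0) return; // drained; timer restarts us
                _lastRowId = chunk[chunk.Count - 1].Id;
            }
            foreach (var row in chunk)
                _process(row.Body); // the slow part runs outside the lock
        }
    }
}
```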

Most importantly, are you sure you actually need threads at all? I'd say your best bet is to get it working without threads first, then optimise later if necessary. That is the most important lesson I've learnt about threads (the hard way).

AndrewS
Hi Andrew: The delivery services used are async web services, so yes, they take significantly longer to complete than the DB access, and therefore (I think) lend themselves well to creating more threads than CPUs. With ThreadPool, context switching is relatively painless. But more importantly, each Send request can fail due to timeouts, so isn't it best to allocate a thread to each? Several timeouts in one thread could cause a real build-up of messages...
@Ciel: you said "With ThreadPool, context switching is relatively painless". I don't think that is a true statement. One thread per CPU core is ideal.
ceretullis