
I am designing a dedicated syslog-processing daemon for Linux that needs to be robust and scalable, and I'm debating multithreading vs. multiprocessing.

The obvious objection to multithreading is complexity and nasty bugs. Multiple processes may hurt performance because of IPC and context-switching overhead.

"The Art of Unix Programming" discusses this here.

Would you recommend a process-based system (like Apache) or a multi-threaded approach?

+2  A: 

Depends on what programming language you want to use (and which libraries). Personally I would choose multithreading, as I know the problems associated with threads (and how to solve them).

Multiprocessing might help if you want to run the daemon on multiple machines and distribute the load among them, but I don't think that's a major concern here.

tstenner
A: 

Do you need to share frequently updated data between the instances, such that IPC would be too expensive? If so, multithreading is probably better. Otherwise, you have to weigh whether the robustness of separate processes or the ease of thread creation and communication matters more to you.

Mark Probst
A: 

The Windows platform is getting NUMA support; see more info here:

New NUMA Support with Windows Server 2008 R2 and Windows 7

Learning Experience of NUMA and Intel's Next Generation Xeon Processor I

lsalamon
+2  A: 

You've left out too many details. In terms of what you've stated so far, the choice is irrelevant: there is nothing inherently more buggy about multithreading than multiprocessing, and you're missing why these techniques have such reputations. If you aren't sharing data, there isn't much of a problem (there may be other issues, but we need details to judge those). The platform also matters: on Unix-like operating systems, processes are pretty lightweight anyway.

However, there are other issues to consider. What kind of system(s) will you be running on? You definitely don't want to spawn several processes on a uniprocessor system, as you won't get much benefit, depending on other details you could specify. If you describe the nature of the problem you're trying to solve, we can help further.

BobbyShaftoe
+1  A: 

If you want robustness, use multi-processing.

The processes will share the logging load between them. Sooner or later, a logging request will hit a bug and crash the logger. With multi-processing, you only lose one process and so only that one logging request (which you couldn't have handled anyway, because of the bug).

Multi-threading is vulnerable to crashes, since one fatal bug takes out your single process.

Multi-processing is in some ways more technically challenging, since you have to balance the workload over the processes, which may entail using shared memory.
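To make the isolation argument concrete, here is a minimal Python sketch (the `supervise` helper and the "bad" request marker are invented for illustration): a master forks one worker per request, and a worker that dies takes down only its own request.

```python
import os
import signal

def handle_request(request):
    """Hypothetical per-request work; a 'bad' request triggers the fatal bug."""
    if request == "bad":
        os.kill(os.getpid(), signal.SIGKILL)  # simulate a crash

def supervise(requests):
    """Fork a worker per request; a crash costs only that one request."""
    lost = 0
    for request in requests:
        pid = os.fork()
        if pid == 0:              # child: handle exactly one request
            handle_request(request)
            os._exit(0)
        _, status = os.waitpid(pid, 0)
        if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
            lost += 1             # the crash is contained; the loop goes on
    return lost
```

A real daemon would keep long-lived workers and respawn them, but the containment property is the same: the bug costs one request, not the whole service.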

Blank Xavier
A: 

One question is whether you need to do either. I don't know the details of your requirements, but a single-threaded app using select(2) may fit your needs without the disadvantages of either processes or threads. This requires that you centralize all of your I/O in one place, most likely dispatching to other modules via callbacks, but that isn't hard unless you have a lot of libraries that insist on doing their own I/O and can't be restructured this way.
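As a rough sketch of that idea in Python (the handler-dispatch scheme here is my own illustration, not from the answer): a single thread selects over all descriptors and dispatches each readable one to its callback, so no locking is ever needed.

```python
import os
import select

def event_loop(fds, handlers, rounds=1):
    """Single-threaded dispatcher: wait for readable fds, call their callbacks."""
    results = []
    for _ in range(rounds):
        readable, _, _ = select.select(fds, [], [], 1.0)
        for fd in readable:
            data = os.read(fd, 4096)
            results.append(handlers[fd](data))
    return results
```

In a real daemon the loop would run forever and the callbacks would parse and route syslog messages; the point is that all I/O funnels through one place.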

Brian Campbell
+1  A: 

Both approaches can be complicated in their own ways.

You can do either. In the grand scheme of things, it might not matter which you choose. What does matter is how well you do them. Therefore:

Do what you are most experienced with. Or, if you're leading a team, do what the team is most experienced with.

---Threading!---

I have done a lot of threaded programming, and I enjoy parts of it, and parts of it I do not enjoy. I've learned a lot, and now can usually write a multi-threaded application without too much pain, but it does have to be written in a very specific way. Namely:

1) It has to be written with very clearly defined data boundaries that are 100% thread safe. Otherwise, whatever race condition can happen, will happen, and it might not be while you have a debugger attached. Plus, debugging threaded code is like peering into Schrödinger's box: by looking in, you may give other threads time to process more.

2) It has to be written with test code that stresses the machine. Many multi-threaded systems only show their bugs when the machines are heavily stressed.

3) There has to be some very smart person who owns the data exchanging code. If there is any way for a shortcut to be made, some developer will probably make it, and you will have an errant bug.

4) There has to be catch-all situations that will reset the application with a minimum of fuss. This is for the production code that breaks because of some threading issue. In short: The show must go on.
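A minimal sketch of rule 1's "clearly defined data boundaries", assuming Python's thread-safe `queue.Queue` as the only point of sharing (the uppercase transform is a stand-in for real log processing):

```python
import queue
import threading

def process_lines(lines, workers=4):
    """Workers touch shared state only through thread-safe queues."""
    in_q, out_q = queue.Queue(), queue.Queue()

    def worker():
        while True:
            line = in_q.get()
            if line is None:          # sentinel: clean shutdown
                return
            out_q.put(line.upper())   # stand-in for real log processing

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for line in lines:
        in_q.put(line)
    for _ in threads:
        in_q.put(None)                # one sentinel per worker
    for t in threads:
        t.join()
    return sorted(out_q.queue)        # completion order is nondeterministic
```

Because no worker ever touches a shared structure except through the queues, there is nothing for a shortcut-taking developer to race on.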

---Cross-Process!---

I have less experience with process-based parallelism, but I have recently been doing some cross-process work on Windows (where the IPC is web service calls... WOO!), and it is relatively clean and simple, though I follow some rules here as well. By and large, interprocess communication tends to be less error-prone, because programs are already good at receiving input from the outside world, and those transport mechanisms are usually asynchronous. Anyway...

1) Define clear process boundaries and communication mechanisms. Message/eventing via, oh say, TCP or web services or pipes or whatever is fine, as long as the borders are clear, and there is a lot of validation and error checking code at those borders.

2) Be prepared for bottlenecks. Forgiving code is very important: sometimes you won't be able to write to that pipe, and you have to be able to requeue and retry those messages without the application locking up or throwing an exception.

3) There will be a lot more code in general, because transporting data across process boundaries means you have to serialize it in some fashion. This can be a source of problems, especially when you start maintaining and changing that code.
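Point 3's serialization cost can be made concrete with a small sketch; length-prefixed JSON framing over a pipe is one common choice (my assumption for illustration, not the answer's):

```python
import json
import os
import struct

def send_msg(fd, obj):
    """Serialize to JSON and length-prefix it so frames can't run together."""
    payload = json.dumps(obj).encode("utf-8")
    os.write(fd, struct.pack("!I", len(payload)) + payload)

def recv_msg(fd):
    """Read one length-prefixed frame and deserialize it."""
    header = b""
    while len(header) < 4:            # short reads are legal on pipes
        header += os.read(fd, 4 - len(header))
    (length,) = struct.unpack("!I", header)
    payload = b""
    while len(payload) < length:
        payload += os.read(fd, length - len(payload))
    return json.loads(payload)
```

Every field that crosses the boundary has to survive this encode/decode round trip, which is exactly the extra code (and maintenance surface) the answer warns about.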

Hope this helps.

Lomilar
A: 

Thanks everyone for your feedback.

I have decided on a multi-process architecture, similar to the Apache web server. The processes will scale nicely on multi-processor/core systems. Communications will be performed with pipes or sockets.

Processes will be ready to use in a process-pool so there's no process spawning cost.

The performance hit will be negligible in comparison to the robustness I'll gain.
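A toy sketch of that architecture in Python (the round-robin dispatch and the `handle` callback are illustrative assumptions): workers are forked up front, so there is no per-message spawn cost, and jobs travel over per-worker pipes.

```python
import os

def prefork_pool(nworkers, lines, handle):
    """Pre-fork workers; dispatch jobs round-robin over per-worker pipes."""
    result_r, result_w = os.pipe()
    job_writers = []
    for _ in range(nworkers):
        job_r, job_w = os.pipe()
        pid = os.fork()
        if pid == 0:                          # worker process
            os.close(result_r)
            os.close(job_w)
            for w in job_writers:             # drop inherited write ends
                os.close(w)
            while True:
                job = os.read(job_r, 4096)    # one small job per read here;
                if not job:                   # a real daemon would frame them
                    os._exit(0)               # pipe closed: shut down
                os.write(result_w, handle(job))
        os.close(job_r)
        job_writers.append(job_w)
    os.close(result_w)                        # parent keeps only the read end
    for i, line in enumerate(lines):
        os.write(job_writers[i % nworkers], line)
    for w in job_writers:
        os.close(w)                           # EOF tells workers to exit
    out = b""
    while chunk := os.read(result_r, 4096):
        out += chunk
    for _ in range(nworkers):
        os.wait()
    return out
```

The pool is paid for once at startup, which is the same trade Apache's pre-fork MPM makes.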

pinto
A: 

Well, we finally implemented it as a multi-processed system with pipes for IPC and a bookkeeper that spawns processes as needed. Similar to the Apache httpd. It works perfectly.

pinto