views:

325

answers:

12

While reading up on SQLite, I stumbled upon this quote in the FAQ: "Threads are evil. Avoid them."

I have a lot of respect for SQLite, so I couldn't just disregard this. I got thinking what else I could, according to the "avoid them" policy, use instead in order to parallelize my tasks. As an example, the application I'm currently working on requires a user interface that is always responsive, and needs to poll several websites from time to time (a process which takes at least 30 seconds for each website).

So I opened up the PDF linked from that FAQ, and essentially it seems that the paper suggests several techniques to be applied together with threads, such as barriers or transactional memory - rather than any techniques to replace threads altogether.

Given that these techniques do not fully dispense with threads (unless I misunderstood what the paper is saying), I can see two options: either the SQLite FAQ does not literally mean what it says, or there exist practical approaches that actually avoid the use of threads altogether. Are there any?


Just a quick note on tasklets/cooperative scheduling as an alternative - this looks great in small examples, but I wonder whether a large-ish UI-heavy application can be practically parallelized in a solely cooperative way. If you have done this successfully or know of such examples this certainly qualifies as a valid answer!

+5  A: 

The statement in the SQLite FAQ, as I read it, is just a comment on how difficult threading can be to the uninitiated. It is the author's opinion, and it might be a valid one. But saying you should never use threads is throwing the baby out with the bath water, in my opinion. Threads are a tool. Like all tools, they can be used and they can be abused. I can read his paper and be convinced that threads are the devil, but I have used them successfully, without killing kittens.

Keep in mind that SQLite is written to be as lightweight and easy to understand (from a coding standpoint) as possible, so I would imagine that threading is kind of the antithesis to this lightweight approach.

Also, SQLite is not meant to be used in a highly-concurrent environment. If you have one of these, you might be better off working with a more enterprisey database like Postgres.

Robert Harvey
-1. I don't think this is a reasonable interpretation of the statement "Threads are evil", especially when it's a link to a paper which argues at great length that threads are a bad idea. (From the abstract, "They discard the most essential and appealing properties of sequential computation: understandability, predictability, and determinism.")
Jason Orendorff
Un-downvoted! I have to say I still disagree with you though. Shared-everything threads are like C pointers: useful and dangerous. Most people who use them claim they can be used safely, but my experience strongly suggests taking such claims with a grain of salt. That paper is worth a read.
Jason Orendorff
+1  A: 

Evil, but a necessary evil. High level abstractions of threads (Tasks in .NET for example) are becoming more common but for the most part the industry is not trying to find a way to avoid threads, just making it easier to deal with the complexities that come with any kind of concurrent programming.

Josh Einstein
+2  A: 

If you really want to live without threads, you can, so long as you don't call any functions that can potentially block. This may not be possible.

One alternative is to implement the tasks you would have made into threads as finite state machines. Basically, the task does what it can do immediately, then goes to its next state, waiting for an event, such as input arriving on a file or a timer going off. X Windows, as well as most GUI toolkits, support this style. When something happens, they call a callback, which does what it needs to do and returns. For a FSM, the callback checks to see what state the task is in and what the event is to determine what to do immediately and what the next state will be.

Say you have an app that needs to accept socket connections, and for each connection, parse command lines, execute some code, and return the results. A task would then be what listens to a socket. When select() (or Gtk+, or whatever) tells you the socket has something to read, you read it into a buffer, then check to see if you have enough input buffered to do something. If so, you advance to a "start doing something" state, otherwise you stay in the "reading a line" state. (What you "do" could be multiple states.) When done, your task drops the line from the buffer and goes back to the "reading a line" state. No threads or preemption needed.

This lets you act multithreaded by way of being event-driven. If your state machines are complicated, however, your code can get hard to maintain pretty fast, and you'll need to work up some kind of FSM-management library to separate the grunt work of running the FSM from the code that actually does things.

P.S. Another way to get threads without really using threads is the GNU Pth library. It doesn't do preemption, but it is another option if you really don't want to deal with threads.

Mike DeSimone
How is using Pth not using threads? They may not be OS threads, but you're still sharing the same address space which is the point of avoiding threads. +1 for the FSMs, though.
Jacob
There are several reasons why someone would say, "Don't use threads." Sometimes, Pth is an acceptable alternative, and sometimes not.
Mike DeSimone
+2  A: 

One trend I've noticed, at least in the Cocoa domain, is help from the framework. Apple has gone to great lengths to help developers with the relatively difficult concept of concurrent programming. Some things I've seen:

  • Different granularity of threading. Cocoa supports everything from posix threads (low level) to object oriented threading with NSLock and NSThread, to high level parellelism such as NSOperation. Depending on your task, using a high level tool like NSOperation is easier and gets the job done.

  • Threading behind the scenes via an API. Lots of the UI and animation stuff in cocoa is hidden behind an API. You are responsible for calling an API method and providing an asynchronous callback this executed when the secondary thread completes (for example the end of some animation).

  • openMP. There are tools like openMP that allow you to provide pragmas that describe to the compiler that some task may be safely parelellized. For example iterating a set of items in an independent way.

It seems like a big push in this industry is to make things simple for the Application developers and leave the gory thread details to the system developers and framework developers. There is a push in academia for formalizing parellel patterns. As mentioned you cant always avoid threading, but there are an increasing number of tools in your arsenal to make it as painless as possible.

darren
"It seems like a big push in this industry is to make things simple for the Application developers and leave the gory thread details to the system developers and framework developers." -- Finally. When a library or OS does something, that means lots of application writers don't have to do it, and risk messing it up.
Mike DeSimone
+2  A: 

Another approach to this may be to use a different concurrency model rather than avoid multithreading altogether (you have to utilize all these CPU cores in parallel somehow).

Take a look at mechanisms used in Clojure (e.g. agents, software transactional memory).

Alex B
A: 

Threading is not the only model of concurrency. The actors model (Erlang, Scala) is an example of a somewhat different approach.

http://www.scala-lang.org/node/242

Adrian Panasiuk
+1  A: 

If your task is really, really easily isolatable, you can use processes instead of threads, like Chrome does for its tabs.

Otherwise, inside a single process, there is no way to achieve real parallelism without threads, because you need at least two coroutines if you want two things to happen at the same time (assuming you're having multiple processors/cores at hand, of course; otherwise real parallelism is simply not possible).

The complexity of threading a program is always relative to the degree of isolation of the tasks the threads will perform. There's no trouble in running several threads if you know for sure these will never use the same variables. Then again, multiple high-level constructs exist in modern languages to help synchronize access to shared resources.

It's really a matter of application. If your task is simple enough to fit in some kind of high-level Task object (depends on your development platform; your mileage may vary), then using a task queue is your best bet. My rule of the thumb is that if you can't find a cool name to your thread, then its task is not important enough to justify a thread (instead of task going on an operation queue).

zneak
+1  A: 

Software Transactional Memory (STM) is a good alternative concurrency control. It scales well with multiple processors and do not have most of the problems of conventional concurrency control mechanisms. It is implemented as part of the Haskell language. It worths giving a try. Although, I do not know how this is applicable in the context of SQLite.

Szere Dyeri
Certainly, but STM isn't a way to _avoid_ threads - rather, a way to safely access shared state when you _do_ use threads.
romkyns
romkyns: If the linked paper is any indication, the SQLite folks' objections to threads have to do with the huge problems with threads *as a programming model* --particularly Java-style, shared-everything threads: the total lack of isolation by default, proneness to latent bugs, and on and on. STM doesn't have those problems. (It has other problems.)
Jason Orendorff
+2  A: 

Alternatives to threads:

  • coroutines
  • goroutines
  • mapreduce
  • workerpool
  • apple's grand central dispatch+lambdas
  • openCL
  • erlang

(interesting to note that half of those technologies were invented or popularised by google.)

Another thing is many web frameworks transparently use multiple threads/processes for handling requests, and usually in such a way that mostly eliminates the problems associated with multithreading (for the user of the framework), or at least makes the threading rather invisible. The web being stateless, the only shared state is session state (which isn't really a problem since by definition, a single session isn't going to be doing concurrent things), and data in a database that already has its multithreading nonsense sorted out for you.

It's somewhat important to note though that these are all abstractions. The underlying implementations of these things still use threads. But this is still incredibly useful. In the same way you wouldn't use assembler to write a web application, you wouldn't use threads directly to write any important application. Designing an application to use threads is too complicated to leave for a human to deal with.

Breton
If I could accept two answers, this would be one of the two. Can't really say one is better than the other, I think they're both good and complement each other. The other answer "won" because it had less votes and thus too little visibility IMO.
romkyns
+1  A: 

Threads give you the opportunity to do some evil things, specifically sharing state among different execution paths. But they offer a lot of convenience; you don't have to do expensive communication across process boundaries. Plus, they come with less overhead. So I think they're perfectly fine, used correctly.

I think the key is to share as little data as possible among the threads; just stick to synchronization data. If you try to share more than that, you have to engage in complex code that is hard to get right the first time around.

Jacob
Now if only I could tell my compiler which code could run on different threads and have it prevent me sharing any state other than the state explicitly marked as shareable... Currently there's no way to enforce your key suggestion other than by being very, very careful (not in any mainstream language that I know of, anyway)
romkyns
I also wish there was better mainstream language support for this. Encapsulating the threaded task as a class that cannot see and doesn't expose any shared state helps, but you're right that it'd be better if enforced.
Jacob
A: 

One method of avoiding threads is multiplexing - in essence you make a lightweight mechanism similar to threads which you manage yourself.

Thing is this is not always viable. In your case the 30s polling time per website - can it be split into 60 0.5s pieces, in between which you can stuff calls to the UI? If not, sorry.

Threads aren't evil, they are just easy to shoot your foot with. If doing Query A takes 30s and then doing Query B takes another 30s, doing them simultaneously in threads will take 120s instead of 60 due to thread overhead, fighting for disk access and various bottlenecks.

But if Operation A consists of 5s of activity and 55 seconds of waiting, mixed randomly, and Operation B takes 60s of actual work, doing them in threads will take maybe 70s, compared to plain 120 when you execute them in sequence.

The rule of thumb is: threads should idle and wait most of the time. They are good for I/O, slow reads, low-priority work and so on. If you want performance, use multiplexing, which requires more work but is faster, more efficient and has way less caveats. (synchronizing threads and avoiding race conditions is a whole different chapter of thread headaches...)

SF.
The polling time consists almost entirely of waiting for websites to respond, so threads fit quite nicely, and multiplexing would work too.
romkyns
+3  A: 

I finally got around to reading the paper. Where do I start?

The author is singing an old song, which goes something like this: "If you can't prove the program is correct, we're all doomed!" It sounds best when screamed loudly accompanied by over modulated electric guitars and a rapid drum beat. Academics started singing that song when computer science was in the domain of mathematics, a world where if you don't have a proof, you don't have anything. Even after the first computer science department was cleaved from the mathematics department, they kept singing that song. They are singing that song today, and nobody is listening. Why? Because the rest of us are busy creating useful things, good things out of software that can't be proved correct.

The presence of threads makes it even more difficult to prove a program correct, but who cares? Even without threads, only the most trivial of programs can be proved correct. Why do I care if my non-trivial program, which could not be proved correct, is even more unprovable after I use threading? I don't.

If you weren't sure the author was living in an academic dreamworld, you can be sure of it after he maintains that the coordination language he suggests as an alternative to threads could best be expressed with a "visual syntax" (drawing graphs on the screen). I've never heard that suggestion before, except every year of my career. A language that can only be manipulated by GUI and does not play with any of the programmer's usual tools is not an improvement. The author goes on to cite UML as a shining example of a visual syntax which is "routinely combined with C++ and Java." Routinely in what world?

In the mean time, I and many other programmers go on using threads without all that much trouble. How to use threads well and safely is pretty much a solved problem, as long as you don't get all hung up on provability.

Look. Threading is a big kid's toy, and you do need to know some theory and usage patterns to use them well. Just as with databases, distributed processing, or any of the other beyond-grade-school devices that programmers successfully use every day. But just because you can't prove it correct doesn't mean it's wrong.

Wayne Conrad