Most of the answers above talk about performance and simultaneous operation. I'm going to approach this from a different angle.
Let's take the case of, say, a simplistic terminal emulation program. You have to do the following things:
- watch for incoming characters from the remote system and display them
- watch for stuff coming from the keyboard and send it to the remote system
(Real terminal emulators do more, including potentially echoing the stuff you type onto the display as well, but we'll pass over that for now.)
Now the loop for reading from the remote is simple, as per the following pseudocode:
    while get-character-from-remote:
        print-to-screen character
The loop for monitoring the keyboard and sending is also simple:
    while get-character-from-keyboard:
        send-to-remote character
The problem, though, is that you have to do both at the same time. Without threading, the code has to look more like this:
    loop:
        check-for-remote-character
        if remote-character-is-ready:
            print-to-screen character
        check-for-keyboard-entry
        if keyboard-is-ready:
            send-to-remote character
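
For a concrete flavour of that integrated loop, here is a rough Python sketch using the standard `selectors` module. The names `integrated_loop` and `demo_integrated` are made up for illustration, a plain list stands in for the screen, and `socket.socketpair()` stands in for both the keyboard and the remote system, so treat it as a sketch under those assumptions rather than a real terminal emulator.

```python
import selectors
import socket


def integrated_loop(remote, keyboard, screen):
    """One loop that multiplexes both sources (no threads)."""
    sel = selectors.DefaultSelector()
    sel.register(remote, selectors.EVENT_READ, "remote")
    sel.register(keyboard, selectors.EVENT_READ, "keyboard")
    open_sources = 2
    while open_sources:
        # Block until at least one source has something to read.
        for key, _ in sel.select():
            data = key.fileobj.recv(1024)
            if not data:
                # An empty read means that source has closed.
                sel.unregister(key.fileobj)
                open_sources -= 1
            elif key.data == "remote":
                screen.append(data)      # print-to-screen
            else:
                remote.sendall(data)     # send-to-remote
    sel.close()


def demo_integrated():
    # Assumption: socket pairs play the roles of the remote and the keyboard.
    remote, far_end = socket.socketpair()
    keyboard, typist = socket.socketpair()
    typist.sendall(b"ls\n")
    typist.close()
    far_end.sendall(b"hello")
    far_end.shutdown(socket.SHUT_WR)     # far end stops sending, can still read
    screen = []
    integrated_loop(remote, keyboard, screen)
    sent = far_end.recv(1024)
    for s in (remote, keyboard, far_end):
        s.close()
    return b"".join(screen), sent
```

Notice how the two concerns are already tangled into one dispatch block, and that is with only two sources and no error handling.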
The logic, even in this deliberately simplified example that doesn't take into account real-world complexity of communications, is quite obfuscated. With threading, however, even on a single core, the two pseudocode loops can exist independently without interlacing their logic. Since both threads will be mostly I/O-bound, they don't put a heavy load on the CPU, even though they are, strictly speaking, more wasteful of CPU resources than the integrated loop would be.
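
As a rough illustration of the threaded version, here is a minimal Python sketch in which each thread is a direct translation of one of the two simple pseudocode loops. As before, the function names are hypothetical, a list stands in for the screen, and `socket.socketpair()` stands in for the keyboard and the remote system.

```python
import socket
import threading


def remote_to_screen(remote, screen):
    # while get-character-from-remote: print-to-screen character
    while data := remote.recv(1024):
        screen.append(data)


def keyboard_to_remote(keyboard, remote):
    # while get-character-from-keyboard: send-to-remote character
    while data := keyboard.recv(1024):
        remote.sendall(data)


def demo_threaded():
    # Assumption: socket pairs play the roles of the remote and the keyboard.
    remote, far_end = socket.socketpair()
    keyboard, typist = socket.socketpair()
    screen = []
    threads = [
        threading.Thread(target=remote_to_screen, args=(remote, screen)),
        threading.Thread(target=keyboard_to_remote, args=(keyboard, remote)),
    ]
    for t in threads:
        t.start()
    typist.sendall(b"ls\n")
    typist.close()                       # ends the keyboard loop
    far_end.sendall(b"hello")
    far_end.shutdown(socket.SHUT_WR)     # ends the remote loop
    for t in threads:
        t.join()
    sent = far_end.recv(1024)
    for s in (remote, keyboard, far_end):
        s.close()
    return b"".join(screen), sent
```

Each loop blocks in `recv` when it has nothing to do, so neither one needs to know the other exists. Only the remote-reading thread touches `screen` here, which is why no locking is needed yet.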
Now of course real-world usage is more complicated than the above. But the complexity of the integrated loop grows rapidly as you add more concerns to the application. The logic gets ever more fragmented and you have to start using techniques like state machines, coroutines, and the like to keep things manageable. Manageable, but not readable. Threading keeps the code more readable.
So why would you not use threading?
Well, if your tasks are CPU-bound instead of I/O-bound and there are more of them than you have cores to run them on, threading actually slows your system down. Performance will suffer. A lot, in many cases. ("Thrashing" is a common problem if you start too many CPU-bound threads. You wind up spending more time switching between threads than you do running the contents of the threads themselves.) Also, one of the reasons the logic above is so simple is that I've very deliberately chosen a simplistic (and unrealistic) example. If you wanted to echo what was typed to the screen then you've got a new world of hurt as you introduce locking of shared resources. With only one shared resource this isn't so much of a problem, but it does become a bigger and bigger problem as you have more resources to share.
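
To make the locking point a little more concrete, here is a small Python sketch of a screen shared by two writers, such as the remote-reading thread and a keyboard-echo thread. The `Screen` class and `demo_locked` are hypothetical, and the "display" is just a list of lines. It shows one possible way to guard a multi-step update with `threading.Lock`, not the only one.

```python
import threading


class Screen:
    """Hypothetical shared display guarded by a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current = []   # characters of the line being assembled
        self.lines = []      # completed lines

    def write_line(self, text):
        # The update takes several steps, so hold the lock for all of
        # them; otherwise two writers could interleave their characters.
        with self._lock:
            for ch in text:
                self._current.append(ch)
            self.lines.append("".join(self._current))
            self._current.clear()


def demo_locked(repeats=200):
    screen = Screen()

    def writer(text):
        for _ in range(repeats):
            screen.write_line(text)

    threads = [
        threading.Thread(target=writer, args=("remote says hi",)),
        threading.Thread(target=writer, args=("you typed ls",)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return screen.lines
```

With one lock and one resource this stays easy to reason about. Once there are several resources, each with its own lock, you also have to worry about the order in which locks are taken, which is where the hurt starts.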
So in the end, threading is about many things. For example, it's about making I/O-bound processes more responsive (even if less efficient overall) as some have already said. It's also about making logic easier to follow (but only if you minimize shared state). It's about a lot of stuff, and you have to decide whether its advantages outweigh its disadvantages on a case-by-case basis.