Hmmm. You (the original poster) and the other answers are, I think, coming at this backwards.
You seem to grasp the event-driven part, but are getting hung up on what happens after an event fires.
The key thing to understand is that a web server generally spends very little time "processing" a request, and a whole lot of time waiting for disk and network I/O.
When a request comes in, the server generally needs to do one of two things: either load a file and send it to the client, or pass the request to something else (classically a CGI script; these days FastCGI is more common, since it avoids spawning a new process per request).
In either case, the server's job is computationally minimal; it's just a middle-man between the client and the disk or "something else".
That's why these servers use what is called non-blocking I/O.
The exact mechanisms vary from one operating system to another, but the key point is that a read or write request always returns instantly (or near enough). When you try to write, for example, to a socket, the system either immediately accepts what it can into a buffer, or returns something like an EWOULDBLOCK error letting you know it can't take more data right now.
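To make that concrete, here's a minimal sketch in Python (the underlying error is the same EWOULDBLOCK/EAGAIN, which Python surfaces as BlockingIOError). The socketpair stands in for a real client connection; the loop keeps writing until the kernel buffer fills and the call refuses, instantly, to take more:

```python
import errno
import socket

# One end of a socketpair, set non-blocking.
a, b = socket.socketpair()
a.setblocking(False)

payload = b"x" * 65536
total = 0
try:
    while True:
        # send() accepts whatever fits in the kernel buffer and returns
        # immediately; it never waits for the other side to read.
        total += a.send(payload)
except BlockingIOError as e:
    # Python's wrapper around EWOULDBLOCK/EAGAIN.
    assert e.errno in (errno.EWOULDBLOCK, errno.EAGAIN)
    print(f"buffer full after {total} bytes; call returned instead of blocking")
finally:
    a.close()
    b.close()
```

Note that some data was accepted before the refusal; the server's job is to remember how much, which is exactly the bookkeeping described next.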
Once the write has been "accepted", the program can make a note of the state of the connection (e.g. "5000 of 10000 bytes sent" or something) and move on to the next connection which is ready for action, coming back to the first after the system is ready to take more data.
This is unlike a normal blocking socket where a big write request could block for quite a while as the OS tries to send data over the network to the client.
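A toy version of that bookkeeping, using Python's selectors module (the Conn class and names here are hypothetical, just to illustrate the pattern): each connection records how many bytes of its response have gone out, and the event loop only touches a connection when the OS says its socket can take more data.

```python
import selectors
import socket

sel = selectors.DefaultSelector()

class Conn:
    """Per-connection state: the response and how much of it was sent."""
    def __init__(self, sock, data):
        self.sock = sock
        self.data = data
        self.sent = 0  # e.g. "5000 of 10000 bytes sent"

def on_writable(conn):
    # Non-blocking send: takes what the buffer allows, returns at once.
    n = conn.sock.send(conn.data[conn.sent:])
    conn.sent += n
    if conn.sent == len(conn.data):
        sel.unregister(conn.sock)
        conn.sock.close()

# A socketpair stands in for a real client connection.
a, b = socket.socketpair()
a.setblocking(False)
conn = Conn(a, b"hello event loop")
sel.register(a, selectors.EVENT_WRITE, conn)

# The event loop: ask the OS which sockets are ready, service only those.
while sel.get_map():
    for key, _ in sel.select(timeout=1):
        on_writable(key.data)

received = b.recv(100)
b.close()
print(received)
```

A real server registers hundreds of such connections with the same selector; the single loop services whichever ones are ready each time around.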
In a sense, this isn't really different from what you might do with threaded I/O, but it has much reduced overhead in the form of memory, context switching, and general "housekeeping", and takes maximum advantage of what operating systems do best (or are supposed to, anyway): handle I/O quickly.
As for multi-processor/multi-core systems, the same principles apply. This style of server is still very efficient on each individual CPU; you just run one instance per core (or have the server fork multiple worker processes) to take advantage of the additional processors.
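This is the classic "pre-fork" pattern. A sketch, assuming a Unix-like OS: the parent opens the listening socket once, forks a worker per core, and the kernel hands each incoming connection to one of the workers (each of which would normally run its own event loop).

```python
import os
import socket

# Parent opens the listening socket once, before forking.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(128)
port = listener.getsockname()[1]

pids = []
for _ in range(2):  # in a real server: one worker per core
    pid = os.fork()
    if pid == 0:
        # Child: accepts from the shared listener. A real worker would
        # run a non-blocking event loop here instead of one accept().
        conn, _ = listener.accept()
        conn.sendall(b"handled by pid %d" % os.getpid())
        conn.close()
        os._exit(0)
    pids.append(pid)

# Stand-in clients: each connection is served by some worker.
replies = []
for _ in pids:
    c = socket.create_connection(("127.0.0.1", port))
    replies.append(c.recv(100))
    c.close()
for pid in pids:
    os.waitpid(pid, 0)
listener.close()
print(replies)
```

Modern servers sometimes use SO_REUSEPORT instead, letting each worker open its own listening socket on the same port, but the effect is the same: one event-driven process per core.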