Hey all,

I just watched the following video: Introduction to Node.js and still don't understand how you get the speed benefits.

Mainly, at one point Ryan Dahl (Node.js' creator) says that Node.js is event-loop based instead of thread-based. Threads are expensive, and using them should be left to experts in concurrent programming.

Later, he shows the architecture stack of Node.js, which has an underlying C implementation with its own thread pool internally. So obviously Node.js developers would never kick off their own threads or use the thread pool directly; they use async callbacks. That much I understand.

What I don't understand is that Node.js is still using threads; it's just hiding the implementation. So how is this faster? If 50 people request 50 files (not currently in memory), aren't 50 threads required?

The only difference is that, since it's managed internally, the Node.js developer doesn't have to code the threading details, but underneath it's still using threads to process the blocking IO file requests.

So aren't you really just taking one problem (threading) and hiding it while that problem still exists: multiple threads, context switching, deadlocks, etc.?

There must be some detail I still do not understand here.

Thanks!

-Ralph

A: 

I know nothing about the internal workings of node.js, but I can see how using an event loop can outperform threaded I/O handling. Imagine a disk request: give me staticFile.x. Now make it 100 requests for that file. Each request normally takes up a thread retrieving that file; that's 100 threads.

Now imagine the first request creating one thread that becomes a publisher object. All 99 other requests first check whether there's a publisher object for staticFile.x. If so, they listen to it while it's doing its work; otherwise they start a new thread and thus a new publisher object.

Once the single thread is done, it passes staticFile.x to all 100 listeners and destroys itself, so the next request creates a fresh new thread and publisher object.

So it's 100 threads vs. 1 thread in the above example, but also 1 disk lookup instead of 100 disk lookups. The gain can be quite phenomenal. Ryan is a smart guy!

Another way to look at it is one of his examples at the start of the movie. Instead of:

pseudo code:
result = query('select * from ...');

Again, 100 separate queries to a database versus...:

pseudo code:
query('select * from ...', function(result){
    // do stuff with result
});

If a query was already going, other equal queries would simply jump on the bandwagon, so you can have 100 queries in a single database roundtrip.
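
To make the publisher/listener idea above concrete, here is a minimal sketch in plain JavaScript. It is only an illustration of the pattern described in this answer, not something node.js does for you automatically; createCoalescer, fakeRead and demo are hypothetical names, and the "disk read" is simulated with a timer.

```javascript
// Hypothetical helper: callers asking for the same key while a load is
// still running all share that one pending load.
function createCoalescer(load) {
  const inFlight = new Map(); // key -> Promise of the pending result

  return function get(key) {
    // A "publisher" already exists for this key: just listen to it.
    if (inFlight.has(key)) return inFlight.get(key);

    const pending = Promise.resolve()
      .then(() => load(key))
      // Once done, forget the publisher so the next request starts fresh.
      .finally(() => inFlight.delete(key));

    inFlight.set(key, pending);
    return pending;
  };
}

function demo() {
  let loads = 0;

  // Stand-in for a slow disk read (assumption: a 10 ms timer).
  const fakeRead = (name) => {
    loads += 1;
    return new Promise((resolve) => {
      setTimeout(() => resolve(`contents of ${name}`), 10);
    });
  };

  const get = createCoalescer(fakeRead);
  const requests = Array.from({ length: 100 }, () => get('staticFile.x'));
  return Promise.all(requests).then((results) => ({ loads, results }));
}

demo().then(({ loads, results }) => {
  console.log(`${results.length} requests served by ${loads} read(s)`);
  // prints "100 requests served by 1 read(s)"
});
```

Whether this kind of sharing is safe depends on the resource: it works for an immutable static file, but not necessarily for database queries.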

BGerrissen
The database thing is more a question of not waiting for the answer while holding up other requests (which may or may not use the database), but rather asking for something and then letting it call you when it gets back. I don't think it links them together, as that would be quite difficult to keep track of on response. Also, I don't think there's any MySQL interface that lets you hold multiple unbuffered responses on one connection (??)
Tor Valamo
It's just an abstract example to explain how event loops can offer more efficiency; nodejs does nothing with DBs without extra modules ;)
BGerrissen
Yeah my comment was more towards the 100 queries in a single database roundtrip. :p
Tor Valamo
+3  A: 

What I don't understand is the point that Node.js still is using threads.

Ryan uses threads for the parts that are blocking (most of node.js uses non-blocking IO) because some parts are insanely hard to write in a non-blocking way. But I believe Ryan's wish is to have everything non-blocking. On slide 63 (internal design) you can see that Ryan uses libev (a library that abstracts asynchronous event notification) for the non-blocking event loop. Because of the event loop, node.js needs fewer threads, which reduces context switching, memory consumption, etc.

Alfred
+7  A: 

There are actually a few different things being conflated here. But it starts with the meme that threads are just really hard. So if they're hard, you are more likely, when using threads, to (1) break things due to bugs and (2) not use them as efficiently as possible. (2) is the one you're asking about.

Think about one of the examples he gives, where a request comes in and you run some query, and then do something with the results of that. If you write it in a standard procedural way, the code might look like this:

result = query( "select smurfs from some_mushroom" );
// twiddle fingers
go_do_something_with_result( result );

If the request coming in caused you to create a new thread that ran the above code, you'll have a thread sitting there, doing nothing at all while query() is running. (Apache, according to Ryan, dedicates a thread to each incoming request, whereas nginx is outperforming it in the cases he's talking about because it doesn't.)

Now, if you were really clever, you would express the code above in a way where the environment could go off and do something else while you're running the query:

query( statement: "select smurfs from some_mushroom", callback: go_do_something_with_result );

This is basically what node.js is doing. You're basically decorating -- in a way that is convenient because of the language and environment, hence the points about closures -- your code in such a way that the environment can be clever about what runs, and when. In that way, node.js isn't new in the sense that it invented asynchronous I/O (not that anyone claimed anything like this), but it's new in that the way it's expressed is a little different.

Note: when I say that the environment can be clever about what runs and when, specifically what I mean is that the thread it used to start some I/O can now be used to handle some other request, or some computation that can be done in parallel, or start some other parallel I/O. (I'm not certain node is sophisticated enough to start more work for the same request, but you get the idea.)
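
As a rough, runnable illustration of that note: in the sketch below, query is a hypothetical stand-in that delivers its result through a timer rather than a real database, and run just records the order in which things happen on the single JavaScript thread.

```javascript
// Hypothetical stand-in for a database call: the result arrives later
// through the callback instead of as a return value.
function query(statement, callback) {
  setTimeout(() => callback(`rows for: ${statement}`), 20);
}

function run(log) {
  return new Promise((resolve) => {
    log.push('request 1: query started');

    query('select smurfs from some_mushroom', (result) => {
      // Runs only once the simulated I/O has completed.
      log.push(`request 1: got ${result}`);
      resolve(log);
    });

    // query() returned immediately, so the same thread is already free
    // to do work for somebody else.
    log.push('request 2: handled while request 1 waits');
  });
}

run([]).then((log) => console.log(log.join('\n')));
```

The second request's line is logged before the first request's result arrives, because nothing sits waiting on the query.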

jrtipton
Exactly! The performance of node.js isn't due to its event-based loop or some asynchronous IO; the callback system, which drastically minimizes waiting time, is the core of node.js performance.
Tobias P.
Okay, I can definitely see how this can increase performance. It sounds to me like you are able to max out your CPU because there aren't any threads or execution stacks just waiting around for IO to return, so what Ryan has done is effectively find a way to close all the gaps.
Ralph
Yeah, the one thing I'd say is that it's not like he found a way to close the gaps: it's not a new pattern. What's different is that he is using JavaScript to let the programmer express their program in a way that is much more convenient for this kind of asynchrony. Possibly a nitpicky detail, but still...
jrtipton
+1  A: 

It is using threads because:

  1. The O_NONBLOCK option of open() does not work on files.
  2. There are third-party libraries which don't offer non-blocking IO.

To fake non-blocking IO, threads are necessary: do blocking IO in a separate thread. It is an ugly solution and causes much overhead.
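
The "blocking IO in a separate thread" trick can be sketched from JavaScript with the worker_threads module. This is only an illustration of the idea: node's internal thread pool lives in its C layer and is not built this way, and sizeViaWorker is a hypothetical name.

```javascript
const { Worker } = require('worker_threads');

// Hypothetical helper: run a blocking statSync in a worker thread and
// hand the file size back to the main thread as a message.
function sizeViaWorker(path) {
  const workerSource = `
    const { parentPort, workerData } = require('worker_threads');
    const fs = require('fs');
    // Blocking call, but it only blocks this worker thread.
    parentPort.postMessage(fs.statSync(workerData).size);
  `;

  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: path });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

// process.execPath (the node binary) is used only because it always exists.
sizeViaWorker(process.execPath).then((size) => {
  console.log(`size reported by the worker thread: ${size} bytes`);
});
```

The main thread never blocks here; it only receives a message once the worker's blocking call has returned.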

It's even worse on the hardware level:

  • With DMA the CPU asynchronously offloads IO.
  • Data is transferred directly between the IO device and the memory.
  • The kernel wraps this in a synchronous, blocking system call.
  • Node.js wraps the blocking system call in a thread.

This is just plain stupid and inefficient. But it works at least! We can enjoy Node.js because it hides the ugly and cumbersome details behind an event-driven asynchronous architecture.

Maybe someone will implement O_NONBLOCK for files in the future?...

Edit: I discussed this with a friend and he told me that an alternative to threads is polling with select: specify a timeout of 0 and do IO on the returned file descriptors (now that they are guaranteed not to block).

nalply
A: 

Threads are used only to deal with functions having no asynchronous facility, like stat().

The stat() function is always blocking, so node.js needs to use a thread to perform the actual call without blocking the main thread (event loop). Potentially, no thread from the thread pool will ever be used if you don't need to call those kinds of functions.
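
From the JavaScript side this looks like any other callback API. A small sketch, assuming the usual fs.stat(path, callback) form; statWithoutBlocking is a hypothetical name, and process.execPath is used only because that file always exists:

```javascript
const fs = require('fs');

// Hypothetical helper that records the order of events around fs.stat.
function statWithoutBlocking(path) {
  const events = [];

  return new Promise((resolve, reject) => {
    fs.stat(path, (err, stats) => {
      if (err) return reject(err);
      // Called back on the main thread after the blocking stat() call
      // has finished elsewhere.
      events.push('stat finished');
      resolve({ events, isFile: stats.isFile() });
    });

    // fs.stat returned right away; the event loop was never blocked.
    events.push('event loop continues');
  });
}

statWithoutBlocking(process.execPath).then(({ events, isFile }) => {
  console.log(events.join(' -> '), '| isFile:', isFile);
});
```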

gawi