How similar are distributed computing and threading? I've found two papers that come to quite opposite conclusions:

"Multi-Threading is Easier Than Networking. How threading is easy and similar to network code"

http://software.intel.com/file/14723

(this gives me the impression that they're so similar that, with proper encapsulation, the two approaches could be implemented with the same code - but maybe I'm wrong)

"A note on distributed computing"

http://research.sun.com/techrep/1994/abstract-29.html

(and this one draws a strong distinction)

I'm sure the truth is somewhere in between. What's the golden mean? Are there any technologies that unify those two paradigms? Or have such attempts failed because of fundamental differences between networking and concurrency?

+1  A: 

Distributed computing is done over multiple independent machines, sometimes with specialized OS's. It's harder because the interconnectedness of the machines is much lower, so problems that require a lot of quick, random access to the entire dataset are very difficult to solve.

Generally speaking, you need specialized libraries to solve distributed computing problems - libraries that figure out how to assign nodes to subproblems and cart the data around.
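As a rough local stand-in for that pattern (not a real distributed framework), Python's standard-library process pool already shows the shape of it: the library decides which worker gets which chunk of data and carts the results back, which is what distributed frameworks do across machines, with far more machinery for failures and data locality.

```python
from concurrent.futures import ProcessPoolExecutor

# Local stand-in for the assign-and-move pattern: the executor decides which
# worker process gets which chunk and moves the data to it. Distributed
# frameworks apply the same idea across machines.

def work_on_chunk(chunk):
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i:i + 100_000] for i in range(0, len(data), 100_000)]
    with ProcessPoolExecutor() as pool:
        partials = pool.map(work_on_chunk, chunks)
    print(sum(partials))
```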

I really wonder if they come to different conclusions because they are trying to solve the wrong problems on each platform. Some problems map very nicely onto highly interconnected machines and can benefit from really powerful supercomputers. Other problems can be handled with simple distributed models. In general, supercomputers can solve a wider range of problems, but are much, much more specialized and expensive.

altCognito
+1  A: 

The difference seems to come back to: threads share state, processes pass messages.

You need to decide how you want to maintain state in your app before choosing one.

Shared state is easy to get started with - all the data and variables are just there. But once deadlocks and race conditions enter, it's hard to modify and scale.

Message passing (e.g. Erlang) requires a different approach to design: you have to think about opportunities for concurrency from the beginning, but the state of each distributed process is isolated, which makes locking and race problems easier to deal with.
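To make the contrast concrete, here is a minimal Python sketch (a toy counter workload, chosen just for illustration): the shared-state version needs a lock around every update, while the message-passing version keeps each worker's state isolated and only hands results to one owner.

```python
import threading
import queue

# --- Shared state: every thread mutates the same counter, so a lock is needed ---
counter = 0
counter_lock = threading.Lock()

def shared_state_worker(n):
    global counter
    for _ in range(n):
        with counter_lock:          # forget this lock and you get a race condition
            counter += 1

# --- Message passing: workers only send results; one owner sums them up ---
results = queue.Queue()

def message_passing_worker(n):
    results.put(n)                  # no shared mutable state, nothing to lock

def run(worker):
    threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()

run(shared_state_worker)
print("shared state total:", counter)

run(message_passing_worker)
print("message passing total:", sum(results.get() for _ in range(4)))
```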

Mike
+1  A: 

I think it's a lot more useful to compare processes with distributed computing than to compare threads with it. Threads exist inside a single process and share the same data and the same memory. This isn't possible across several machines. Processes, on the other hand, have their own memory, although in some cases it contains exactly the same data as another process (after a fork(), for example). This could be achieved over a network.

Something that adds extra weight to this analogy is the fact that many tools used for inter-process communication are network transparent. A good example is Unix domain sockets, which use the same interface as network sockets (except for the connection code).
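For instance, here is a hedged sketch (the socket path and port are made up) showing that, on a Unix-like system, the only thing that changes between a local Unix domain socket and a TCP socket is the address family and the bind address - listen, accept, recv, and send are the same calls.

```python
import os
import socket

# A Unix domain socket and a TCP socket differ only in address family and
# the form of the address; listen/accept/recv/send are identical.
# (The path and port below are just for illustration.)

def echo_server(local=True):
    if local:
        path = "/tmp/demo.sock"
        if os.path.exists(path):
            os.unlink(path)                   # remove a stale socket file
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.bind(path)                       # filesystem path instead of host:port
    else:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind(("0.0.0.0", 5000))
    sock.listen(1)
    conn, _ = sock.accept()                   # same API from here on
    conn.sendall(conn.recv(1024).upper())
    conn.close()
    sock.close()
```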

Emil H
Network transparency is exactly the term I was searching for. "Many tools used for inter-process communication are network transparent" - any concrete examples?
sdcvvc
+3  A: 

I've never found them to be very similar. Let me define for the purposes of this post a "node" to be one hardware thread running on one machine. So a quad core machine is four nodes, as is a cluster of four single processor boxes.

Each node will typically be running some processing, and there will need to be some type of cross-node communication. Usually the first instance of this communication is telling the node what to do. For this communication, I can use shared memory, semaphores, shared files, named pipes, sockets, remote procedure calls, distributed COM, etc. But the easiest ones to use, shared memory and semaphores, are not typically available across a network. Shared files may be available, but performance is typically poor. Sockets tend to be the most common and most flexible choice over a network, rather than the more sophisticated mechanisms. At that point you have to deal with the details of network architecture, including latency, bandwidth, packet loss, network topology, and more.

If you start with a queue of work, nodes on the same machine can use simple shared memory to get things to do. You can even write it lock-free and it will work seamlessly. With nodes over a network, where do you put the queue? If you centralize it, that machine may suffer very high bandwidth costs. Try to distribute it and things get very complicated very quickly.
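As a rough illustration of the local case only, a handful of threads can pull jobs straight out of one in-memory queue (Python's queue.Queue handles the locking internally); the hard part only appears once that queue has to live somewhere that every machine can reach.

```python
import threading
import queue

# Local case: all "nodes" are threads in one process, so the work queue can
# simply live in shared memory. The distributed version of this queue is
# where the real complexity starts.
work = queue.Queue()
for job in range(20):
    work.put(job)

def worker():
    while True:
        try:
            job = work.get_nowait()
        except queue.Empty:
            return
        # ... process the job ...

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
```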

What I've found, in general, is that the people tackling this type of parallel architecture tend to choose embarrassingly parallel problems to solve. Raytracing comes to mind. There's not much cross-node communication required, apart from job distribution. There are many problems like this, to be sure, but I find it a bit disingenuous to suggest that distributed computing is essentially the same as threading.

Now if you're going to write threading that behaves identically to a distributed system - using pure message passing, not assuming any thread to be the "main" one, and so on - then yes, they're going to be very similar. But what you've done is pretend you have a distributed architecture and implement it with threads. The thing is that threading is a much simpler case of parallelism than true distributed computing. You can abstract the two into a single problem, but only by choosing the harder version and sticking strictly to it. And the results won't be as good as they could be when all of the nodes are local to one machine, because you're not taking advantage of the special case.

Promit
A: 

Yes, at development time the approach is very similar, but the use of each is very different. I may not have understood your idea completely, so let me know if I'm wrong: when talking about distributed computing we assume more than one computer or server processing code for the same application, but when we talk about multi-threading we mean running different threads of the application at the same time on the same computer. As an example of distributed computing, think of an application accessing a web service on the Internet: there are two different computers working on the same app.

If you want an example of multi-threading, just think of an application trying to find one big prime number. If you don't use multi-threading, you won't be able to see or do anything else in the application while it's calculating the next prime number (which can take a lifetime or more), because the application is not responsive while it's working on the calculation.

You can mix them too. As a more complex example, you can use multi-threading to access different web services at the same time from the same application, in order to keep your application responsive even when it can't connect to one of the servers.
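A hedged sketch of that mixed case (the service URLs below are placeholders): each request runs in its own thread, so one slow or unreachable service doesn't freeze the others or the rest of the application.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request

# Placeholder URLs; each fetch runs in its own thread, so a slow or failing
# service doesn't block the others.
urls = [
    "http://service-a.example.com/api",
    "http://service-b.example.com/api",
    "http://service-c.example.com/api",
]

def fetch(url):
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return url, resp.status
    except OSError as exc:          # URLError and timeouts are subclasses of OSError
        return url, f"failed: {exc}"

with ThreadPoolExecutor(max_workers=len(urls)) as pool:
    for url, outcome in pool.map(fetch, urls):
        print(url, outcome)
```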

backslash17
A: 

I think those two documents can't easily be compared. Intel's document is a sort of introduction to threading, and they try to explain it by drawing analogies to network computing, which strikes me as a bit strange and misleading. I'm not sure why they chose that way of presenting threading; maybe they were aiming at people familiar with networking, which is probably better known, or at least more widely recognized, than threading.

Sun's document, on the other hand, is a serious article depicting all the difficulties of distributed programming. All I can do is confirm what they say in it.

In my opinion, an abstraction that attempts to hide the fact that an object is remote is harmful, as it usually leads to very bad performance. The programmer must be aware of an object's remoteness to be able to invoke it efficiently.

Bartosz Klimek