I've never found them to be very similar. Let me define for the purposes of this post a "node" to be one hardware thread running on one machine. So a quad core machine is four nodes, as is a cluster of four single processor boxes.
Each node will typically be running some processing, and there will need to be some type of cross-node communication. Usually the first instance of this communication is telling the node what to do. For this communication, I can use shared memory, semaphores, shared files, named pipes, sockets, remote procedure calls, distributed COM, etc. But the easiest ones to use, shared memory and semaphores, are not typically available across a network. Shared files may be available, but performance is typically poor. Sockets tend to be the most common and most flexible choice over a network, rather than the more sophisticated mechanisms. At that point you have to deal with the details of network architecture, including latency, bandwidth, packet loss, network topology, and more.
If you start with a queue of work, nodes on the same machine can use simple shared memory to get things to do. You can even write it up lockless and it will work seamlessly. With nodes over a network, where do you put the queue? If you centralize it, that machine may suffer very high bandwidth costs. Try to distribute it and things get very complicated very quickly.
What I've found, in general, is the people tackling this type of parallel architecture tend to choose embarrassingly parallel problems to solve. Raytracing comes to mind. There's not much cross-node communication required, apart from job distribution. There are many problems like this, to be sure, but I find it a bit disingenuous to suggest that distributed computing is essentially the same as threading.
Now if you're going to go write threading that behaves identically to a distributed system, using pure message passing and not assuming any thread to be the "main" one and such, then yes, they're going to be very similar. But what you've done is pretended you have a distributed architecture and implemented it in threads. The thing is that threading is a much simpler case of parallelism than true distributed computing is. You can abstract the two into a single problem, but by choosing the harder version and sticking strictly to it. And the results won't be as good as they could be when all of the nodes are local to a machine. You're not taking advantage of the special case.