tags:
views: 260
answers: 7

How can I use several computers to create a faster environment? I have about 12 computers, each with 4 GB of RAM and a 2 GHz CPU. I need to run a time-consuming data transform and would like to use the combined power of these machines. They are all running Win2003 Server.

Basically we have a large number of video files that we need to transform so our analysts can do their analysis. The problem is complicated by the fact that I can't tell you more about the project.

I moved it to: http://serverfault.com/questions/40615/is-it-possible-to-create-a-faster-computer-from-many-computers

+2  A: 

Yes, it's called Grid Computing or more recently, Cloud Computing.

There are many programming toolkits available to distribute your operations across a network: everything from builds to database operations to complex mathematics libraries and special parallel programming languages.

There are solutions for every size product, from IBM and Oracle down to smaller vendors like Globus. And there are even open-source solutions, such as GridGain and NGrid (the latter is on SourceForge).

lavinio
A: 

Try looking at Condor. Their homepage is light on info, so check out the Wikipedia article first.

eduffy
+1  A: 

There are a variety of tools out there that can use a distributed network of computers.

An example is Incredibuild.

samoz
+1 for incredibuild! Love that one :D Don't they have a product for generic computation now as well?
Byron Whitlock
+1  A: 

You will only get a speedup if you can split the job and run it on multiple computers in parallel. Can you do that with your data transform program? One option I am aware of is Amazon's support for MapReduce: if you can express your data transform as a MapReduce problem, you can potentially leverage Amazon's cloud-based Hadoop service (Elastic MapReduce).
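
For illustration only, here is a minimal MapReduce-style sketch in Python; the transform function and file names are hypothetical stand-ins, but the structure (an independent "map" per file, then a "reduce" that aggregates results) is what such a job would need.

    # Minimal sketch: a batch transform expressed as map and reduce steps.
    from functools import reduce

    def transform_file(path):
        # Placeholder for the real (unspecified) video transform.
        return path + ".transformed"

    def map_step(path):
        # Map: transform one input file independently of all the others.
        return (path, transform_file(path))

    def reduce_step(summary, pair):
        # Reduce: fold the per-file results into a single report.
        src, dst = pair
        summary[src] = dst
        return summary

    video_files = ["clip001.avi", "clip002.avi"]   # hypothetical inputs
    report = reduce(reduce_step, map(map_step, video_files), {})
    print(report)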

Charles Prakash Dasari
+1  A: 

There's really no "out of the box" way to just combine multiple computers into one big computer in a generic way like that.

The idea here is distributed computing, and you would have to write a program (possibly using an existing framework) that would essentially split your data transform into smaller chunks, send those off to each of the other computers to process, then aggregate the results.

Whether this would work or not would depend on the nature of your problem - can it be split into multiple chunks that can be worked on independently or not?

If so, there are several existing frameworks out there that you could use to build such an application. Hadoop, for example, which uses MapReduce, would be a good place to start.
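
As a rough sketch of that split/process/aggregate pattern, the Python below uses a local process pool to stand in for the twelve machines; in a real deployment each chunk would be shipped to a remote worker by a framework such as Hadoop or a home-grown job queue. The file names and per-chunk transform are hypothetical.

    # Split the work into chunks, process them in parallel, aggregate results.
    from concurrent.futures import ProcessPoolExecutor

    def process_chunk(chunk):
        # Hypothetical per-chunk transform; each file is handled independently.
        return [f + ".done" for f in chunk]

    def split(items, n):
        # Divide the work list into n roughly equal chunks.
        return [items[i::n] for i in range(n)]

    if __name__ == "__main__":
        files = ["clip%03d.avi" % i for i in range(100)]   # hypothetical inputs
        chunks = split(files, 12)
        with ProcessPoolExecutor(max_workers=12) as pool:
            results = list(pool.map(process_chunk, chunks))
        # Aggregate: flatten the per-chunk results back into one list.
        done = [f for chunk_result in results for f in chunk_result]
        print(len(done), "files processed")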

Eric Petroelje
+1  A: 

Your program will probably need to be modified to take advantage of multiple machines.

One method of doing this is to use an implementation of MPI (possibly MS-MPI, as you're using Windows Server).
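
A hedged sketch of what that might look like with mpi4py (which can sit on top of MS-MPI); the file list and the per-file transform are placeholders, and the script assumes it is launched with mpiexec across the machines.

    # Rank 0 scatters a share of the file list to every process; each rank
    # transforms its share, and rank 0 gathers the results.
    # Example launch: mpiexec -n 12 python transform_mpi.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    if rank == 0:
        files = ["clip%03d.avi" % i for i in range(60)]   # hypothetical inputs
        shares = [files[i::size] for i in range(size)]    # one share per rank
    else:
        shares = None

    my_files = comm.scatter(shares, root=0)
    my_results = [f + ".done" for f in my_files]          # placeholder transform

    all_results = comm.gather(my_results, root=0)
    if rank == 0:
        print(sum(len(r) for r in all_results), "files transformed")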

Nick
+1  A: 

Imagine a Beowulf cluster of these things!

Byron Whitlock