Hello all.

I am a computer science undergraduate in my final year. For my final year project, I am thinking of creating a MATLAB-like numerical computing environment, offered as SaaS, that supports matrix manipulation, plotting of functions and data, image processing operations, etc. The project will be written in Java and Scala: Scala will be used for the application's DSL, and the rest of the application will be written in Java.

I was thinking of implementing this system on Google App Engine so that various algorithms could be parallelized across a number of servers, giving faster results. However, I do not have any prior web development experience (apart from some simple sites in PHP).
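To illustrate what "parallelizing an algorithm across servers" involves, here is a minimal Java sketch that splits a matrix product into independent row blocks. It is only an assumption about how such work could be divided: local threads from an `ExecutorService` stand in for separate servers, the names `RowBlockMultiply`, `multiplyRows` and `multiply` are hypothetical, and no App Engine API is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class RowBlockMultiply {

    // Computes rows [from, to) of a * b. The result depends only on the inputs,
    // so in principle each call could run on a different machine.
    static double[][] multiplyRows(double[][] a, double[][] b, int from, int to) {
        int inner = b.length;
        int cols = b[0].length;
        double[][] out = new double[to - from][cols];
        for (int i = from; i < to; i++) {
            for (int k = 0; k < inner; k++) {
                double aik = a[i][k];
                for (int j = 0; j < cols; j++) {
                    out[i - from][j] += aik * b[k][j];
                }
            }
        }
        return out;
    }

    // Splits the rows of a into `blocks` roughly equal blocks (blocks >= 1),
    // runs each block as an independent task and stitches the results together.
    static double[][] multiply(double[][] a, double[][] b, int blocks) {
        int rows = a.length;
        int blockSize = (rows + blocks - 1) / blocks;
        ExecutorService pool = Executors.newFixedThreadPool(blocks);
        try {
            List<Future<double[][]>> parts = new ArrayList<>();
            for (int from = 0; from < rows; from += blockSize) {
                final int start = from;
                final int end = Math.min(rows, from + blockSize);
                parts.add(pool.submit(() -> multiplyRows(a, b, start, end)));
            }
            double[][] result = new double[rows][];
            int row = 0;
            for (Future<double[][]> part : parts) {
                for (double[] r : part.get()) {
                    result[row++] = r;
                }
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}, {5, 6}};
        double[][] b = {{7, 8}, {9, 10}};
        for (double[] row : multiply(a, b, 2)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Running `main` prints `[25.0, 28.0]`, `[57.0, 64.0]` and `[89.0, 100.0]`. Because each block reads only its inputs and writes only its own rows, the blocks could be shipped to different machines; the price is copying `b` to every worker.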

So I have the following key questions:

  1. First of all, does it make sense to host an application like MATLAB in the cloud?
  2. How easy or difficult would it be to write such an application on Google App Engine, given my limited web development experience?
  3. Can you point me to some existing projects that parallelize mathematical, graph and image processing algorithms?

I know the question is quite subjective, but I still ask you not to close it, as I am very confused about my project and need some expert advice.

Any help would be greatly appreciated!

Thanks!

+2  A: 

Why not try BOINC, the open-source distributed computing system?

http://boinc.berkeley.edu/

It supports multiple platforms and hosting environments, and it can run all kinds of numerical computation jobs that rely on parallel environments.

Moreover, you don't need any web development knowledge. You just need to create a new project in BOINC and try running it in the existing volunteer computing environment.

vprajan
BOINC, to the best of my knowledge, is targeted at large batch jobs, but the OP is asking about interactive or near-interactive usage.
Nick Johnson
+2  A: 

You might run into issues with this type of service on GAE, as it's quite restrictive about what you are allowed to do in the sandbox. From the GAE docs:

An App Engine application cannot:

  • spawn a sub-process or thread. A web request to an application must be handled in a single process within a few seconds. Processes that take a very long time to respond are terminated to avoid overloading the web server.

This could make it tricky to offer the types of services you describe. The scaling that GAE offers enables you to grow the number of requests you can handle but doesn't really offer you good tools for scaling the CPU resources for a single request.

Sounds like an interesting idea for a project, though. Good luck.

Jon McAuliffe
+4  A: 

About half a year ago I thought about building such a thing.

The idea came to nothing except some code at http://code.google.com/p/metaplasm...

In fact, the tricky thing with GAE is that computation must be cut into thirty-second slices with no shared memory (only memcache and the database). Once you've accomplished that, everything else will go smoothly :-)
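A minimal Java sketch of that slicing pattern, under stated assumptions: the hypothetical `CheckpointStore` interface stands in for memcache or the datastore, and the `budget` parameter (a number of loop iterations) is a deterministic stand-in for stopping well before the request deadline. Real code would compare elapsed time against the deadline instead. Each call to `runSlice` resumes from the last checkpoint, so no in-memory state is shared between slices.

```java
import java.util.HashMap;
import java.util.Map;

final class SlicedComputation {

    // Hypothetical stand-in for memcache / the datastore: the only state that
    // survives between two slices is what gets written here.
    interface CheckpointStore {
        long[] load(String jobId);          // null if nothing has been saved yet
        void save(String jobId, long[] state);
    }

    static final class InMemoryStore implements CheckpointStore {
        private final Map<String, long[]> data = new HashMap<>();
        public long[] load(String jobId) { return data.get(jobId); }
        public void save(String jobId, long[] state) { data.put(jobId, state.clone()); }
    }

    // One slice = one request. It resumes from the checkpoint, does at most
    // `budget` iterations (stand-in for "stop before the deadline"), saves its
    // progress and reports whether the whole job is finished.
    static boolean runSlice(CheckpointStore store, String jobId, long n, long budget) {
        long[] state = store.load(jobId);
        long next = state == null ? 0 : state[0];   // next index to process
        long acc = state == null ? 0 : state[1];    // partial sum of i*i
        long stop = Math.min(n, next + budget);
        for (long i = next; i < stop; i++) {
            acc += i * i;
        }
        store.save(jobId, new long[] {stop, acc});
        return stop == n;
    }

    static long result(CheckpointStore store, String jobId) {
        return store.load(jobId)[1];
    }

    public static void main(String[] args) {
        CheckpointStore store = new InMemoryStore();
        int slices = 0;
        boolean done = false;
        // Each iteration plays the role of a fresh request; nothing but the
        // store carries state from one slice to the next.
        while (!done) {
            done = runSlice(store, "job-1", 1_000, 300);
            slices++;
        }
        // prints "4 slices, sum = 332833500"
        System.out.println(slices + " slices, sum = " + result(store, "job-1"));
    }
}
```

In a deployed app each slice would be triggered by a new request, for example a task enqueued by the previous slice, rather than by a local loop.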

Vanya
+3  A: 

App Engine probably isn't the right platform for this. App Engine is targeted at web applications where each request does a modest amount of computation, but you need to service a lot of them - most traditional webapps, such as social networking sites, blogs, web-based games, and so on and so forth. It isn't targeted at services that need to do intensive computation for a single user request, and while it has services to do parallel background processing, they're asynchronous, which is probably also not what you want for your use-case.

What I would recommend is looking at other cloud environments, such as Amazon's EC2, for the processing power and parallelism you need. App Engine would still do an admirable job as a frontend for such a service, though! For example, you could use an App Engine app to manage jobs, dispatch them to backends, and turn up and down VM instances as required by load.
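A rough Java sketch of that frontend role, assuming a hypothetical `JobFrontend` class: a thread pool stands in for the backend VMs (no App Engine or EC2 API is used), `submit` returns a job id immediately so the web request stays short, `status` is what a polling endpoint might report, and `desiredVms` is a made-up scaling rule (one VM per `jobsPerVm` outstanding jobs, clamped to a minimum and maximum).

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

final class JobFrontend {

    enum Status { RUNNING, DONE, FAILED, UNKNOWN }

    // Stand-in for the pool of backend VMs; here just local threads.
    private final ExecutorService backends;
    private final Map<String, CompletableFuture<Double>> jobs = new ConcurrentHashMap<>();

    JobFrontend(int backendThreads) {
        this.backends = Executors.newFixedThreadPool(backendThreads);
    }

    // Accept a job, hand it to a backend and return an id right away.
    String submit(Supplier<Double> work) {
        String id = UUID.randomUUID().toString();
        jobs.put(id, CompletableFuture.supplyAsync(work, backends));
        return id;
    }

    // What a polling endpoint might return to the browser.
    Status status(String id) {
        CompletableFuture<Double> f = jobs.get(id);
        if (f == null) return Status.UNKNOWN;
        if (!f.isDone()) return Status.RUNNING;
        return f.isCompletedExceptionally() ? Status.FAILED : Status.DONE;
    }

    // Blocks until the job finishes; throws CompletionException if it failed.
    Double result(String id) {
        return jobs.get(id).join();
    }

    long pending() {
        return jobs.values().stream().filter(f -> !f.isDone()).count();
    }

    // Hypothetical scaling rule: ceil(pendingJobs / jobsPerVm), clamped to [minVms, maxVms].
    static int desiredVms(long pendingJobs, int jobsPerVm, int minVms, int maxVms) {
        long wanted = (pendingJobs + jobsPerVm - 1) / jobsPerVm;
        return (int) Math.max(minVms, Math.min(maxVms, wanted));
    }

    void shutdown() {
        backends.shutdown();
    }

    public static void main(String[] args) {
        JobFrontend frontend = new JobFrontend(4);
        String id = frontend.submit(() -> {
            double s = 0;
            for (int i = 1; i <= 1_000_000; i++) s += 1.0 / ((double) i * i);
            return s;
        });
        System.out.println(frontend.result(id));   // close to pi^2 / 6 (about 1.6449)
        System.out.println(frontend.status(id));   // DONE
        System.out.println(desiredVms(frontend.pending(), 10, 1, 20));  // 1
        frontend.shutdown();
    }
}
```

In a real deployment the job map would live in persistent storage rather than in memory, and the output of something like `desiredVms` would drive whatever API starts and stops instances.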

Nick Johnson
+3  A: 

This absolutely makes sense, and there are two existing projects that run numerical routines in the cloud.

Biocep (free, runs R & Scilab on EC2 or Eucalyptus) and Monkey Analytics (commercial, runs R, Octave or Python on EC2).

Richie Cotton
Star Cluster is another one: http://web.mit.edu/stardev/cluster/ (includes Python with NumPy/SciPy optimized for EC2, plus management tools)
thrope
+1  A: 

It makes little sense to me to write the rest in Java. That's precisely where I think Scala would make the most difference.

Daniel
Why? What are pros and cons?
Mikhail