I need to process large image files into smaller image files. I would like to distribute the work to many "slave" servers, rather than tasking my main server with this. I am using Windows Server 2005/2008, C#, and ASP.NET. I have a lot of web application development experience but have not developed distributed systems. I had a notion that this could be designed as follows:

1) Files would be placed in a shared network drive

2) Slave servers would periodically poll the drive for new content

3) Slave servers would rename newly found files to something like UNPROCESSED_appIDXXXX_jidXXXXX_photoidXXXXX.tif and begin processing that file.

4) Other slave servers would avoid trying to process files that are in process by examining file name, i.e. if something has been named "UNPROCESSED" they will not attempt to process.
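The claim-by-rename idea in steps 3 and 4 can be sketched as follows. This is a minimal, language-agnostic illustration in Python rather than C#, and it assumes the share performs a rename as a single atomic operation, so that only one worker's rename of a given source file can succeed. The `try_claim` helper and the worker ID embedded in the new name are hypothetical, not part of the original design. In .NET the analogous call would be `File.Move`, which throws when the source file no longer exists.

```python
import os
from typing import Optional


def try_claim(path: str, worker_id: str) -> Optional[str]:
    """Try to claim `path` by renaming it.

    Returns the new path if this worker won the claim, or None if the
    file was already renamed (claimed) or is locked by another process.
    """
    directory, name = os.path.split(path)
    # Hypothetical naming scheme: the prefix from the post plus a worker ID.
    claimed = os.path.join(directory, f"UNPROCESSED_{worker_id}_{name}")
    try:
        os.rename(path, claimed)
    except (FileNotFoundError, PermissionError):
        # Source already gone (another worker renamed it first) or locked.
        return None
    return claimed
```

A worker would call `try_claim` on each file it finds that lacks the prefix and only process the files for which it gets a path back.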

I am wondering a few things:

1) Will there be issues with two slave servers trying to "grab" and rename the file at once, or will Windows Server automatically lock the file?

2) What do you think the best mechanism for notification of new content for processing should be? One simple idea is to write a basic aspx page on each slave system and have it running on a timer. A better idea might be to write a Windows Service that uses FileSystemWatcher and have it running on each slave system. A third idea is to have a central server somehow dispatch instructions to a given slave server to attempt a processing job, but I do not know of ways of invoking that kind of communication beyond a very hack-ish approach of having the master server pass a message via HTTP.

I'd much appreciate any guidance you have to offer.

Cheers, -KF

A: 

If you don't want to go all the way with a compute-cluster-type solution, you should consider having a job manager running somewhere that parcels out the work. When a server becomes available, it asks the job manager for a new piece of work. When it finishes, it tells the job manager, and the job manager can inform your "client" once the whole job is complete. That way it's easy to register work and know when it's complete, and the job manager can hand out work without any worry about race conditions on file renames. :)
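A rough sketch of that pull model, written in Python for brevity rather than C#. The `JobManager` class and its method names are hypothetical, and everything lives in one process here; a real setup would put the manager behind some network interface. The point is only that a single lock around the pending queue guarantees each job is handed to exactly one worker.

```python
import threading
from collections import deque
from typing import Dict, Iterable, Optional, Set


class JobManager:
    """Hands out each job to exactly one worker and tracks completion."""

    def __init__(self, jobs: Iterable[str]) -> None:
        self._lock = threading.Lock()
        self._pending = deque(jobs)
        self._in_progress: Dict[str, str] = {}  # job -> worker id
        self._done: Set[str] = set()

    def request_job(self, worker_id: str) -> Optional[str]:
        """Called by an idle worker; returns a job, or None if none are pending."""
        with self._lock:
            if not self._pending:
                return None
            job = self._pending.popleft()
            self._in_progress[job] = worker_id
            return job

    def report_done(self, worker_id: str, job: str) -> None:
        """Called by a worker when it has finished a job it was given."""
        with self._lock:
            if self._in_progress.get(job) != worker_id:
                raise ValueError(f"{job!r} is not assigned to {worker_id!r}")
            del self._in_progress[job]
            self._done.add(job)

    def all_done(self) -> bool:
        """True once nothing is pending or in progress."""
        with self._lock:
            return not self._pending and not self._in_progress
```

Each worker loops on `request_job`, processes what it receives, calls `report_done`, and stops (or sleeps) when it gets `None`.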

JP Alioto
Good idea, thanks. Do you have any suggested communication mechanisms for how the "client" should communicate with the "job manager"? One way to do this while still assuring a fairly decoupled solution might be to have all of the info persisted to a single database: job assignments could go out as necessary, notification of success could be noted in the DB. You could even conceivably have multiple "job managers" referencing the data and assigning jobs as necessary...
kendor
A DB for the job manager is a good idea because it gives you durability and traceability of jobs. Workers can communicate with the job manager through a WCF web service, for example. But you should also check out other grid solutions like http://ngrid.sourceforge.net/.
JP Alioto
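To illustrate the database-backed variant discussed above, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for whatever database the job manager would actually use. The `jobs` table, its columns, and the function names are all assumptions made for the example. The claim is a conditional `UPDATE` that only succeeds while the row is still `pending`, so two workers cannot both claim the same job.

```python
import sqlite3
from typing import Optional, Tuple


def create_queue(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) jobs table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY,"
        " path TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " worker TEXT)"
    )
    conn.commit()


def add_job(conn: sqlite3.Connection, path: str) -> int:
    """Register a file to be processed; returns the new job id."""
    cur = conn.execute("INSERT INTO jobs (path) VALUES (?)", (path,))
    conn.commit()
    return cur.lastrowid


def claim_job(conn: sqlite3.Connection, worker: str) -> Optional[Tuple[int, str]]:
    """Claim the oldest pending job; returns (id, path), or None if none are left."""
    while True:
        row = conn.execute(
            "SELECT id, path FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status check in the WHERE clause makes the claim conditional:
        # if another worker claimed this row first, rowcount is 0 and we retry.
        cur = conn.execute(
            "UPDATE jobs SET status = 'in_progress', worker = ? "
            "WHERE id = ? AND status = 'pending'",
            (worker, row[0]),
        )
        conn.commit()
        if cur.rowcount == 1:
            return (row[0], row[1])


def complete_job(conn: sqlite3.Connection, job_id: int) -> None:
    """Record that a job finished successfully."""
    conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
    conn.commit()
```

Because every state change is a row update, the table doubles as the audit trail: a query on `status` and `worker` shows what is pending, who holds what, and what has finished.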