I have a .NET app that will be spawning tasks to run on an MS HPC cluster. We're not using any of that fancy DryadLINQ stuff, just remotely executing an exe on the cluster and passing arguments via the command line. The task will be .NET code, and I'd like the calling app to get an actual Exception object when an error occurs on HPC.

What's the best general technique for accomplishing this?

Let me know if you need any more info.

Thanks!

A: 

You can't pass an exception object back from your executable to the client app when you're using the batch scheduler. If it's enough to know that one of the jobs or tasks you queued failed, hold onto the SchedulerJob object and add a callback to its OnJobState or OnTaskState event. Whenever your job (or a task in that job) changes state, your callback receives the job or task ID and the state-change information; you can then check whether the state changed to "Failed" and act on that.
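A rough sketch of that callback approach, assuming the Microsoft.Hpc.Scheduler client assemblies from the HPC Pack SDK are referenced. The head node name "headnode" and the command line "MyTask.exe arg1" are placeholders, and the exact event and property names should be checked against the SDK version you have installed:

```csharp
using System;
using System.Threading;
using Microsoft.Hpc.Scheduler;
using Microsoft.Hpc.Scheduler.Properties;

class SubmitAndWatch
{
    // Signalled once the job reaches a terminal state.
    static readonly ManualResetEvent jobDone = new ManualResetEvent(false);

    static void Main()
    {
        IScheduler scheduler = new Scheduler();
        scheduler.Connect("headnode");            // placeholder head node name

        ISchedulerJob job = scheduler.CreateJob();
        ISchedulerTask task = job.CreateTask();
        task.CommandLine = "MyTask.exe arg1";     // placeholder command line
        job.AddTask(task);

        // Called whenever any task in the job changes state.
        job.OnTaskState += (sender, e) =>
        {
            if (e.NewState == TaskState.Failed)
                Console.WriteLine("Task {0} of job {1} failed",
                                  e.TaskId.JobTaskId, e.JobId);
        };

        // Called whenever the job itself changes state.
        job.OnJobState += (sender, e) =>
        {
            if (e.NewState == JobState.Failed)
                Console.WriteLine("Job {0} failed", e.JobId);

            if (e.NewState == JobState.Finished ||
                e.NewState == JobState.Failed ||
                e.NewState == JobState.Canceled)
                jobDone.Set();
        };

        // Null credentials mean the scheduler uses cached ones or prompts.
        scheduler.SubmitJob(job, null, null);
        jobDone.WaitOne();
    }
}
```

The handlers are attached before SubmitJob so that no state change is missed, and the wait handle keeps the client alive until the job ends.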

To mark a task or job as "Failed", have your executable exit with a non-zero exit code. If you need details on the actual exception, the best you can do is print it to stdout.
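On the executable side, this can be as simple as a top-level try/catch in Main. The sketch below is one way to do it; `TaskProgram`, `DoWork` and `Run` are made-up names for illustration, and `DoWork` stands in for your real computation:

```csharp
using System;
using System.IO;

public static class TaskProgram
{
    // Hypothetical stand-in for the real work; throws on bad input.
    public static void DoWork(string[] args)
    {
        if (args.Length == 0)
            throw new ArgumentException("expected at least one argument");
    }

    // Runs the work and maps the outcome to a process exit code:
    // 0 on success, 1 if any exception escaped DoWork.
    public static int Run(string[] args, TextWriter output)
    {
        try
        {
            DoWork(args);
            return 0;
        }
        catch (Exception ex)
        {
            // ToString() includes the type, message, stack trace
            // and any inner exceptions.
            output.WriteLine(ex.ToString());
            return 1;
        }
    }

    // The value returned from Main becomes the exit code
    // that the scheduler sees.
    public static int Main(string[] args)
    {
        return Run(args, Console.Out);
    }
}
```

Taking the TextWriter as a parameter is only there to make the failure path easy to exercise in a unit test; in the deployed exe everything goes to Console.Out.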

If you really need all the exception details, an alternative might be to use the SOA framework for your computations. Advantages would be:

  • your compute requests look like WCF method calls

  • you get detailed exceptions back when your code throws

  • you can use the SOA debugger extension to Visual Studio to debug your code

Disadvantages would be:

  • it's more complex to write and deploy your app, starting from your existing code base

Here are some resources to get you started (a search for "Windows HPC SOA" should get you much more):

MSDN SOA documentation

joXn