
Does anybody know a Java implementation of the DRMAA-API that is known to work with PBS/Torque cluster software?

The background behind this: I would like to submit jobs to a newly set-up Linux cluster from Java using a DRMAA-compliant API. The cluster is managed by PBS/Torque. Torque includes the PBS DRMAA 1.0 library, which contains a DRMAA-C binding and provides libdrmaa.so and .a binaries. I know that Sun Grid Engine includes a drmaa.jar providing a Java DRMAA API. In fact, I would have opted for SGE, but it was decided to try PBS first.

The theory behind that decision was that "DRMAA is a standard, and therefore a Java API needs only a standards-compliant DRMAA-C binding." However, I couldn't find such a "general DRMAA-C Java API", and I now suspect that this assumption is wrong and that the Java libraries are engine-specific.

Edit: I just experimented with the drmaa.jar from the Sun Grid Engine package and tried to use it with the PBS libdrmaa.so. Not surprisingly, that failed (JNI UnsatisfiedLinkError).

Conclusion: it does not work that way! After some searching, I see only these few options:

  1. Install GridWay on top of the Globus Toolkit. Installed on top of PBS, GridWay claims to provide DRMAA in Java. This looks far too complex for my setting.
  2. Scrap DRMAA and submit to PBS by calling the system commands qsub, qstat, etc. from Java. Simple, but not so nice.
  3. Implement a DRMAA binding myself. Way too complex...
  4. Switch to Grid Engine. In my opinion, GE is superior to PBS with respect to language bindings.
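Option 2 could look roughly like the following minimal sketch. It assumes Torque's usual command-line behavior: `qsub` prints the new job ID (e.g. `1234.server`) on stdout, and `qstat -f <id>` prints job attributes as `name = value` lines, including `job_state`. `QsubSketch` and its methods are hypothetical names for illustration, not part of Torque or any DRMAA library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Torque or DRMAA) that drives Torque's
// qsub/qstat command-line tools from Java.
final class QsubSketch {

    // Assumed job ID shape: "1234.server" or an array job such as "1234[].server".
    private static final Pattern JOB_ID = Pattern.compile("\\d+(\\[\\d*\\])?(\\.\\S+)?");
    // Matches a line such as "    job_state = R" in `qstat -f` output.
    private static final Pattern JOB_STATE =
            Pattern.compile("^\\s*job_state\\s*=\\s*(\\S+)", Pattern.MULTILINE);

    private QsubSketch() {}

    /** Submits a job script with qsub and returns the job ID printed on stdout. */
    static String submit(String scriptPath) throws IOException, InterruptedException {
        return parseJobId(run(List.of("qsub", scriptPath)));
    }

    /** Returns the job's state letter (e.g. Q, R, C) as reported by `qstat -f <jobId>`. */
    static String jobState(String jobId) throws IOException, InterruptedException {
        return parseJobState(run(List.of("qstat", "-f", jobId)));
    }

    /** Validates qsub's stdout so unexpected output fails loudly instead of silently. */
    static String parseJobId(String qsubStdout) {
        String id = qsubStdout.trim();
        if (!JOB_ID.matcher(id).matches()) {
            throw new IllegalArgumentException("Unexpected qsub output: " + qsubStdout);
        }
        return id;
    }

    /** Extracts the job_state attribute from `qstat -f` output. */
    static String parseJobState(String qstatStdout) {
        Matcher m = JOB_STATE.matcher(qstatStdout);
        if (!m.find()) {
            throw new IllegalArgumentException("No job_state found in qstat output");
        }
        return m.group(1);
    }

    /** Runs a command and returns its stdout; a non-zero exit status becomes an IOException with stderr. */
    private static String run(List<String> command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).start();
        // Drain stderr on another thread so a full pipe cannot block the child process.
        CompletableFuture<String> err = CompletableFuture.supplyAsync(() -> readAll(p.getErrorStream()));
        String out = readAll(p.getInputStream());
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException(command + " exited with " + exit + ": " + err.join().trim());
        }
        return out;
    }

    private static String readAll(InputStream in) {
        try (InputStream s = in) {
            return new String(s.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, `QsubSketch.submit("job.sh")` would return the job ID, which could then be polled with `QsubSketch.jobState(id)`. Checking exit codes and validating the output softens the "might not notice if something goes wrong" problem, but the parsing still depends on Torque's text output format.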

I tend to prefer option 2 or 4. Any recommendations?

A:

After some more searching, it looks like I have to write something myself. There seems to be no optimal answer yet, but this may serve as a warning for anyone attempting the same.

The best place to ask these questions is possibly the Torque mailing list: www.clusterresources.com/resources/mailing-lists.php

First of all, the reason you cannot simply take any DRMAA Java library and use it with any DRMAA-C implementation is that DRMAA specifies the interface for resource control, not how it is implemented. A vendor could build its Java binding on top of its DRMAA-C library and use only those functions, but it does not have to; it can use whatever the engine provides internally. So one important lesson is: if you need particular language bindings, make sure they exist for every language you require.

Regarding the options mentioned:

  1. Using GridWay/Globus Toolkit: http://www.gridway.org/doku.php?id=start Advantage: GridWay is a meta-scheduler that supports many resource management systems (SGE, PBS, ...). It is possibly the only way to get a DRMAA interface working with PBS at the moment. Disadvantage: it seems like an inflation of layers and complexity. I have no experience with it.

  2. Using the system commands qsub, qstat, and qdel. Advantage: a quick hack. Disadvantages: a dirty hack; you need to implement parsers for the output; you might not notice if something goes wrong; you have to pass messages around via stdin/stdout/stderr; not portable.

  3. Using JNI, it should be possible to create a binding for each C function in drmaa.c. Advantage: it would (hopefully) provide a full DRMAA implementation. Disadvantages: it involves compiled code and a lot of manual wrapping of C functions (maybe this can be automated).

  4. Switch to another grid engine. Possibly, we should have done this analysis before. However, we already have another Torque cluster and experience with it. Operating two different systems would create a more heterogeneous infrastructure.

  5. Adapting an existing DRMAA library from a different vendor. No idea whether that is feasible... We will look into that too.
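For option 3, the Java side of a hand-written JNI binding might look like the sketch below. It assumes a small C shim (hypothetical, not shipped with Torque) that implements these native methods by calling `drmaa_init`, `drmaa_run_job`, and `drmaa_exit` from Torque's libdrmaa, and that turns a non-zero return code plus the C API's `error_diagnosis` buffer into a Java exception. `TorqueDrmaaSketch`, `DrmaaSketchException`, and the simplified `runJob` signature (which hides the DRMAA job template) are illustrative assumptions. Loading is guarded so that a missing library yields `false` rather than the UnsatisfiedLinkError seen when mixing SGE's drmaa.jar with PBS's libdrmaa.so.

```java
// Hypothetical Java half of a JNI binding to Torque's libdrmaa. The native
// methods would be implemented in a separate C shim; none of it exists yet.
final class TorqueDrmaaSketch {

    private static boolean loaded;

    private TorqueDrmaaSketch() {}

    /** Tries to load the (hypothetical) JNI shim; returns false instead of throwing UnsatisfiedLinkError. */
    static synchronized boolean tryLoad(String libraryName) {
        if (loaded) {
            return true;
        }
        try {
            System.loadLibrary(libraryName);
            loaded = true;
        } catch (UnsatisfiedLinkError e) {
            System.err.println("Cannot load " + libraryName + ": " + e.getMessage());
        }
        return loaded;
    }

    // One native method per wrapped C function. The C side would pass the
    // drmaa_* return code and error_diagnosis text into DrmaaSketchException.
    static native void init(String contact) throws DrmaaSketchException;

    // Simplified: a real binding would expose the job template API instead.
    static native String runJob(String remoteCommand, String[] args) throws DrmaaSketchException;

    static native void exit() throws DrmaaSketchException;

    /** Raised by the (hypothetical) C shim when a drmaa_* call returns a non-zero error code. */
    static final class DrmaaSketchException extends Exception {
        final int errorCode;

        DrmaaSketchException(int errorCode, String diagnosis) {
            super("DRMAA error " + errorCode + ": " + diagnosis);
            this.errorCode = errorCode;
        }
    }
}
```

Calling any of the native methods before `tryLoad` succeeds would still throw UnsatisfiedLinkError; most of the real work would sit in the C shim that maps each Java call onto the corresponding drmaa_* function.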

Michael
A: 

Did you ever decide what to do with this? Did you manage to get Java DRMAA bindings working with Torque/PBS? I'm looking to get some Java DRMAA code working on a Torque/PBS system, and if you've done the hard work already, I'd love to steal it.

However, if you haven't, it shouldn't be too bad to make some Java bindings, and I will do it if nobody else does. Several years ago I successfully modified the DRMAA Java bindings for SGE to work with a new DRMAA implementation for Xgrid (now stale, but perhaps soon to be revived).

I even wrote a brief blog post on my experiences (includes a link to general instructions):

http://edbaskerville.com/2006/07/11/java-bindings-working/

Ed Baskerville
Hi, sorry: we finally decided to switch to the open-source version of (Oracle/Sun) Grid Engine. It just turned out to be the more complete system.
Michael
Makes sense. The code I'm porting is for the Sun Grid Engine (possibly soon to be Oracle closed-source), which is where the Java bindings originated. I'll forge ahead with the Torque/PBS bindings and post here when/if I get them working.
Ed Baskerville