sungridengine

Error while opening shared object: SunGrid Engine

Hi all, my application uses the Sun N1 Grid Engine through the DRMAA API, which is provided as the shared object libdrmaa.so. I am using dlopen and dlsym to access functions of the library. That works fine. Now if I try to link it from the command line, the executable is built, but executing it gives the error "Cannot open shared object file". Can anyone s...

Problem in using C dynamic loading routines

Hi all, I have an application consisting of different modules written in C++. One of the modules is meant for handling distributed tasks on Sun Grid Engine. It uses the DRMAA API for submitting and monitoring grid jobs. If the client doesn't support the grid, the local machine should be used. The shared object of the API, libdrmaa.so, is linked at c...

Troubleshooting SIGTERMs with tee on a cluster within SGE jobs

I have some legacy scientific code running on a Rocks cluster, with SGE. I have an application-specific job submission script that generates qsub scripts (i.e. the scripts which Sun Grid Engine takes and runs). Within the qsub script, my legacy app is called. This app sends its output to STDOUT. SGE intercepts STDOUT and spools it into ...

Condor, Sun Grid Engine, or something else?

I'm trying to work out whether we should try out Condor or Sun Grid Engine at work (or possibly something else). We often have lots of unused WinXP workstations. The hope is that we could use wake-on-LAN, run all our jobs, and then shut down automatically. We'd mainly be running MATLAB, Java or Python simulations for either monte-carl...

How to change the default shell for Sun Grid Engine

Sun Grid Engine defaults to csh, and you have to put this: #$ -S /bin/sh into scripts to avoid it. What global configuration setting would change this default? ...

MPI, Sun Grid Engine vs JPPF?

Greetings, I have a little experience with Sun Grid Engine and MPI (using Open MPI). What's the difference between these frameworks/APIs and JPPF? ...

SunGridEngine, Condor, Torque as Resource Managers for PVM

Does anyone have any idea which resource manager is good for PVM? Or should I not have used PVM and instead relied on MPI (or any version of it, such as MPICH-2 [are there any other ones that are better?])? The main reason for using PVM was that the person before me who started this project assumed the use of PVM. However, now that this projec...

Running a lot of jobs with Sun Grid Engine

I want to run a very large number (~30000) of jobs with Sun Grid Engine. In theory, I could run the "qsub" command 30000 times to submit the jobs. However, I am afraid that would be too much. Is there a better way to do it (e.g. from a file)? Or do you think it will work nonetheless? Thank you ...

Spreading a job over different nodes of a cluster in Sun Grid Engine (SGE)

Hey, I'm trying to get Sun Grid Engine (SGE) to run the separate processes of an MPI job over all of the nodes of my cluster. What is happening is that each node has 12 processors, so SGE is assigning 12 of my 60 processes to each of 5 separate nodes. I'd like it to assign 2 processes to each of the 30 nodes available, because with 12 processes (d...

Sun Grid Engine network speed

I'm about to implement Sun Grid Engine. I know that it generally runs fast, but at times it slows down. How is it possible to speed it up or make it run normally? Brgds, kNish ...

Help running Python multiprocessing program under SGE

I have a Python program that runs multiple threads using the multiprocessing module. The program runs fine when executed on a stand-alone machine with multiple cores, using all cores, or on a cluster when executing from the shell directly. However, when trying to run it through SGE (Sun Grid Engine), either through a job script or usin...

Timeout jobs on Sun Grid Engine

Hello, I'm running a lot of jobs with Sun Grid Engine (Linux). Some of the jobs take a (very) long time to run, and I don't know ahead of time which ones. I would like to stop jobs that run for more than, say, 2 hours. Is it possible to do this using SGE? Is it possible to do it from the Unix shell? Thanks ...

Redirect output to different directories for Sun Grid Engine array jobs

Hey, I'm running a lot of jobs with Sun Grid Engine. Since there are a lot of jobs (~100000), I would like to use array jobs, which seem to be easier on the queue. Another problem is that each job produces a stdout and a stderr file, which I need in order to track errors. If I define them in the qsub -t 1-100000 -o outputdir -e errordir I will end up ...

Running a job on multiple nodes of a GridEngine cluster

I have access to a 128-core cluster on which I would like to run a parallelised job. The cluster uses Sun GridEngine and my program is written to run using Parallel Python, numpy, scipy on Python 2.5.8. Running the job on a single node (4 cores) yields a ~3.5x improvement over a single core. I would now like to take this to the next lev...