I have access to a 128-core cluster on which I would like to run a parallelised job. The cluster uses Sun Grid Engine, and my program is written using Parallel Python, NumPy and SciPy, running on Python 2.5.8. Running the job on a single node (4 cores) yields a ~3.5x speedup over a single core. I would now like to take this to the next level and split the job across ~4 nodes. My qsub script looks something like this:

```bash
#!/bin/bash
# The name of the job; can be whatever makes sense to you
#$ -N jobname

# The job should be placed into the queue 'all.q'.
#$ -q all.q

# Redirect output stream to this file.
#$ -o jobname_output.dat

# Redirect error stream to this file.
#$ -e jobname_error.dat

# Use the current directory as the working directory.
# Both output files will be placed in this directory, and the batch
# system expects to find the executable here.
#$ -cwd

# Request the Bourne shell as the shell for the job.
#$ -S /bin/sh

# Print date and time
date

# spython is the server's version of Python 2.5. Using python instead
# of spython causes the program to run in Python 2.3.
spython programname.py

# Print date and time again
date
```

Does anyone know how to do this?

A: 

Yes, you need to include the Grid Engine option `-np 16`, either in your script like this:

```bash
# Use 16 processors
#$ -np 16
```

or on the command line when you submit the script. Or, for more permanent arrangements, use an `.sge_request` file.

On all the GE installations I've ever used, this will give you 16 processors (or processor cores, these days) on as few nodes as necessary: if your nodes have 4 cores you'll get 4 nodes, if they have 8 you'll get 2, and so on. Placing the job on, say, 2 cores on each of 8 nodes (which you might want to do if each process needs a lot of memory) is a little more complicated, and you should consult your support team.
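For reference, the `qsub` shipped with stock Sun Grid Engine does not accept `-np`, which is consistent with the `invalid option argument "-np"` error reported in the comments. The usual way to ask for several slots is a parallel environment (PE), requested with `-pe <pe_name> <slots>` either as a `#$` directive or on the `qsub` command line. The fragment below is a minimal sketch that assumes a PE named `mpi`; that name is a placeholder, and `qconf -spl` lists the PEs actually configured on a given cluster. Whether 16 slots end up as 4 slots on each of 4 nodes or 2 slots on each of 8 nodes is decided by the PE's `allocation_rule` (`$fill_up`, `$round_robin`, `$pe_slots`, or a fixed number of slots per host), which `qconf -sp mpi` displays.

```bash
# Request 16 slots from a parallel environment.
# ASSUMPTION: "mpi" is a placeholder PE name; run `qconf -spl` to list real ones.
#$ -pe mpi 16
```

The same request can also be given on the command line (`qsub -pe mpi 16 jobscript.sh`) or placed in an `.sge_request` file as the line `-pe mpi 16`.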

High Performance Mark
Adding `~$ -np 8` produces the error `Unable to read script file because of error: ERROR! invalid option argument "-np"`. I now have a Python-only solution for this, but it would be nice to have another option.
Chinmay Kanchi
@Chinmay Kanchi: add `#$ -np 16`, not `~$ -np 16`
High Performance Mark
Yes, that is what I had done. Missed out the `#` in the comment. Cheers.
Chinmay Kanchi
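Requesting slots is only half of the job for Parallel Python, which also has to know which hosts it was given. Inside a parallel SGE job, `$NSLOTS` holds the number of granted slots and `$PE_HOSTFILE` names a file with one line per host, of the form `hostname slots queue processor-range`. The bash sketch below turns that file into `host:slots` pairs that the job script could hand to `programname.py`. The helper name `pe_host_slots` is hypothetical, and how the program then uses the list (for example, to build the `ppservers` tuple for `pp.Server` after starting `ppserver.py` on each node) is an assumption left outside this sketch.

```bash
#!/bin/bash
# Print one "host:slots" pair per host listed in an SGE PE hostfile.
# Each line of the file looks like: hostname slots queue processor-range
# (pe_host_slots is a hypothetical helper name, not an SGE command.)
pe_host_slots() {
    # "|| [ -n "$host" ]" keeps a final line that lacks a trailing newline.
    while read -r host slots rest || [ -n "$host" ]; do
        [ -n "$host" ] || continue   # skip blank lines
        echo "${host}:${slots}"
    done < "$1"
}

# PE_HOSTFILE is only set inside a parallel SGE job.
if [ -n "${PE_HOSTFILE:-}" ]; then
    echo "Granted ${NSLOTS:-?} slots:"
    pe_host_slots "$PE_HOSTFILE"
fi
```

In the job script this could run just before `spython programname.py`, for instance as `pe_host_slots "$PE_HOSTFILE" > hosts.txt` with the program reading `hosts.txt` (the file name is illustrative). The function uses only POSIX shell features, so it should also work under the `#$ -S /bin/sh` setting used in the question.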