Submitting Jobs to the DSH HPC Cluster

Generally, to submit a job to the DSH HPC Cluster you must be on a cluster login node, create a jobscript telling the cluster what you want it to do, and submit the jobscript to the job queue using the qsub command. The scheduler will then allocate your job to a compute node to run, and your results will be placed in your working directory once the job is complete.

How do I submit a job to the scheduler?

To submit a non-interactive (a.k.a. "batch") job to the DSH HPC Cluster, you need to create a jobscript that contains instructions for the scheduler to follow, including the resources that you want to request and the actual commands that you want to run. For best results, jobscripts should begin with #!/bin/bash -l to run using a login shell, which allows them to include your familiar login environment and packages.

A sample job script is provided automatically when your DSH HPC Cluster home space is created, and can be found at ~/helloWorld.sh. (If this file is not present in your cluster home or you would like to obtain a fresh copy, you can also find it in the shared /apps space at /apps/sample_job_script/helloWorld.sh.)

Generally, jobscripts will use something like the following structure (see our section on Example Jobscripts for more examples):

#!/bin/bash -l

# Give the job a name (optional)
#$ -N HelloWorld

# Request some resources
#$ -l h_vmem=2G
#$ -l h_rt=0:5:0

# Set a working directory
#$ -wd /hpchome/USERID@IDHS.UCL.AC.UK/myproject

##### Job starts here #####
# List commands that you want to run here
date
python mypythonscript.py
R CMD BATCH --no-save test.R
##### Job ends here #####

This job script should then be submitted to the cluster job queue using the qsub command.

qsub myjobscript.sh
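If the submission is accepted, qsub prints a confirmation containing the new job's ID, along the lines of (the ID here is illustrative):

Your job 123456 ("HelloWorld") has been submitted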

The job will be put into the job queue, and it will begin running on compute nodes once the scheduler is able to allocate your requested resources.

Additional information can be found here:

  • Scheduler fundamentals (Moodle) (UCL users)
  • Scheduler fundamentals (MediaCentral) (non-UCL users)

Asking for resources

Number of cores

#$ -pe smp <number of cores>
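e.g. #$ -pe smp 4 requests 4 cores.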

For single-core jobs you don't need to request a number of cores.

Memory requests (amount of RAM per core)

The memory you request is always per core, not the total amount. If you ask for 128GB RAM and 4 cores, that may run on 4 nodes using only one core per node. This allows you to have sparse process placement when you do actually need that much RAM per process.

Each of the compute nodes in the DSH HPC Cluster has 128 GB of RAM as standard, with approximately 8 GB required for overall system usage, leaving ~120 GB of usable RAM per node. If you want to avoid sparse process placement and your job taking up more nodes than you were expecting, the maximum memory request you should make when using all 16 cores in a standard node is ~120 GB / 16 = ~7.5G.

#$ -l h_vmem=<integer amount of RAM in G or M>

e.g. #$ -l h_vmem=4G requests 4 gigabytes of RAM per core.
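Remember that this is multiplied by the number of cores requested: for example, a job asking for 4 cores and h_vmem=4G will be allocated 4 × 4 = 16 gigabytes of RAM in total.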

Run time

#$ -l h_rt=<hours:minutes:seconds>

e.g. #$ -l h_rt=48:00:00 requests 48 hours.

Working directory

Your working directory is the path that the job will use as the default for finding files and writing outputs. This can be specified in your jobscript either as a particular working directory:

#$ -wd /path/to/working/directory

Or the current working directory that the script was submitted from:

#$ -cwd
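Putting the directives above together, a minimal multi-core jobscript might look like the following sketch (the job name, core count, memory, run time, and script name are all illustrative placeholders):

#!/bin/bash -l

# Name the job
#$ -N MyAnalysis

# Request 4 cores, 4 gigabytes of RAM per core, and 12 hours of run time
#$ -pe smp 4
#$ -l h_vmem=4G
#$ -l h_rt=12:0:0

# Run from the directory the job was submitted from
#$ -cwd

# Replace with the commands you actually want to run
python myanalysis.py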

GPUs

To request a GPU for your job, use the -l gpu directive.

#$ -l gpu
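For example, a jobscript requesting a GPU alongside the usual resources might contain the following (the run-time and memory values are illustrative):

#$ -l gpu
#$ -l h_rt=2:0:0
#$ -l h_vmem=8G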

Passing in qsub options on the command line

The #$ lines in your jobscript are options supplied to the qsub command: qsub takes each line that starts with #$ and treats the rest of the line as a command-line option.

You can also pass options directly to the qsub command, and this will override the settings in your script. This can be useful if you are scripting your job submissions in more complicated ways.

For example, if you want to change the name of the job for this one instance of the job you can submit your script with:

qsub -N NewName myscript.sh

Or if you want to increase the wall-clock time to 24 hours:

qsub -l h_rt=24:0:0 myscript.sh

You can submit jobs with dependencies by using the -hold_jid option. For example, the command below submits a job that won't run until job 12345 has finished:

qsub -hold_jid 12345 myscript.sh
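If you are scripting chains of dependent jobs, one way to avoid copying job IDs by hand is qsub's -terse option, which makes qsub print only the numeric ID of the submitted job. A sketch, assuming two jobscripts of your own called first_step.sh and second_step.sh:

# Submit the first job and capture its ID (-terse prints just the ID)
JOBID=$(qsub -terse first_step.sh)

# Submit a second job that will not start until the first has finished
qsub -hold_jid "$JOBID" second_step.sh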

Note that for debugging purposes, it helps us if you keep these options inside your jobscript rather than passing them on the command line whenever possible. We (and you) can see the exact jobscript that was submitted for every job that ran, but not the command-line options you submitted it with.

Command                             Action
qsub myscript.sh                    Submit the script as-is
qsub -N NewName myscript.sh         Submit the script but change the job's name
qsub -l h_rt=24:0:0 myscript.sh     Submit the script but change the maximum run-time
qsub -hold_jid 12345 myscript.sh    Submit the script but make it wait for job 12345 to finish

How do I monitor a job?

qstat

The qstat command shows the status of your jobs. Run with no options, it shows only your own jobs (and no one else's), which makes it easier to keep track of them.

The output will look something like this:

job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
123454 2.00685 DI_m3      ccxxxxx      Eqw   10/13/2017 15:29:11                                    12 
123456 2.00685 DI_m3      ccxxxxx      r     10/13/2017 15:29:11 all.q@dsh-sge2gpu01.IDHS.UCL.A     16 
123457 2.00398 DI_m2      ucappka      qw    10/12/2017 14:42:12                                    1 

This shows the job ID, the numeric priority the scheduler has assigned to the job, the name you have given the job, your username, the state the job is in, the date and time it was submitted (or started, if it has begun), the head node of the job, and the number of 'slots' it is taking up. For array jobs, the last column shows the task ID.

The queue name (all.q here) is generally not useful. The head node name (dsh-sge2gpu01) can be a useful reference in troubleshooting when something goes wrong.

If you want to get more information on a particular job, note its job ID and then use the -f and -j <job-ID> flags to get full output about that job. Most of this information is circumstantial and may not be very useful most of the time, but can be very helpful in determining what's wrong if things are not working as expected.

qstat -f -j 12345

Job states

  • qw: queueing, waiting
  • r: running
  • Rq: a pre-job check on a node failed and this job was put back in the queue
  • Rr: this job was rescheduled but is now running on a new node
  • Eqw: there was an error in this jobscript. This will not run.
  • t: this job is being transferred
  • dr: this job is being deleted

Many jobs cycling between Rq and Rr generally means there is a dodgy compute node which is failing pre-job checks, but is free so everything tries to run there. In this case, let us know and we will investigate.

If a job stays in t or dr state for a long time, the node it was on is likely to be unresponsive - again let us know and we'll investigate.

A job in Eqw will remain in that state until you delete it - you should first have a look at what the error was with qexplain.

Why is my job in Eqw status?

If your job goes straight into Eqw state, there was an error in your jobscript that meant your job couldn't be started. The standard qstat job information command will give you a truncated version of the error:

qstat -j <job_ID>

The most common reason jobs go into this error state is that a file or directory your job is trying to use doesn't exist. Creating it after the job is in the Eqw state won't make the job run: it'll still have to be deleted and re-submitted.

Job deletion

Use qdel to delete a submitted job. You must give the job ID.

qdel 361829

You can also delete all the jobs from a specific user with the following:

qdel -u <username>

To delete a batch of jobs, create a file containing the list of job IDs that you would like to delete, then pass it to qdel:

cat <filename> | xargs qdel
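As a sketch of one way to build such a file, you could extract the job IDs from your qstat output (this assumes the default two-line header shown earlier, and the filename is illustrative):

# Collect the IDs of all your queued and running jobs
qstat | awk 'NR > 2 {print $1}' > jobs_to_delete.txt

# Delete every job listed in the file
cat jobs_to_delete.txt | xargs qdel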

More scheduler commands

The command qacct -j <job_ID> is useful in determining what happened with a job after it has completed. This can show you information such as how long the job ran for, how much memory it used over the lifetime of the job, the maximum memory that was used at any one point, and the exit status (which can be very useful if something went wrong).

For more details on the SGE commands you can have a look at their manual pages, such as man qstat. (You can also browse the manual pages outside of the DSH on publicly available websites, such as gridscheduler.sourceforge.net.)

How do I run interactive jobs?

If you wish to run interactive programs on the cluster, use the commands qlogin or qrsh to request an interactive session. This will be queued in the same manner as other jobs, and the scheduler will open a remote command-line session on a suitable node when it is able. See our detailed guide to running interactive jobs for more information.
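Interactive sessions accept the same resource requests as batch jobs, passed on the command line rather than in a jobscript. For example, the following (illustrative) command requests a 2-hour session with 4 gigabytes of RAM per core:

qrsh -l h_vmem=4G,h_rt=2:0:0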

How do I estimate what resources to request in my jobscript?

It can be difficult to know where to start when estimating the resources your job will need. One approach is to submit a single job that requests far more than you think necessary, and gather data on what it actually uses. If you aren't sure what 'far more' means, request the maximum wallclock time and the largest job that will fit on one node, then reduce these once you have some idea. In the case of array jobs, each job in the array is treated independently by the scheduler and is allocated the same resources as requested: for example, in a job array of 40 jobs requesting 24 hours wallclock time and 3GB RAM, each job in the array will be allocated 24 hours wallclock time and 3GB RAM. Wallclock time does not include the time spent waiting in the queue.

In your jobscript, you can try running your program using the following:

/usr/bin/time --verbose myprogram myargs

where myprogram myargs is however you normally run your program, with whatever options you pass to it.

When your job finishes, you will get output about the resources it used and how long it took - the relevant figure for memory is maxrss (maximum resident set size), which tells you roughly the largest amount of memory it used.
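For example, if the reported maximum resident set size is roughly 3 GB, an h_vmem request of 4G per core would leave a sensible amount of headroom for the next run.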

Remember that memory requests in your jobscript are always per core, so check that the total you are requesting is sensible; if you increase it too much, you may end up with a job that can never be scheduled.

How do I run a graphical program?

Unfortunately, at the moment it is not possible to run a graphical program in the DSH HPC cluster.

What can I do to minimise the time I need to wait for my job(s) to run?

  1. Minimise the amount of wall clock time you request.
  2. Use job arrays instead of submitting large numbers of jobs (see our job script examples).
  3. Plan your work so that you can do other things while your jobs are being scheduled.