
Slurm§

We have changed schedulers from SGE to Slurm. Both are job schedulers for HPC clusters, but they have different architectures, commands, and features.

Here is a comparative table of the main SGE commands and their equivalent in Slurm. (Since we had a very custom SGE install, a few commands may be slightly different or may not have been available in our past setup).

SGE and Slurm command Rosetta stone

Full Slurm Rosetta stone of Workload Managers available as PDF

How do I submit a job to Slurm?§

You can submit a bash jobscript that has Slurm directives in it. You can also pass in Slurm directives on the command line to the Slurm submit commands.

Using a bash jobscript§

To submit a job to the scheduler you need to write a jobscript that contains the resources the job is asking for and the actual commands/programs you want to run. This jobscript is then submitted using the sbatch command:

sbatch myjobscript

Lines beginning with #SBATCH in your jobscript contain options to sbatch: Slurm takes each line starting with #SBATCH and uses the rest of that line as an instruction to the scheduler. The job will be put into the queue and will begin running on the compute nodes at some later point, once it has been allocated resources.

Slurm allows you to specify various options for how your job is laid out across nodes.

#!/bin/bash

# Request two nodes on Kathleen and run 40 tasks per node, one cpu each:
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=40
#SBATCH --cpus-per-task=1
#SBATCH --time=00:10:00

This alternative will give you the same layout on Kathleen:

#!/bin/bash

# Request 80 tasks with one cpu each:
#SBATCH --ntasks=80
#SBATCH --cpus-per-task=1
#SBATCH --time=00:10:00

You can launch parallel tasks by using srun inside the jobscript you are running with sbatch. For most programs, this replaces mpirun or our previous gerun command.

#!/bin/bash

# Request 80 tasks:
#SBATCH --ntasks=80
#SBATCH --time=00:10:00

srun myprog

Practical examples of how to run parallel tasks can be found in Slurm Example Jobscripts.

Slurm directives must be at the top§

Slurm requires all the #SBATCH directives to be together at the top of the file. Once it reaches the first command in the script, it will ignore any further #SBATCH lines. This is different from SGE, which would parse these directives from anywhere in the file.
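
For example, in this sketch the final directive would be silently ignored because it comes after the first command (the resource values are purely illustrative):

#!/bin/bash

# These directives are read because they appear before any command:
#SBATCH --ntasks=80
#SBATCH --time=00:10:00

echo "Job started"

# This one comes after a command, so Slurm ignores it:
#SBATCH --cpus-per-task=1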

Passing in command-line arguments§

You can also pass options directly to the sbatch command and this will override the settings in your script. This can be useful if you are scripting your job submissions in more complicated ways.

For example, if you want to change the name of the job for this one instance of the job you can submit your script with:

sbatch --job-name=NewName myscript.sh

Or if you want to alter the wallclock time in the existing script to 24 hours:

sbatch -t 0-24:00:00 myscript.sh

You can also use srun to submit your jobscript. Note that srun does not read #SBATCH directives from the script, so all options must be given on the command line.

srun --ntasks=80 --time=00:10:00 myjobscript

You can submit jobs with dependencies by using the --depend option. For example, the command below submits a job that won't run until job 12345 has finished:

sbatch --depend=12345 myscript.sh

Note for future reference: where possible it helps to have these options inside your jobscript rather than passed in on the command line, so there is one place to check what your past jobs were requesting.
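
If you do script your submissions, one common pattern for dependent jobs is to capture each job ID with sbatch --parsable and pass it to the next submission. A minimal sketch, where stage1.sh and stage2.sh are placeholder jobscripts:

# Submit the first job and capture its job ID:
jobid=$(sbatch --parsable stage1.sh)

# Submit the second job so it starts only if the first completes successfully:
sbatch --dependency=afterok:${jobid} stage2.sh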

Specifics of using Slurm on Young§

Partitions§

The correct partition will be assigned based on the resources you ask for. However, to use High Bandwidth Memory (HBM) nodes you will need to specify it explicitly:

--partition=hbm

This will be covered in more detail in the example scripts linked to below.
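
As a minimal sketch (the other resource values here are placeholders), the partition request sits alongside the rest of your directives:

#!/bin/bash

#SBATCH --partition=hbm
#SBATCH --ntasks=80
#SBATCH --time=01:00:00

srun myprog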

Interactive Sessions§

In addition to requesting job resources through a submission script, you can also run in an interactive manner similar to how you would run something on your PC or a login node. Interactive sessions are best when you are in active development or need to quickly iterate a series of jobs.

There are two methods for starting an interactive session. srun will migrate your session to a compute node immediately, whereas salloc will allocate resources and start a new session but leave you on the login node. salloc is generally best for multi-node MPI jobs, whereas srun should be used for most single-node interactive jobs.

All commands and resource requests are passed in via the command line; no Slurm script is used.

This example would start an interactive session with 32G RAM and 8 CPUs on one node.

srun --mem-per-cpu=4G --nodes=1 --ntasks-per-node=8 --pty bash -l

This next example would start a session that requests 64 CPUs across 4 nodes, with 32G RAM reserved per node. Since it uses salloc, the session stays on the login node rather than migrating to a compute node. From here you could use srun or mpirun to execute an application in parallel.

salloc --mem-per-cpu=2G --nodes=4 --ntasks-per-node=16
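
For example, once the allocation has been granted you could launch an MPI program on the allocated nodes from the same session (myprog here is a placeholder for your own executable):

# Inside the salloc session, launch the program across the allocated nodes:
srun ./myprog

# Exit the session to release the allocation when you are finished:
exit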

This final example would start an interactive session on one GPU node, with one GPU, 8G RAM, and 1 CPU.

srun --gres=gpu:1 --mem-per-cpu=8G --nodes=1 --ntasks-per-node=1 --pty bash -l

For more detailed examples, please refer to Slurm Example Jobscripts.

Temporary Storage§

If you're running on a GPU node you can request temporary storage on the node's local disk. By default, a temporary directory is created and its location is stored in the TMPDIR environment variable. If you don't explicitly request more, you're automatically given 100MiB.

If you're running on a High Bandwidth Memory (HBM) node, you are automatically allocated all the available local disk for the job and do not need to ask for it.

To request more local temp space on a GPU node, up to a maximum of 5900GiB, use the --gres=tmpfs:<amount> option. In this example we request 80GiB:

--gres=tmpfs:80G

If you're requesting both a GPU and temp space, separate the gres requests with a comma (no space):

--gres=gpu:1,tmpfs:80G
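
As a sketch of how this temporary space might be used in a GPU jobscript (the program and file names are placeholders, and the resource values are illustrative), you can copy data into $TMPDIR, work there, and copy the results back before the job ends:

#!/bin/bash

#SBATCH --gres=gpu:1,tmpfs:80G
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Copy input data to the node's fast local disk:
cp input.dat "$TMPDIR"
cd "$TMPDIR"

# Run the program from the submission directory (placeholder name):
"$SLURM_SUBMIT_DIR"/myprog input.dat

# Copy results back to the submission directory before the job ends:
cp results.dat "$SLURM_SUBMIT_DIR"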

Checking your previous jobscripts§

If you want to check what you submitted for a specific job ID, you can still do this with the scriptfor utility. (This now runs the sacct command for you with relevant options).

scriptfor 12345

As mentioned above, this will not show any command line options you passed in.

Checking your whole submit-time environment§

By default, jobs copy the environment you have on the login node you are submitting from (--export=ALL in the command comparison table). This is different from our previous setup. You can view the whole environment that your job was submitted with using the envfor utility. (This also runs sacct with relevant options).

envfor 12345

This output can be long and contain multi-line shell functions.
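
The exact contents depend on your own environment, but you can filter the output for a particular variable, for example:

envfor 12345 | grep PATH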

How do I monitor a job?§

squeue§

The squeue --me command shows the status of your jobs. If you run squeue with no options it will show everyone's jobs, so adding --me makes it easier to keep track of your own.

The output will look something like this:

  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
     22  kathleen lammps_b  ccxxxxx  R       0:04      2 node-c11b-[002-003]

This shows you the job ID, the partition it is using, the first 8 characters of the name you gave the job, your username, the state the job is in, how long it has been running (0:00 if it has not started yet), the number of nodes it requested, and which nodes it is running on (or the reason it is still waiting).

You can get a little more information with the -l or --long option:

Tue Jun 17 12:15:57 2025
  JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
     22  kathleen lammps_b  ccxxxxx  RUNNING       1:16   2:00:00      2 node-c11b-[002-003]

squeue --help will show how you can format the output differently.
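
For example, you could widen the name column and show the full job state with a custom format string (the column widths here are arbitrary):

squeue --me -o "%.10i %.12P %.30j %.10T %.12M %.6D %R"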

scontrol§

This is a utility to display detailed information about the specified active job.

scontrol show job 123454
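
The output contains many fields, so you may want to filter it for the ones you care about (the exact field names can vary slightly between Slurm versions):

scontrol show job 123454 | grep -E 'JobState|RunTime|TimeLimit'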

scancel§

You use scancel to delete a job from the queue.

scancel 123454

You can delete all your jobs at once (this acts on your own user):

scancel '*'
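
You can also cancel a subset of your jobs by filtering, for example only your jobs that are still pending, or jobs with a particular name (myjobname is a placeholder):

# Cancel only your pending jobs:
scancel --state=PENDING --user=$USER

# Cancel your jobs with a given name:
scancel --name=myjobname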

More scheduler commands§

Have a look at man squeue and note the commands shown in the SEE ALSO section of the manual page. Exit the manual page and then look at the man pages for those commands. (You will not be able to run all of them.)

Past jobs§

Once a job ends, it no longer shows up in squeue. To see information about your finished jobs - when they started, when they ended, what node they ran on - use the command sacct.

# show my jobs from the last two days
sacct -X -o "jobid,user,jobname,start,end,alloccpus,nodelist,state,exitcode" -S now-2days
JobID             User    JobName               Start                 End  AllocCPUS        NodeList      State ExitCode 
------------ --------- ---------- ------------------- ------------------- ---------- --------------- ---------- -------- 
19             ccxxxxx lammps_bt+ 2025-06-16T14:13:39 2025-06-16T14:13:44        160 node-c11b-[002+     FAILED      2:0 
22             ccxxxxx lammps_bt+ 2025-06-17T12:14:41             Unknown        160 node-c11b-[002+    RUNNING      0:0

If a job ended and didn't create the files you expect, check the start and end times to see whether it ran out of wallclock time.
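
To check this for a specific job, you can ask sacct to compare how long it ran against its time limit (the job ID is a placeholder):

sacct -j 12345 -X -o "jobid,jobname,elapsed,timelimit,state,exitcode"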

If a job only ran for seconds and didn't produce the expected output, there was probably something wrong in your script - check the .out.txt and .err.txt files in the directory you submitted the job from for errors.