Slurm job scheduler

Note

This page should be considered an introduction to the Slurm job scheduler, as Slurm has capabilities far beyond what we describe here. We would encourage everyone to look through the Slurm documentation for more in-depth information. The Slurm manual pages are especially good for quick reference.

What is Slurm?

Slurm is a job scheduling system for small and large clusters. As a cluster workload manager, Slurm has three key functions:

  1. Lets a user request resources on a compute node to run their workloads

  2. Provides a framework (commands) to start, cancel, and monitor a job

  3. Keeps track of all jobs to ensure everyone can efficiently use all computing resources without stepping on each other’s toes.

When a user submits a job, Slurm decides when to allow the job to run on a compute node. This is very important on a shared machine such as the Viking cluster: it ensures the resources are shared fairly between users and that one person’s jobs do not dominate.

Tip

The Slurm documentation has an in-depth overview, including a diagram, to help picture how Slurm works.

Resource allocation

In order to interact with Slurm, the user must first give some indication of the resources they require. At a minimum these include:

  • How long the job needs to run for

  • How many processors the job needs to run on

The default resource allocation for jobs can be found on the resource partitions page.

Armed with this information, the scheduler is able to dispatch the jobs at some point in the future when the resources become available. A fair-share policy is in operation to guide the scheduler towards allocating resources fairly between users (more on this later).
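
As a minimal sketch of how these requirements are expressed, a job script can declare them as #SBATCH directives before being submitted with sbatch (covered below). The job name, time limit, task count, and partition shown here are placeholder values; choose ones appropriate for your work and for the partitions available on Viking:

#!/usr/bin/env bash
# example.job - a minimal sketch of a Slurm job script (values are placeholders)
#SBATCH --job-name=example          # name shown by squeue
#SBATCH --time=00:10:00             # how long the job needs to run for (hh:mm:ss)
#SBATCH --ntasks=1                  # how many processors (tasks) to run on
#SBATCH --partition=nodes           # example partition; see the resource partitions page

echo "Hello from $(hostname)"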

Running some Slurm commands on Viking

To interact with Slurm there are a number of commands you can use. The following summarises the most common commands used on Viking:

  • squeue: Reports the state of jobs (with filtering, sorting, and formatting options). By default it reports the running jobs in priority order followed by the pending jobs in priority order.

  • srun: Submits a job for execution in real time.

  • salloc: Allocates resources for a job in real time (typically used to allocate resources and spawn a shell, in which the srun command is used to launch parallel tasks).

  • sbatch: Submits a job script for later execution (the script typically contains one or more srun commands to launch parallel tasks).

  • sattach: Attaches standard input, output, and error to a currently running job or job step.

  • scancel: Cancels a pending or running job.

  • sinfo: Reports the state of partitions and nodes managed by Slurm (it has a variety of filtering, sorting, and formatting options).

  • sacct: Reports job accounting information about active or completed jobs.
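
For illustration, a typical workflow using these commands might look like the following (the script name and job ID below are made up):

$ sbatch example.job     # submit a job script for later execution
Submitted batch job 147874
$ squeue -j 147874       # check whether the job is pending or running
$ scancel 147874         # cancel the job if it is no longer needed
$ sacct -j 147874        # review accounting information once the job has finished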

squeue

The squeue command is one you will use often. To run it, first log in to Viking.

$ squeue

You should see a list of jobs. The columns describe the status of each job:

  • JOBID: A number used to uniquely identify your job within Slurm

  • PARTITION: The partition the job has been submitted to

  • NAME: The job’s name

  • USER: The username of the job’s owner

  • ST: Current job status: R (running), PD (pending - queued and waiting)

  • TIME: The time the job has been running for

  • NODES: The number of nodes used by the job

  • NODELIST (REASON): The nodes used by the job, or the reason the job is not running
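
For illustration only, squeue output has roughly the following shape (the job IDs, names, usernames, and nodes below are invented):

  JOBID PARTITION       NAME     USER ST   TIME NODES NODELIST(REASON)
 147874     nodes simple.job   abc123  R  12:34     1 node045
 147875     nodes  other.job   xyz456 PD   0:00     1 (Priority)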

To see only your jobs in the queue, run the following command:

# alternatively, replace $USER with any username
$ squeue -u $USER

To display information on a job you have queued or are running:

# replace JOBID with a valid job ID
$ squeue -j JOBID

Other useful options you can use with squeue are summarised here:

  • -a: Display all jobs

  • -l: Display more information

  • -u: Only display the specified user’s jobs

  • -p: Only display jobs in a particular partition

  • --usage: Print help

  • -v: Verbose listing
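
As a sketch of combining these options (the partition name ‘nodes’ is only an example; use a partition that exists on Viking):

$ squeue -u $USER -p nodes -l    # your jobs in the ‘nodes’ partition, in the longer format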

For a comprehensive look at squeue refer to the manual page or run man squeue on Viking.

sinfo

The sinfo command displays node and partition (queue) information and state.

  • PARTITION: The partition name (an asterisk after a partition name indicates the default partition)

  • AVAIL: Whether the partition is able to accept jobs

  • TIMELIMIT: The maximum time a job can run for

  • NODES: The number of available nodes in the partition

  • STATE: The state of the nodes: down (not available), alloc (allocated to running jobs), idle (waiting for jobs)

  • NODELIST: The nodes available in the partition
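
For illustration only, sinfo output has roughly the following shape (the partition names, node counts, and node names below are invented):

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
nodes*       up 2-00:00:00    120  alloc node[001-120]
nodes*       up 2-00:00:00     30   idle node[121-150]
gpu          up 2-00:00:00      2   down gpu[01-02]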

For a comprehensive look at sinfo refer to the manual page or run man sinfo on Viking.

sacct

To display a list of recently completed jobs, use the sacct command.

# in this case looking at job ID: 147874
$ sacct -j 147874
JobID        JobName    Partition  Account    AllocCPUS  State      ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
147874       simple.job nodes      dept-proj+ 1          COMPLETED  0:0
147874.batch batch                 dept-proj+ 1          COMPLETED  0:0

Important switches to sacct are:

  • -a: Display all users’ jobs

  • -b: Display a brief listing

  • -E: Select jobs by end date/time

  • -h: Print help

  • -j: Display a specific job

  • -l: Display in long format

  • --name: Display jobs with the given name

  • -S: Select jobs by start date/time

  • -u: Display only the specified user’s jobs

  • -v: Verbose listing
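
As a sketch of combining these options (the dates below are placeholders):

# brief listing of your jobs between the two dates
$ sacct -u $USER -S 2024-01-01 -E 2024-01-31 -b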

For a comprehensive look at sacct refer to the manual page or run man sacct on Viking.

Note

There are many more commands you can use to query Slurm. Please see the Slurm documentation for further details.

Fair-share

Slurm uses a system called ‘Fair-Share’ to ensure that all users have equitable access to the HPC resources, giving infrequent users the same opportunity for their jobs to run as users who consume large volumes of computational time. This is achieved by assigning each user a score, termed their ‘fair-share’, based on how much they have used Viking. It is a fractional score in the range [0,1]: a value of 1 denotes an account that has not run any jobs, and the score decreases with the amount of computational time used. The effect of past usage decays exponentially, allowing the fair-share score to recover if no further jobs are run.

You can view your fair-share, and compare it to that of other Viking users, by running sshare -a -l and looking at the ‘FairShare’ column. ‘LevelFS’ is the same underlying quantity, normalised differently. You can read more about the Fair-Share algorithm in the Slurm documentation.
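
As a small sketch, you can also restrict the output to your own user rather than listing everyone:

# show only your own entry, in the long format
$ sshare -l -u $USER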

However, note that while your fair-share is used by the scheduler, it is only one of several factors which determine the overall priority of a given job. For example, large jobs will usually take longer to be scheduled, whilst smaller jobs can be “backfilled” (fitted into the gaps between scheduled large jobs). If you find your jobs are not being scheduled quickly, it will almost always be more effective to refactor your jobs to use fewer resources than to wait for your fair-share to recover.
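
If you want to see how these factors combine for one of your jobs, the sprio command (not covered above) reports the individual priority components; a minimal sketch, replacing JOBID with a real pending job ID:

# show the priority components (e.g. age, fair-share, job size, partition) for a pending job
$ sprio -j JOBID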