Slurm limits
The main resource for understanding limits in a Slurm context is the “Resource limits” page of the Slurm documentation.
Limits can be set at multiple levels. They can be global, applying to the whole cluster and all users, or they can be specific to a partition or an account. They can also be specific to a quality of service (QOS). The following sections explain them in detail.
Global limits
Limits can be imposed globally, at the cluster level, for all users, ...
dfr@nic5-login1 ~ $ sacctmgr show cluster format=name,GrpJobs,GrpSubmit,GrpTRES,GrpTRESMins,GrpTRESRunMins,GrpWall,MaxJobs,MaxSubmit,MaxTRESMins,MaxTRES,MaxWall
Name GrpJobs GrpSubmit GrpTRES GrpTRESMins GrpTRESRunMin GrpWall MaxJobs MaxSubmit MaxTRESMins MaxTRES MaxWall
---------- ------- --------- ------------- ------------- ------------- ----------- ------- --------- ------------- ------------- -----------
- GrpJobs: Maximum number of running jobs in aggregate for this cluster
- GrpSubmit: Maximum number of jobs which can be in a pending or running state at any time in aggregate
- GrpTRES: Maximum number of TRES (“trackable resource”) running jobs are able to be allocated in aggregate
- GrpTRESMins: The total number of TRES minutes that can possibly be used by past, present and future jobs running on the cluster
- GrpTRESRunMin: Used to limit the combined total number of TRES minutes used by all jobs running on the cluster
- GrpWall: Maximum wall clock time running jobs are able to be allocated in aggregate for this cluster
- MaxJobs: Maximum number of jobs each user is allowed to run at one time
- MaxSubmit: Maximum number of jobs the cluster can have in a pending or running state at any time
- MaxTRESMins: Maximum number of TRES minutes each job is able to use in this cluster
- MaxTRES: Maximum number of TRES each job is able to use
- MaxWall: Maximum wall clock time each job is able to use
... or user by user.
dfr@nic5-login1 ~ $ sacctmgr list user dfr withassoc format=user,maxjobs,maxnodes,maxTRES,maxSubmit,maxwall,maxTRESmins
User MaxJobs MaxNodes MaxTRES MaxSubmit MaxWall MaxTRESmins
---------- ------- -------- -------- --------- ----------- ------------
dfr
- MaxJobs: Maximum number of jobs this user is allowed to run at one time
- MaxNodes: Maximum number of nodes per job
- MaxTRES: Maximum number of TRES per job
- MaxSubmit: Maximum number of jobs the user can have in a pending or running state at any time
- MaxWall: Maximum wall clock time each job of this user is able to use
- MaxTRESMins: Maximum number of TRES minutes each job of this user is able to use
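These limits are set by the cluster administrators with sacctmgr modify. As a minimal sketch (the values are purely illustrative and such commands require administrator privileges):
admin@nic5-login1 ~ $ sacctmgr modify cluster where name=nic5 set GrpTRES=cpu=10000    # illustrative value
admin@nic5-login1 ~ $ sacctmgr modify user where user=dfr set MaxJobs=100 MaxSubmitJobs=200    # illustrative values
The first command would cap the CPUs allocatable in aggregate on the whole cluster, the second the number of running and submitted jobs for one user.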
QOS limits
Qualities of service (QOS) are the most versatile way of granting specific privileges, but also of imposing limits, on users and jobs. The following command lists them, along with their associated limits.
dfr@nic5-login1 ~ $ sacctmgr list qos format=name,GrpTRES,GrpTRESMins,GrpTRESRunMin,GrpJobs,GrpSubmit,GrpWall,MaxTRES,MaxTRESPerNode,MaxTRESMins,MaxWall,MaxTRESPU,MaxJobsPU,MaxSubmitPU,MaxTRESPA,MaxJobsPA,MaxSubmitPA,MinTRES
Name GrpTRES GrpTRESMins GrpTRESRunMin GrpJobs GrpSubmit GrpWall MaxTRES MaxTRESPerNode MaxTRESMins MaxWall MaxTRESPU MaxJobsPU MaxSubmitPU MaxTRESPA MaxJobsPA MaxSubmitPA MinTRES
---------- ------------- ------------- ------------- ------- --------- ----------- ------------- -------------- ------------- ----------- ------------- --------- ----------- ------------- --------- ----------- -------------
normal cpu=320 cpu=384 384 640
- GrpTRES: Maximum number of TRES running jobs are able to be allocated in aggregate for this QOS
- GrpTRESMins: The total number of TRES minutes that can possibly be used by past, present and future jobs running from this QOS
- GrpTRESRunMin: The total number of TRES minutes that can be used by all jobs running with this QOS
- GrpJobs: Maximum number of running jobs in aggregate for this QOS
- GrpSubmit: Maximum number of jobs which can be in a pending or running state at any time in aggregate with this QOS
- GrpWall: Maximum wall clock time running jobs are able to be allocated in aggregate for this QOS
- MaxTRES: Maximum number of TRES each job is able to use in this QOS
- MaxTRESPerNode: Maximum number of TRES each node in a job allocation can use.
- MaxTRESMins: Maximum number of TRES minutes each job is able to use in this QOS
- MaxWall: Maximum wall clock time each job is able to use in this QOS
- MaxTRESPU: Maximum number of TRES each user is able to use
- MaxJobsPU: Maximum number of jobs each user is allowed to run at one time
- MaxSubmitPU: Maximum number of jobs in a pending or running state at any time per user
- MaxTRESPA: Maximum number of TRES each account is able to use
- MaxJobsPA: Maximum number of jobs each account is allowed to run at one time.
- MaxSubmitPA: Maximum number of jobs in a pending or running state at any time per account
- MinTRES: Minimum number of TRES each job running under this QOS must request. Otherwise the job will pend until modified.
In the example above, the normal QOS limits each job to 320 CPUs (MaxTRES=cpu=320) and each user to 384 CPUs in total (MaxTRESPU=cpu=384), with at most 384 running jobs (MaxJobsPU) and 640 running or pending jobs (MaxSubmitPU) per user.
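A job requesting more than its QOS allows is, by default, accepted but left pending. As an illustrative sketch (job.sh is a hypothetical script), the pending reason can be inspected with squeue:
dfr@nic5-login1 ~ $ sbatch --ntasks=400 job.sh    # exceeds the MaxTRES=cpu=320 per-job limit
dfr@nic5-login1 ~ $ squeue -u dfr --format="%.10i %.9P %.8T %r"
The last column shows the reason the job is pending, which in this case would be something like QOSMaxCpuPerJobLimit.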
Users can specify a QOS for each job; otherwise the default cluster QOS (normal) applies, unless a default user QOS...
dfr@nic5-login1 ~ $ sacctmgr list user dfr format=name,defaultqos
Name Def QOS
---------- ---------
... or default account QOS is set.
dfr@nic5-login1 ~ $ sacctmgr list account ceci format=name,defaultqos
Name Def QOS
---------- ---------
The above two examples show no default QOS set at either the user or the account level.
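The QOS is chosen with the --qos option of sbatch, either on the command line or as an #SBATCH directive in the job script; for instance, with a hypothetical QOS named debug:
dfr@nic5-login1 ~ $ sbatch --qos=debug job.sh    # job.sh is a hypothetical script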
Account limits
Accounts can have limits set too.
dfr@nic5-login1 ~ $ sacctmgr list account ceci withassoc where user=dfr format=account,GrpJobs,GrpNodes,GrpTRES,GrpMem,GrpSubmit,GrpWall,GrpTRESMins,MaxJobs,MaxNodes,MaxTRES,MaxSubmit,MaxWall,MaxTRESMin
Account GrpJobs GrpNodes GrpTRES GrpMem GrpSubmit GrpWall GrpTRESMins MaxJobs MaxNodes MaxTRES MaxSubmit MaxWall MaxTRESMins
---------- ------- -------- ------------- ------- --------- ----------- ------------- ------- -------- ------------- --------- ----------- -------------
ceci
- GrpJobs: Maximum number of running jobs in aggregate for this account and all accounts which are children of this account
- GrpNodes: Maximum number of nodes running jobs are able to be allocated in aggregate for this account and all accounts which are children of this account.
- GrpTRES: Maximum number of TRES running jobs are able to be allocated in aggregate for this account and all accounts which are children of this account.
- GrpMem: Maximum amount of memory running jobs are able to be allocated in aggregate for this account and all accounts which are children of this account.
- GrpSubmit: Maximum number of jobs which can be in a pending or running state at any time in aggregate for this account and all accounts which are children of this account.
- GrpWall: Maximum wall clock time running jobs are able to be allocated in aggregate for this account and all accounts which are children of this account.
- GrpTRESMins: Maximum number of TRES minutes running jobs are able to be allocated in aggregate for this account and all accounts which are children of this account.
- MaxJobs: Maximum number of jobs this account is allowed to run at one time. This is overridden if set directly on a user. Default is the cluster’s limit.
- MaxNodes: Maximum number of nodes per job the children of this account can run
- MaxTRES: Maximum number of TRES each job is able to use in this account
- MaxSubmit: Maximum number of jobs which this account can have in a pending or running state at any time.
- MaxWall: Maximum wall clock time each job is able to use in this account
- MaxTRESMin: Maximum number of TRES minutes each job is able to use in this account
Every user has a default account, but can also, for each job, choose from the set of accounts they have access to.
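The default account can be verified with sacctmgr, and another one selected with the --account option of sbatch; for instance (the myproject account is hypothetical):
dfr@nic5-login1 ~ $ sacctmgr list user dfr format=user,defaultaccount
dfr@nic5-login1 ~ $ sbatch --account=myproject job.sh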
Partition limits
Each partition in a cluster has its own set of limits, for instance here a two-day maximum wall time on the first two partitions and a much larger one on the last ...
dfr@nic5-login1 ~ $ scontrol show partitions | egrep -ie "^P|Max"
PartitionName=batch
MaxNodes=UNLIMITED MaxTime=2-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
PartitionName=hmem
MaxNodes=UNLIMITED MaxTime=2-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
PartitionName=bio
MaxNodes=UNLIMITED MaxTime=62-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
- MaxTime: Maximum run time limit for jobs.
- MaxCPUsPerNode: Maximum number of CPUs on any node available to all jobs from this partition.
- MaxMemPerCPU: Maximum real memory size available per allocated CPU
- MaxMemPerNode: Maximum real memory size available per allocated node
... and can be associated with a QOS, from which it inherits all limits:
dfr@nic5-login1 ~ $ scontrol show partitions | egrep -ie "^P|Qos"
PartitionName=batch
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=YES QoS=N/A
PartitionName=hmem
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
PartitionName=bio
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
dfr@nic5-login1 ~ $
In the above example, QoS=N/A shows there is no partition QOS set on any partition.
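Jobs are directed to a partition with the --partition option of sbatch, and the requested wall time must fit within that partition's MaxTime. With the limits shown above, a ten-day job would only be possible on the bio partition (job.sh is again a hypothetical script):
dfr@nic5-login1 ~ $ sbatch --partition=bio --time=10-00:00:00 job.sh
The same --time request on batch or hmem would exceed their two-day MaxTime.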
A summary of all limits related to a user, their account, and the QOSes can be obtained with scontrol show assoc_mgr.
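Its output can be restricted to the relevant entries with the filters documented in the scontrol manual page, for instance:
dfr@nic5-login1 ~ $ scontrol show assoc_mgr users=dfr flags=assoc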
Precedence
The order in which limits are enforced is
- Partition QOS limit
- Job QOS limit
- User association
- Account association(s), ascending the hierarchy
- Root/Cluster association
- Partition limit
by default. If the job's QOS has the OverPartQOS flag set, the order of the job QOS and the partition QOS is reversed.
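The flags of a QOS can be inspected with sacctmgr, for instance:
dfr@nic5-login1 ~ $ sacctmgr list qos format=name,flags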