Slurm limits

The main resource for understanding limits in a Slurm context is the “Resource limits” page from the documentation.

Limits can be set at multiple levels: they can be global, applying to the whole cluster and all users, or they can be specific to a partition, an account, or a quality of service (QOS). The following sections explain them in detail.

Global limits

Limits can be imposed globally, at the cluster level, for all users, ...

dfr@nic5-login1 ~ $ sacctmgr show cluster format=name,GrpJobs,GrpSubmit,GrpTRES,GrpTRESMins,GrpTRESRunMins,GrpWall,MaxJobs,MaxSubmit,MaxTRESMins,MaxTRES,MaxWall
       Name GrpJobs GrpSubmit       GrpTRES   GrpTRESMins GrpTRESRunMin     GrpWall MaxJobs MaxSubmit   MaxTRESMins       MaxTRES     MaxWall
 ---------- ------- --------- ------------- ------------- ------------- ----------- ------- --------- ------------- ------------- -----------
  • GrpJobs: Maximum number of running jobs in aggregate for this cluster
  • GrpSubmit: Maximum number of jobs which can be in a pending or running state at any time in aggregate
  • GrpTRES: Maximum number of TRES (“trackable resource”) running jobs are able to be allocated in aggregate
  • GrpTRESMins: The total number of TRES minutes that can possibly be used by past, present and future jobs running on the cluster
  • GrpTRESRunMin: Used to limit the combined total number of TRES minutes used by all jobs running on the cluster
  • GrpWall: Maximum wall clock time running jobs are able to be allocated in aggregate for this cluster
  • MaxJobs: Maximum number of jobs each user is allowed to run at one time
  • MaxSubmit: Maximum number of jobs each user can have in a pending or running state at any time
  • MaxTRESMins: Maximum number of TRES minutes each job is able to use in this cluster
  • MaxTRES: Maximum number of TRES each job is able to use
  • MaxWall: Maximum wall clock time each job is able to use

... or on a per-user basis.

dfr@nic5-login1 ~ $ sacctmgr list user dfr withassoc format=user,maxjobs,maxnodes,maxTRES,maxSubmit,maxwall,maxTRESmins
      User MaxJobs MaxNodes  MaxTRES MaxSubmit     MaxWall  MaxTRESmins
---------- ------- -------- -------- --------- ----------- ------------
       dfr
  • MaxJobs: Maximum number of jobs this user is allowed to run at one time
  • MaxNodes: Maximum number of nodes per job
  • MaxTRES: Maximum number of TRES per job
  • MaxSubmit: Maximum number of jobs the user can have in a pending or running state at any time
  • MaxWall: Maximum wall clock time each job of this user is able to use
  • MaxTRESMins: Maximum number of TRES minutes each job of this user is able to use
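
These limits are set by cluster administrators with sacctmgr. As a rough sketch (hypothetical values, and assuming administrator privileges):

sacctmgr modify cluster nic5 set GrpTRES=cpu=4096
sacctmgr modify user dfr set MaxJobs=40 MaxSubmit=100

The first command would cap the total number of CPUs allocated at any time on the cluster, the second the number of running and submitted jobs for user dfr.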

QOS limits

Qualities of service (QOS) are the most versatile way of granting specific privileges, and imposing limits, on users and jobs. The following command lists them, along with their associated limits.

dfr@nic5-login1 ~ $ sacctmgr list qos  format=name,GrpTRES,GrpTRESMins,GrpTRESRunMin,GrpJobs,GrpSubmit,GrpWall,MaxTRES,MaxTRESPerNode,MaxTRESMins,MaxWall,MaxTRESPU,MaxJobsPU,MaxSubmitPU,MaxTRESPA,MaxJobsPA,MaxSubmitPA,MinTRES
      Name       GrpTRES   GrpTRESMins GrpTRESRunMin GrpJobs GrpSubmit     GrpWall       MaxTRES MaxTRESPerNode   MaxTRESMins     MaxWall     MaxTRESPU         MaxJobsPU MaxSubmitPU     MaxTRESPA MaxJobsPA MaxSubmitPA       MinTRES
---------- ------------- ------------- ------------- ------- --------- ----------- ------------- -------------- ------------- ----------- ------------- --------- ----------- ------------- --------- ----------- -------------
    normal                                                                               cpu=320                                                cpu=384         384         640
  • GrpTRES: Maximum number of TRES running jobs are able to be allocated in aggregate for this QOS
  • GrpTRESMins: The total number of TRES minutes that can possibly be used by past, present and future jobs running from this QOS
  • GrpTRESRunMin: The total number of TRES minutes that can be used by all jobs running with this QOS
  • GrpJobs: Maximum number of running jobs in aggregate for this QOS
  • GrpSubmit: Maximum number of jobs which can be in a pending or running state at any time in aggregate with this QOS
  • GrpWall: Maximum wall clock time running jobs are able to be allocated in aggregate for this QOS
  • MaxTRES: Maximum number of TRES each job is able to use in this QOS
  • MaxTRESPerNode: Maximum number of TRES each node in a job allocation can use.
  • MaxTRESMins: Maximum number of TRES minutes each job is able to use in this QOS
  • MaxWall: Maximum wall clock time each job is able to use in this QOS
  • MaxTRESPU: Maximum number of TRES each user is able to use
  • MaxJobsPU: Maximum number of jobs each user is allowed to run at one time
  • MaxSubmitPU: Maximum number of jobs in a pending or running state at any time, per user
  • MaxTRESPA: Maximum number of TRES each account is able to use
  • MaxJobsPA: Maximum number of jobs each account is allowed to run at one time.
  • MaxSubmitPA: Maximum number of jobs in a pending or running state at any time, per account
  • MinTRES: Minimum number of TRES each job running under this QOS must request. Otherwise the job will pend until modified.

In this example, a limit of 384 CPUs in total per user is set, along with a 320-CPU limit per job, a maximum of 384 running jobs per user, and a maximum of 640 running or pending jobs per user.
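
As a sketch of how an administrator could create a QOS with such limits (hypothetical name and values):

sacctmgr add qos debug
sacctmgr modify qos debug set MaxWall=01:00:00 MaxTRESPerUser=cpu=64

This hypothetical debug QOS would limit each job to one hour of walltime, and each user to 64 CPUs in total.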

Users can specify a QOS of their choice for each job; otherwise the default cluster QOS (normal) applies, unless a default user QOS...

dfr@nic5-login1 ~ $ sacctmgr list user dfr format=name,defaultqos
      Name   Def QOS
---------- ---------

... or default account QOS is set.

dfr@nic5-login1 ~ $ sacctmgr list account ceci format=name,defaultqos
         Name   Def QOS
   ---------- ---------

The above two examples show no default QOS set at the user or account level.
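
A specific QOS is requested at submission time with the --qos option (assuming here that a QOS named debug exists and is accessible):

dfr@nic5-login1 ~ $ sbatch --qos=debug job.sh

or, equivalently, with a #SBATCH --qos=debug line in the submission script.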

Account limits

Accounts can have limits set too.

dfr@nic5-login1 ~ $ sacctmgr list account ceci withassoc where user=dfr format=account,GrpJobs,GrpNodes,GrpTRES,GrpMem,GrpSubmit,GrpWall,GrpTRESMins,MaxJobs,MaxNodes,MaxTRES,MaxSubmit,MaxWall,MaxTRESMin
    Account GrpJobs GrpNodes       GrpTRES  GrpMem GrpSubmit     GrpWall   GrpTRESMins MaxJobs MaxNodes       MaxTRES MaxSubmit     MaxWall   MaxTRESMins
 ---------- ------- -------- ------------- ------- --------- ----------- ------------- ------- -------- ------------- --------- ----------- -------------
       ceci
  • GrpJobs: Maximum number of running jobs in aggregate for this account and all accounts which are children of this account
  • GrpNodes: Maximum number of nodes running jobs are able to be allocated in aggregate for this account and all accounts which are children of this account.
  • GrpTRES: Maximum number of TRES running jobs are able to be allocated in aggregate for this account and all accounts which are children of this account.
  • GrpMem: Maximum amount of memory running jobs are able to be allocated in aggregate for this account and all accounts which are children of this account.
  • GrpSubmit: Maximum number of jobs which can be in a pending or running state at any time in aggregate for this account and all accounts which are children of this account.
  • GrpWall: Maximum wall clock time running jobs are able to be allocated in aggregate for this account and all accounts which are children of this account.
  • GrpTRESMins: Maximum number of TRES minutes running jobs are able to be allocated in aggregate for this account and all accounts which are children of this account.
  • MaxJobs: Maximum number of jobs this account is allowed to run at one time. This is overridden if set directly on a user. Default is the cluster’s limit.
  • MaxNodes: Maximum number of nodes per job the children of this account can run
  • MaxTRES: Maximum number of TRES each job is able to use in this account
  • MaxSubmit: Maximum number of jobs which this account can have in a pending or running state at any time.
  • MaxWall: Maximum wall clock time each job is able to use in this account
  • MaxTRESMin: Maximum number of TRES minutes each job is able to use in this account
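
Setting such limits on an account would again be done by an administrator with sacctmgr, for instance (hypothetical values):

sacctmgr modify account ceci set GrpTRES=cpu=1024 MaxWall=2-00:00:00

which would cap the CPUs allocated at any time by all members of the ceci account at 1024, and the walltime of each of their jobs at two days.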

Every user has a default account, but can also choose, for each job, among the set of accounts they have access to.
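
The account to charge for a given job is chosen with the --account option, for instance:

dfr@nic5-login1 ~ $ sbatch --account=ceci job.sh

or with a #SBATCH --account=ceci line in the submission script.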

Partition limits

Each partition in a cluster has its own set of limits, for instance here with a 2-day maximum walltime on the first two partitions and a much larger one on the last ...

dfr@nic5-login1 ~ $ scontrol show partitions  | egrep -ie "^P|Max"
PartitionName=batch
   MaxNodes=UNLIMITED MaxTime=2-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
PartitionName=hmem
   MaxNodes=UNLIMITED MaxTime=2-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
PartitionName=bio
   MaxNodes=UNLIMITED MaxTime=62-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
  • MaxTime: Maximum run time limit for jobs.
  • MaxCPUsPerNode: Maximum number of CPUs on any node available to all jobs from this partition.
  • MaxMemPerCPU: Maximum real memory size available per allocated CPU
  • MaxMemPerNode: Maximum real memory size available per allocated node
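
Consequently, a job needing more than two days of walltime would have to target the bio partition, for instance:

dfr@nic5-login1 ~ $ sbatch --partition=bio --time=10-00:00:00 job.sh

The same request on the batch partition would exceed its MaxTime and, depending on the cluster configuration (EnforcePartLimits), be either rejected at submission or left pending indefinitely.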

... and can be associated with a QOS, from which they inherit all its limits:

dfr@nic5-login1 ~ $ scontrol show partitions  | egrep -ie "^P|Qos"
PartitionName=batch
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=YES QoS=N/A
PartitionName=hmem
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=N/A
PartitionName=bio
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=N/A
dfr@nic5-login1 ~ $

In the above example, QoS=N/A shows there is no PartitionQOS set on any partition.
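
For reference, attaching a QOS to a partition is done in slurm.conf, on the partition definition line, along these lines (hypothetical QOS and node names):

PartitionName=bio Nodes=nic5-w[001-010] MaxTime=62-00:00:00 QOS=bio-qos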

A summary of all the limits related to a user, their accounts, and the QOSes can be obtained with scontrol show assoc_mgr.
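
Its output can be narrowed down to a given user or account, for instance:

dfr@nic5-login1 ~ $ scontrol show assoc_mgr users=dfr accounts=ceci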

Precedence

By default, the order in which limits are enforced is

  1. Partition QOS limit
  2. Job QOS limit
  3. User association
  4. Account association(s), ascending the hierarchy
  5. Root/Cluster association
  6. Partition limit

If a job's QOS has the OverPartQOS flag set, the order of the job QOS and the partition QOS is reversed.
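
That flag is set on the QOS itself by an administrator, with something like:

sacctmgr modify qos normal set flags=OverPartQOS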