Slurm limits
The main resource for understanding limits in a Slurm context is the “Resource limits” page of the Slurm documentation.
Limits can be set at multiple levels. They can be global, applying to the whole cluster and all users, or they can be specific to a partition or an account. They can also be specific to a quality of service (QOS). The following sections explain them in detail.
Global limits
Limits can be imposed globally, at the cluster level, for all users, ...
dfr@nic5-login1 ~ $ sacctmgr show cluster format=name,GrpJobs,GrpSubmit,GrpTRES,GrpTRESMins,GrpTRESRunMins,GrpWall,MaxJobs,MaxSubmit,MaxTRESMins,MaxTRES,MaxWall
Name GrpJobs GrpSubmit GrpTRES GrpTRESMins GrpTRESRunMin GrpWall MaxJobs MaxSubmit MaxTRESMins MaxTRES MaxWall
---------- ------- --------- ------------- ------------- ------------- ----------- ------- --------- ------------- ------------- -----------
- GrpJobs: Maximum number of running jobs in aggregate for this cluster
- GrpSubmit: Maximum number of jobs which can be in a pending or running state at any time in aggregate
- GrpTRES: Maximum number of TRES (“trackable resource”) running jobs are able to be allocated in aggregate
- GrpTRESMins: The total number of TRES minutes that can possibly be used by past, present and future jobs running on the cluster
- GrpTRESRunMin: Used to limit the combined total number of TRES minutes used by all jobs running on the cluster
- GrpWall: Maximum wall clock time running jobs are able to be allocated in aggregate for this cluster
- MaxJobs: Maximum number of jobs each user is allowed to run at one time
- MaxSubmit: Maximum number of jobs the cluster can have in a pending or running state at any time
- MaxTRESMins: Maximum number of TRES minutes each job is able to use in this cluster
- MaxTRES: Maximum number of TRES each job is able to use
- MaxWall: Maximum wall clock time each job is able to use
... or user by user.
dfr@nic5-login1 ~ $ sacctmgr list user dfr withassoc format=user,maxjobs,maxnodes,maxTRES,maxSubmit,maxwall,maxTRESmins
User MaxJobs MaxNodes MaxTRES MaxSubmit MaxWall MaxTRESmins
---------- ------- -------- -------- --------- ----------- ------------
dfr
- MaxJobs: Maximum number of jobs this user is allowed to run at one time
- MaxNodes: Maximum number of nodes per job
- MaxTRES: Maximum number of TRES per job
- MaxSubmit: Maximum number of jobs the user can have in a pending or running state at any time
- MaxWall: Maximum wall clock time each job of this user is able to use
- MaxTRESMins: Maximum number of TRES minutes each job of this user is able to use
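These limits are set by the cluster administrators with sacctmgr modify. As a minimal sketch (the values are purely illustrative and such commands require administrator privileges):
admin@nic5-login1 ~ $ sacctmgr modify cluster where name=nic5 set GrpTRES=cpu=10000    # illustrative value
admin@nic5-login1 ~ $ sacctmgr modify user where user=dfr set MaxJobs=100 MaxSubmitJobs=200    # illustrative values
The first command would cap the CPUs allocatable in aggregate on the whole cluster, the second the number of running and submitted jobs for one user.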
QOS limits
Qualities of service (QOS) are the most versatile way of granting specific privileges, but also of imposing limits, on users and jobs. The following command lists them, along with their associated limits.
dfr@nic5-login1 ~ $ sacctmgr list qos format=name,GrpTRES,GrpTRESMins,GrpTRESRunMin,GrpJobs,GrpSubmit,GrpWall,MaxTRES,MaxTRESPerNode,MaxTRESMins,MaxWall,MaxTRESPU,MaxJobsPU,MaxSubmitPU,MaxTRESPA,MaxJobsPA,MaxSubmitPA,MinTRES
Name GrpTRES GrpTRESMins GrpTRESRunMin GrpJobs GrpSubmit GrpWall MaxTRES MaxTRESPerNode MaxTRESMins MaxWall MaxTRESPU MaxJobsPU MaxSubmitPU MaxTRESPA MaxJobsPA MaxSubmitPA MinTRES
---------- ------------- ------------- ------------- ------- --------- ----------- ------------- -------------- ------------- ----------- ------------- --------- ----------- ------------- --------- ----------- -------------
normal cpu=320 cpu=384 384 640
- GrpTRES: Maximum number of TRES running jobs are able to be allocated in aggregate for this QOS
- GrpTRESMins: The total number of TRES minutes that can possibly be used by past, present and future jobs running from this QOS
- GrpTRESRunMin: The total number of TRES minutes that can be used by all jobs running with this QOS
- GrpJobs: Maximum number of running jobs in aggregate for this QOS
- GrpSubmit: Maximum number of jobs which can be in a pending or running state at any time in aggregate with this QOS
- GrpWall: Maximum wall clock time running jobs are able to be allocated in aggregate for this QOS
- MaxTRES: Maximum number of TRES each job is able to use in this QOS
- MaxTRESPerNode: Maximum number of TRES each node in a job allocation can use.
- MaxTRESMins: Maximum number of TRES minutes each job is able to use in this QOS
- MaxWall: Maximum wall clock time each job is able to use in this QOS
- MaxTRESPU: Maximum number of TRES each user is able to use
- MaxJobsPU: Maximum number of jobs each user is allowed to run at one time
- MaxSubmitPU: Maximum number of jobs in a pending or running state at any time per user
- MaxTRESPA: Maximum number of TRES each account is able to use
- MaxJobsPA: Maximum number of jobs each account is allowed to run at one time.
- MaxSubmitPA: Maximum number of jobs in a pending or running state at any time per account
- MinTRES: Minimum number of TRES each job running under this QOS must request. Otherwise the job will pend until modified.
In the example above, the normal QOS limits each job to 320 CPUs (MaxTRES=cpu=320) and each user to 384 CPUs in total (MaxTRESPU=cpu=384), with at most 384 running jobs (MaxJobsPU) and 640 running or pending jobs (MaxSubmitPU) per user.
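A job requesting more than its QOS allows is, by default, accepted but left pending. As an illustrative sketch (job.sh is a hypothetical script), the pending reason can be inspected with squeue:
dfr@nic5-login1 ~ $ sbatch --ntasks=400 job.sh    # exceeds the MaxTRES=cpu=320 per-job limit
dfr@nic5-login1 ~ $ squeue -u dfr --format="%.10i %.9P %.8T %r"
The last column shows the reason the job is pending, which in this case would be something like QOSMaxCpuPerJobLimit.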
Users can specify a QOS for each job; otherwise the default cluster QOS (normal) applies, unless a default user QOS...
dfr@nic5-login1 ~ $ sacctmgr list user dfr format=name,defaultqos
Name Def QOS
---------- ---------
... or default account QOS is set.
dfr@nic5-login1 ~ $ sacctmgr list account ceci format=name,defaultqos
Name Def QOS
---------- ---------
The above two examples show no default QOS set at either the user or the account level.
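The QOS is chosen with the --qos option of sbatch, either on the command line or as an #SBATCH directive in the job script; for instance, with a hypothetical QOS named debug:
dfr@nic5-login1 ~ $ sbatch --qos=debug job.sh    # job.sh is a hypothetical script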
Account limits
Accounts can have limits set too.
dfr@nic5-login1 ~ $ sacctmgr list account ceci withassoc where user=dfr format=account,GrpJobs,GrpNodes,GrpTRES,GrpMem,GrpSubmit,GrpWall,GrpTRESMins,MaxJobs,MaxNodes,MaxTRES,MaxSubmit,MaxWall,MaxTRESMin
Account GrpJobs GrpNodes GrpTRES GrpMem GrpSubmit GrpWall GrpTRESMins MaxJobs MaxNodes MaxTRES MaxSubmit MaxWall MaxTRESMins
---------- ------- -------- ------------- ------- --------- ----------- ------------- ------- -------- ------------- --------- ----------- -------------
ceci
- GrpJobs: Maximum number of running jobs in aggregate for this account and all accounts which are children of this account
- GrpNodes: Maximum number of nodes running jobs are able to be allocated in aggregate for this account and all accounts which are children of this account.
- GrpTRES: Maximum number of TRES running jobs are able to be allocated in aggregate for this account and all accounts which are children of this account.
- GrpMem: Maximum amount of memory running jobs are able to be allocated in aggregate for this account and all accounts which are children of this account.
- GrpSubmit: Maximum number of jobs which can be in a pending or running state at any time in aggregate for this account and all accounts which are children of this account.
- GrpWall: Maximum wall clock time running jobs are able to be allocated in aggregate for this account and all accounts which are children of this account.
- GrpTRESMins: Maximum number of TRES minutes running jobs are able to be allocated in aggregate for this account and all accounts which are children of this account.
- MaxJobs: Maximum number of jobs this account is allowed to run at one time. This is overridden if set directly on a user. Default is the cluster’s limit.
- MaxNodes: Maximum number of nodes per job the children of this account can run
- MaxTRES: Maximum number of TRES each job is able to use in this account
- MaxSubmit: Maximum number of jobs which this account can have in a pending or running state at any time.
- MaxWall: Maximum wall clock time each job is able to use in this account
- MaxTRESMin: Maximum number of TRES minutes each job is able to use in this account
Every user has a default account, but can also, for each job, choose from the set of accounts they have access to.
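The default account can be verified with sacctmgr, and another one selected with the --account option of sbatch; for instance (the myproject account is hypothetical):
dfr@nic5-login1 ~ $ sacctmgr list user dfr format=user,defaultaccount
dfr@nic5-login1 ~ $ sbatch --account=myproject job.sh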
Partition limits
Each partition in a cluster has its own set of limits, for instance here a two-day maximum wall time on the first two partitions and a much larger one on the last ...
dfr@nic5-login1 ~ $ scontrol show partitions | egrep -ie "^P|Max"
PartitionName=batch
MaxNodes=UNLIMITED MaxTime=2-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
PartitionName=hmem
MaxNodes=UNLIMITED MaxTime=2-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
PartitionName=bio
MaxNodes=UNLIMITED MaxTime=62-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
- MaxTime: Maximum run time limit for jobs.
- MaxCPUsPerNode: Maximum number of CPUs on any node available to all jobs from this partition.
- MaxMemPerCPU: Maximum real memory size available per allocated CPU
- MaxMemPerNode: Maximum real memory size available per allocated node
... and can be associated with a QOS, from which it inherits all limits:
dfr@nic5-login1 ~ $ scontrol show partitions | egrep -ie "^P|Qos"
PartitionName=batch
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=YES QoS=N/A
PartitionName=hmem
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
PartitionName=bio
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
dfr@nic5-login1 ~ $
In the above example, QoS=N/A shows there is no partition QOS set on any partition.
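Jobs are directed to a partition with the --partition option of sbatch, and the requested wall time must fit within that partition's MaxTime. With the limits shown above, a ten-day job would only be possible on the bio partition (job.sh is again a hypothetical script):
dfr@nic5-login1 ~ $ sbatch --partition=bio --time=10-00:00:00 job.sh
The same --time request on batch or hmem would exceed their two-day MaxTime.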
A summary of all limits related to a user, their account, and the QOSes can be obtained with scontrol show assoc_mgr.
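Its output can be restricted to the relevant entries with the filters documented in the scontrol manual page, for instance:
dfr@nic5-login1 ~ $ scontrol show assoc_mgr users=dfr flags=assoc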
Precedence
The order in which limits are enforced is
- Partition QOS limit
- Job QOS limit
- User association
- Account association(s), ascending the hierarchy
- Root/Cluster association
- Partition limit
by default. If the job's QOS has the OverPartQOS flag set, the order of the job QOS and the partition QOS is reversed.
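The flags of a QOS can be inspected with sacctmgr, for instance:
dfr@nic5-login1 ~ $ sacctmgr list qos format=name,flags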