A job submission helper on Hercules: ssubmit
Warning
The ssubmit tool is only available on the Hercules2 cluster.
Overview
For users who are not yet experienced with the SLURM batch system, we provide a tool, ssubmit, that simplifies job submission. It generates the sbatch script and submits the job.
Basic Usage
A batch job can be submitted using the ssubmit command:
ssubmit program arg1 arg2 ...
where:
- program: the application you want to run
- arg1 arg2 ...: the program arguments (if any)
If your program is part of a globally installed application, you need to load the corresponding module before using ssubmit:
module load application
ssubmit program arg1 arg2 ...
Then, ssubmit will ask you for the resources to reserve for your job in terms of time, memory, and number of cores. ssubmit creates a submission script and submits your job. The submission script is stored in a slurm-JOB_ID.sh file and the job output in a slurm-JOB_ID.out file, where JOB_ID is the job ID set by Slurm.
The following example submits a job that executes the echo command with 'hello from ...' as argument on a compute node, using 1 core with 1 GB of memory for 1 hour.
ssubmit echo 'hello from job id $SLURM_JOB_ID running on compute node $(hostname)'
====== Time ====================================================================
[1] 1 hour (default)
[2] 1 day
[3] 5 days
[4] 15 days (max)
Your choice or your time (DD-HH:MM or HH:MM): 1
Using serial paradigm 1 process
====== Memory ==================================================================
Memory per process in GB (defaul 1GB max 2000GB): 1
====== Summary =================================================================
You are about to submit a job to the cluster.
Please check if everything is correct.
Job name: echo
Executable: echo
Arguments: 'hello from job id running on compute node cecicluster'
Number of processors: 1
Memory per processor: 1024
Time: 00-01:00:00
Partition: batch
Would you like to continue [Y/n] ?y
====== Submitted job informations ==============================================
Job id: 201403967
Job output: slurm-201403967.out
Get job status with: squeue -j 201403967
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
201403967 batch echo ceciuser PD 0:00 1 (None)
In this case, the job output is located in slurm-201403967.out and the submission script is stored in slurm-201403967.sh.
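Since these file names follow a fixed pattern, you can reconstruct them from the job ID. A minimal sketch (the helper name ssubmit_files is hypothetical, not part of the tool):

```python
def ssubmit_files(job_id):
    """Return the file names ssubmit uses for a given Slurm job ID."""
    return {
        "script": "slurm-%s.sh" % job_id,   # generated submission script
        "output": "slurm-%s.out" % job_id,  # job output
    }

print(ssubmit_files(201403967))
```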
Ssubmit arguments
You can set some job parameters using ssubmit's arguments. The arguments must be set before the program name. They are:
- -y: automatically confirm the submission values
- -t, --time TIME: allocation time (DD-HH:MM)
- -m, --mem-per-cpu MEMORY: memory per process in MB
- -n, --ntasks NTASKS: number of processes
- -J, --job-name NAME: specify a name for the job allocation
- -p, --template-path PATH: specify a template files path
For example:
ssubmit --time DD-HH:MM --mem-per-cpu MB --ntasks N --job-name NAME program arg1 arg2 ...
The -y option allows you to skip the confirmation step:
ssubmit --time 00-01:00 --mem-per-cpu 1024 -y echo 'hello world'
Advanced usage
You can modify the way ssubmit generates the submission script by adding your own template files in a folder specified with the -p argument.
In the templates directory, you can add three kinds of files that will override the default ones.
template.js
This is a jinja2 file with the content that ssubmit will write after the #SBATCH header in the submission script. In the jinja2 template you can access these variables:
- path: the directory path where your application is located
- cmd: the program name
- argv: an array with the command and the arguments
- sbatch: a dict with the sbatch parameters
- inc: a dict with user-defined variables (see the inc.py module)
- paradigm: the selected paradigm
- config: a dict with the data you set in the config.yaml file
The default template is
{# minimal template #}
{{ path }}/{{ cmd }} {% for i in argv[2:] %}{{ i }} {% endfor %}
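For instance, assuming a program /home/ceciuser/bin/myprog with arguments in.dat out.dat and the resources from the example above, the generated submission script would look roughly like this (a sketch only; the exact #SBATCH header is produced by ssubmit from your answers, not by the template):

```shell
#!/bin/bash
#SBATCH --job-name=myprog
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=1024
#SBATCH --time=00-01:00:00
#SBATCH --partition=batch
# Line below is what the default template renders:
/home/ceciuser/bin/myprog in.dat out.dat
```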
An example using inc and sbatch variables:
echo "inc variables example p1={{ inc.p1 }}, p2={{ inc.p2 }}"
hostname
mpirun -np {{ sbatch['ntasks'] }} {{ cmd }}
config.yaml
By default, the job paradigm is set to serial, so your job will only use one core. For preinstalled applications, the supported paradigms are predefined.
If you want to override the predefined paradigms, or to set the paradigms for applications you have installed in your home directory, you need a YAML file config.yaml that contains the paradigms your application is able to use.
For example, the config.yaml file for an application that supports all allowed paradigms is:
---
# List of paradigms this program supports
paradigms:
  - serial
  - smp
  - mpi
inc.py
inc.py is a Python file or module that allows you to implement your own argument checks and add new variables to the generated submission script. The functions you can define in this file are:

function | description
---|---
def check_args(argv) | Add the checks and modifications you need for your program arguments.
def inc(argv) | Include new variables needed by your application.
def check_sbatch(sbatch) | Check the sbatch parameters. Must return True if the parameters are valid, False otherwise.
def get_sbatch_options(argv) | Append new sbatch parameters.
An example of inc.py could be:
import os
import sys


def check_args(argv):
    """Check specific arguments related to each application
    and modify them if needed.
    # argv[0] = ssubmit
    # argv[1] = application executable
    # argv[2...] = application arguments
    """
    # The first argument must be an input file
    if not os.path.isfile(argv[2]):
        print("ERROR: Input file %s doesn't exist or is not readable" % (argv[2]))
        sys.exit(1)
    return argv


def inc(argv):
    """Do any action related to the application and return variables as a dict
    or list to be used in the sbatch template.
    """
    return {'app_version': "20210630-R1"}


def check_sbatch(sbatch):
    """Check the sbatch parameters."""
    if sbatch['paradigm'] == 'mpi':
        if int(sbatch['ntasks']) % 2 == 0:
            return True
        print('Number of MPI processes must be an EVEN number')
        return False
    # Non-MPI paradigms need no extra check
    return True


def get_sbatch_options(argv):
    """Set more sbatch options if the application needs them; return a dict with
    { option: value, ... }
    """
    # Set the log file name from the input file name
    sbatch = {'output': argv[2].rsplit('.', 1)[0] + '.log'}
    return sbatch
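You can exercise such hooks standalone before wiring them into ssubmit. A minimal sketch that re-implements the check_sbatch contract from the example above (return True when the parameters are valid, False otherwise):

```python
def check_sbatch(sbatch):
    """Same rule as the inc.py example: MPI jobs need an even task count."""
    if sbatch['paradigm'] == 'mpi':
        if int(sbatch['ntasks']) % 2 == 0:
            return True
        print('Number of MPI processes must be an EVEN number')
        return False
    # Serial and SMP jobs pass unconditionally
    return True

print(check_sbatch({'paradigm': 'mpi', 'ntasks': '4'}))     # True
print(check_sbatch({'paradigm': 'mpi', 'ntasks': '3'}))     # False
print(check_sbatch({'paradigm': 'serial', 'ntasks': '1'}))  # True
```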