Problem-generating workflows

This page lists habits that degrade the user experience for everyone. Please make sure you avoid them when you work on the clusters.

  1. Running anything CPU- or memory-intensive on the login node will make the login node slow for everyone, or even, in some edge cases, make it inaccessible. The login node is meant for editing files, submitting jobs, compiling (small) programs and performing file organisation activities. Anything else should be submitted as a job.
  1. Issuing many requests to the scheduler per unit of time, for instance by running `watch squeue` or by using a script to submit thousands of small jobs. Every time the scheduler answers such a query, it must pause, or sometimes restart, its scheduling computations, ultimately delaying the start of the jobs in the queue.
  1. Submitting large numbers of jobs (or a big job array) without first testing with two or three jobs will result, in the event of an issue with the jobs, in a large number of jobs being submitted for nothing, with the same consequence as above, i.e. delaying the start of the jobs in the queue.
  1. Submitting a large MPI job without first testing with four or eight ranks will result, in the event of an issue with the job, in a lot of resources being reserved (kept idle until all the resources for the job are available) until the job starts. Even if the reserved resources can sometimes be used by backfilled jobs, this ultimately delays the start of the jobs in the queue.
  1. Performing excessive I/O on a global filesystem rather than on a local filesystem puts a heavy burden on the global filesystem, impacting all running jobs that use it. Filesystems used for global scratches, e.g. BeeGFS, are primarily designed for handling large files in parallel, not large amounts of small-file operations from the same client. Any job that performs lots of small reads/writes (many small files, frequent reads/writes in chunks smaller than 1 MB, reading multiple small blocks from large files, etc.) must do so on a local filesystem ($LOCALSCRATCH), where it will not impact other jobs and, more importantly, will be much faster.
  1. Storing large numbers of small files rather than consolidating them is also problematic on global filesystems, be it in scratch directories or home directories. The storage technologies behind arrays of disks are very different from your laptop's SSD and cannot handle zillions of tiny files as efficiently. Most maintenance operations are heavily slowed down by the number of files. Tar archives or Singularity containers really help in such cases.
  1. Misconfiguring the email options of a Slurm job (a typo in the email address, forgetting one of --mail-user or --mail-type, or commenting out only one of them in an attempt to deactivate the functionality) will generate a (useless) email to the admins. This is especially problematic with scripted submissions that generate large numbers of jobs. Besides the annoyance, many emails sent by Slurm jobs can be flagged as harmful by the recipient's email system and get the university banned by some email providers.
  1. Leaving data on the scratch forever reduces the amount of global scratch available for the jobs. Even if, on some clusters, the global scratch is cleaned yearly, data should not be left there once it is no longer used. The data must either be deleted or transferred to an infrastructure designed for long-term storage. Be aware that filesystems used for global scratches are designed for performance more than for robustness, so the risk of data loss, while low, is not negligible.
  1. Under-using allocated resources, either because of wrong job parametrisation or because of a software bug, prevents the resources from being used by other jobs and ultimately delays the time to solution for all users. Furthermore, as most compute nodes are configured for performance, power-saving features are disabled, and the consumption of an idle node can be as high as half that of a fully-loaded node, even while producing no results. A compute node should therefore either be used or shut down, but never left idle for no good reason.
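
As a sketch of the first point, heavy work can be done inside a job rather than on the login node; an interactive allocation is often enough (the options and limits below are illustrative and vary per cluster):

```shell
# Request an interactive allocation on a compute node instead of
# running the heavy command on the login node:
salloc --ntasks=1 --cpus-per-task=4 --time=1:00:00
# Within the allocation, the build runs on the compute node:
srun make -j 4
```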
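
For scheduler queries, `squeue` can refresh itself at a reasonable interval instead of being wrapped in `watch`, and thousands of similar small jobs can be replaced by a single array (the script name below is hypothetical):

```shell
# Let squeue refresh itself once per minute instead of `watch squeue`,
# which would query the scheduler every two seconds:
squeue -u $USER --iterate=60
# Replace a loop calling sbatch thousands of times with one job array:
sbatch --array=0-999 work.sh
```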
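
The "test small first" advice for job arrays and MPI jobs can be sketched as follows (script and program names are hypothetical):

```shell
# Test a small slice of the array first:
sbatch --array=0-2 process.sh
# Check the output, then submit the full range:
sbatch --array=0-9999 process.sh

# Similarly, run an MPI program with a handful of ranks before
# requesting hundreds of them:
srun --ntasks=4 ./my_mpi_app
```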
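
A job performing many small reads/writes can stage its data on the local scratch; a minimal job-script sketch, assuming $LOCALSCRATCH is set by the cluster and with illustrative paths:

```shell
#!/bin/bash
#SBATCH --job-name=io-heavy
# Stage the input to the node-local scratch, work there, and copy the
# results back once at the end:
cp -r "$HOME/dataset" "$LOCALSCRATCH/"
cd "$LOCALSCRATCH"
./analyse dataset/            # the many small I/O operations stay local
cp -r results "$HOME/"        # one bulk copy back to the global filesystem
```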
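
Consolidating small files into a tar archive before leaving them on a global filesystem can be as simple as the following (paths and file names are illustrative):

```shell
# Pack many small result files into a single archive:
mkdir -p results
for i in $(seq 1 100); do echo "sample $i" > "results/sample_$i.txt"; done
tar -czf results.tar.gz results
rm -rf results                # keep one archive instead of 100 tiny files
```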
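
To make the email options unambiguous: both directives must be present (with a valid address) to enable notifications, and both must be removed or commented out to disable them; the address below is a placeholder:

```shell
#!/bin/bash
# Enable notifications: both directives set, address valid.
#SBATCH --mail-user=jane.doe@example.org
#SBATCH --mail-type=END,FAIL

# To disable notifications, comment out BOTH directives (a second '#'
# makes Slurm ignore the line):
##SBATCH --mail-user=jane.doe@example.org
##SBATCH --mail-type=END,FAIL
```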
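
Cleaning up the scratch amounts to archiving what must be kept and deleting the rest; a local sketch with illustrative paths (for a remote archive, rsync or scp would replace cp):

```shell
# Move finished results from the scratch to long-term storage,
# then free the scratch space:
mkdir -p scratch/project1 archive
echo "final result" > scratch/project1/out.dat
cp -a scratch/project1 archive/   # archive what must be kept
rm -rf scratch/project1           # free the global scratch
```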
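
Assuming the usual Slurm accounting tools are available (seff ships in Slurm's contribs and may not be installed everywhere), the resource usage of a finished job can be compared with what was requested; the job id is a placeholder:

```shell
# One-line efficiency summary (CPU and memory) for a finished job:
seff 123456
# Raw accounting numbers, to compare requested vs used resources:
sacct -j 123456 --format=JobID,Elapsed,TotalCPU,ReqMem,MaxRSS,NCPUS
```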