Choosing and using a cluster wisely

To ensure fair access to the computing resources for everyone, it is important that those resources be used as efficiently as possible.

Indeed, not all users are scientifically born equal. Some need to solve problems that parallelise embarrassingly easily, while others must design complex parallel algorithms with a lot of communication, or need to write huge amounts of data to disk. Some have access to well-written software designed for parallel use on a cluster, or have the resources to develop such software, while others must rely on not-so-great software that is unfortunately the best available to solve their problem.

Therefore, it is important that resources which are crucial to a job are not monopolized by another job which does not require them.

Note

By connecting to a CÉCI cluster, you accept the rules related to the Fair use of the CÉCI clusters.

The choice of the cluster

The CÉCI offers quite a diversity of clusters, each tailored for one or two job types. Those job types are detailed on the Clusters page.

The general rules can be expressed as follows: if a cluster

  • has a fast network (e.g. Infiniband) and your job does not make intense use of network communication (i.e. with MPI), please prefer another one;
  • has a fast scratch space (e.g. Lustre, Fraunhofer, GPFS) and your job does not produce loads of data, please prefer another one;
  • has large memory on the nodes (more than 300GB) and your job does not need that much RAM, please prefer another one;
  • has queues with large allowed running time (more than 7 days) and your job is able to checkpoint itself, or will typically run quicker than that, please prefer another one.

Of course these are general guidelines, and if you see a cluster being underused while you have jobs waiting, feel free to submit them on that cluster. But the sysadmins may or may not let your job run if it is deemed unfit.
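
For instance, once connected to a cluster's front-end, the Slurm command sinfo gives a quick overview of how busy the nodes are. A minimal sketch (the partitions and counts will of course differ from cluster to cluster):

    # Summarised view: the NODES(A/I/O/T) column lists how many nodes are
    # allocated, idle, other, and in total for each partition. Many idle
    # nodes means the cluster has room for more jobs.
    sinfo --summarize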

The choice of the nodes

The choice of the nodes, and of the placement of the processes on them, follows the same logic.

If a node

  • has an accelerator (e.g. GPU, FPGA, Xeon Phi) and your job is not able to take advantage of it, please prefer another one;
  • has a brand new CPU of the latest generation and your program is not compiled to take advantage of the particular features of that CPU family, please prefer another one;
  • has specific licensed software attached to it and your job does not need it, please prefer another one;
  • has an extra large local scratch filesystem (more than 2TB) and your job does not read or produce large amounts of data, please prefer another one.
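
With Slurm, the job manager used on the CÉCI clusters, such resources are only tied to your job if you request them explicitly, so the simplest way to leave them to the jobs that actually need them is to omit the corresponding options. The sketch below only illustrates the relevant options; the feature, license and size values are made-up examples and the actual names are cluster-specific:

    #!/bin/bash
    # Ask for a GPU only if your code actually uses one:
    #SBATCH --gres=gpu:1
    # Ask for a given CPU generation only if your binary was built for it
    # ("skylake" is just an example feature name):
    #SBATCH --constraint=skylake
    # Ask for a software license only if the job runs that software
    # ("matlab" is just an example license name):
    #SBATCH --licenses=matlab
    # Ask for a minimum amount of local scratch only if the job needs it:
    #SBATCH --tmp=500G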

Following the same logic, if your job uses MPI and the network is fast, it very rarely makes sense to choose the process placement yourself (e.g. asking for 4 processes on each of 4 nodes) rather than letting the job manager decide for you: it will (1) try its best to put all the processes on the smallest number of nodes, and (2) being topology-aware, try its best to put all processes on nodes that are no more than one hop apart. Choosing the placement yourself only makes sense if each process uses a local resource heavily (e.g. a local filesystem), in the hope that the other jobs on the machine do not.
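
In practice, with Slurm this means requesting a number of tasks and letting the scheduler choose the nodes; forcing a layout should remain the exception. A minimal sketch (the program name and task counts are just examples):

    #!/bin/bash
    # Let Slurm place the 16 MPI processes wherever it sees fit:
    #SBATCH --ntasks=16
    # Forcing the placement, e.g. 4 processes on each of 4 nodes, is only
    # justified when every process needs a node-local resource to itself;
    # the extra '#' keeps these directives disabled:
    ##SBATCH --nodes=4
    ##SBATCH --ntasks-per-node=4

    srun ./my_mpi_program    # srun starts one MPI process per task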

The job parametrization

Once more, to ensure fair use of the resources, make sure that you specify the resources you need as tightly as possible, for instance in terms of running time or memory. As a general rule, do not simply keep the default values, and do not copy a colleague's script and use it blindly. Engage in what you are doing, understand what you are doing, and do it properly.

Keeping the default memory setting when you need less than the default leads to memory being reserved but left unused, and to other jobs waiting unnecessarily for memory to become available.
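
For example, if you have observed that each process of your job never uses more than about 2 GB, requesting just that (plus a small safety margin) instead of the default releases the rest of the node's memory for other jobs. A minimal Slurm sketch with made-up values:

    #!/bin/bash
    # Peak usage measured at just under 2 GB per process:
    #SBATCH --mem-per-cpu=2048    # in MB
    srun ./my_program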

Keeping the default maximum running time when your job is expected to need less than that makes your job ineligible for backfill, so it starts later than it could have, and possibly delays lower-priority jobs for no reason. Always observe the running time of your jobs and try to learn how it is affected by the parameters and/or data size of your job.
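
With Slurm, the accounting command sacct lets you compare what the job actually used with what was requested, so that the next submissions can be tightened accordingly (the job ID below is of course just a placeholder):

    # In the submission script, request a realistic wall time, e.g.:
    #SBATCH --time=02:00:00

    # Once the job has finished, compare requested and actual usage:
    sacct -j 123456 --format=JobID,Elapsed,Timelimit,MaxRSS,ReqMem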

Asking for many cores should only be done once you are certain that your job scales, that is, that its running time is nearly inversely proportional to the number of CPUs used. Start with a low number, time your jobs, and increase the number of CPUs step by step, as sketched below. If your job's speedup saturates at 8 CPUs and you request more than that, the extra CPUs are wasted and unavailable to other jobs.
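
A simple way to measure this is to submit the same job with increasing core counts and compare the elapsed times reported by sacct afterwards. A sketch, assuming a hypothetical job script job.sh whose program uses $SLURM_CPUS_PER_TASK threads:

    # Submit the same job with 1, 2, 4, 8 and 16 cores:
    for n in 1 2 4 8 16; do
        sbatch --cpus-per-task=$n --job-name=scaling_$n job.sh
    done
    # If doubling the cores beyond some point barely reduces the elapsed
    # time, the job does not scale further and requesting more is wasteful.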