# Disk space¶

Danger

There is no backup of the data stored on any cluster. Any removed file is lost forever. It is the user’s responsibility to keep a copy of the contents of their home directory in a safe place.

Each cluster is equipped with several file systems that can be used to store files. These disk spaces have different properties, and each of them is designed for a different usage. They are listed in the table below.

| Disk space | Scope | Environment variables (depend on cluster) |
|---|---|---|
| Home | cluster | $HOME |
| Workdir | cluster | $GLOBALSCRATCH |
| Local scratch | node | $TMPDIR, $LOCALSCRATCH |
| Global Home | CÉCI | $CECIHOME |
| Transfer | CÉCI | $CECITRSF |
| Long-term | external | |

The ‘Scope’ column indicates from where the disk space is accessible. A scope of ‘cluster’ means all compute nodes and frontends in the cluster share the same filesystem. By contrast, a scope of ‘node’ refers to storage space that is distinct for each node; it is local to that node and cannot be accessed from outside the node. The ‘CÉCI’ scope means the filesystem is accessible from all compute nodes and all frontends of all CÉCI clusters. Finally, a scope of ‘external’ refers to machines that are outside the perimeter of the CÉCI consortium and are managed by the universities.

Note

You can also check the presentation given at our yearly training sessions about using the different storage solutions on the clusters. The final slides point to where example Slurm submission scripts can be found.

Upon login on a login node or front-end, you will end up in your home directory. This directory is available on the front-end and on all compute nodes of a cluster. Its full path can be shown with echo $HOME, and you can return there with a simple cd command. The home filesystem is dedicated to source code (programs, scripts), configuration files, and small datasets (like input files). Do not use this area for your main working activities; use your workdir directory instead (see next section).

Note

Quotas and time limits are enforced on the use of space in the HOME directories. See section “Quota” for details.

## Workdir file system¶

The workdir or globalscratch is a high-performance shared disk space common to all compute nodes and to the front-end of a cluster. Its full path can be shown with echo $GLOBALSCRATCH; the path differs from one cluster to another. The globalscratch is often built using a fast parallel filesystem such as Lustre, GPFS, or FraunhoferFS.
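As a quick check of where the workdir lives and how much free space the filesystem has, standard tools are enough. This is a minimal sketch; the fallback to /tmp is only there so the commands also run outside a CÉCI cluster, where $GLOBALSCRATCH is not defined.

```shell
# Fall back to /tmp when $GLOBALSCRATCH is not set (e.g. outside a cluster)
: "${GLOBALSCRATCH:=/tmp}"

# Show the workdir path and the usage of the filesystem it lives on
echo "$GLOBALSCRATCH"
df -h "$GLOBALSCRATCH"
```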

The workdir should be used to store the files generated by your batch jobs. You can also copy large input files there if required. It is customary for the job script to create a subdirectory named after the job id, where all temporary data is written, and to clean that directory when the job finishes, after having copied the results to another location: either the same filesystem, if the results are to be consumed by a later job, or another filesystem such as the home or a remote long-term storage.
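The create/compute/copy/clean pattern above can be sketched as follows. This is a hedged example, not a ready-made script: the fallback values for $GLOBALSCRATCH and $SLURM_JOB_ID are only there so the sketch can be tried outside a running Slurm job, and the final cp target (the home directory) is just one possible safe location.

```shell
#!/bin/bash
# Fallbacks so the sketch runs outside a Slurm job (not part of the pattern)
: "${GLOBALSCRATCH:=/tmp}"
: "${SLURM_JOB_ID:=demo}"
: "${USER:=$(id -un)}"

# Per-job subdirectory on the workdir, named after the job id
WORKDIR="$GLOBALSCRATCH/$USER/$SLURM_JOB_ID"
mkdir -p "$WORKDIR"
cd "$WORKDIR" || exit 1

# ... run the actual computation here; it writes temporary data in $WORKDIR ...
echo "final result" > result.dat

# Copy the results worth keeping to a safe location, then clean up
cp result.dat "$HOME/"
cd "$GLOBALSCRATCH" && rm -rf "$WORKDIR"
```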

To use the globalscratch, you might first need to create a directory yourself; it is then common to name it after your login. On some clusters, it might already be created for you. In any case, it is always safe to try to create it on any CÉCI cluster with the command:

mkdir -p $GLOBALSCRATCH

Note

Some clusters have a quota on the use of the workdir directories. See section “Quota” for details.

Warning

The data in the globalscratch directory can be removed at any time, especially during maintenance periods.

## Local Scratch file system¶

The local scratch is the temporary disk space available on all compute nodes; it is only visible from within the compute node it belongs to. On the CÉCI clusters, it is available through the $LOCALSCRATCH or $TMPDIR environment variable. It is often built on top of a fast, but redundancy-lacking, RAID-0 system.

To use a local scratch, you will need to create a directory for your job in the local scratch space. It is customary to name it after the job id:

mkdir -p $LOCALSCRATCH/$SLURM_JOB_ID

There you can write/read temporary results during your job, copy the results of interest back to the home directory, and delete the directory at the end of the job script. A submission script example can be found in the F.A.Q., section Q11.

Note

On Hercules, files stored in the scratch directory on each node are removed immediately after the job terminates. You will not be able to access files in the scratch directory after your job has completed. Furthermore, files in the scratch directory are not accessible from any other node (compute or login). Therefore, all files you want to keep must be copied from the scratch directory to your home as part of your job.

Using the scratch directory when running a batch job is often more efficient than using the home or workdir. If your job performs a lot of disk I/O on files that do not need to be shared between nodes, then please use this directory. This relieves the load on the central disk servers and, most of the time, also makes your job run faster.

Warning

There is no quota limit on the local scratch. Users have to be careful not to fill the space; otherwise the job will probably crash. The scratch size depends on the node type.
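The local-scratch pattern described above can be sketched as a minimal job script. Again a hedged sketch: the fallbacks for $LOCALSCRATCH and $SLURM_JOB_ID only exist so the script can be dry-run outside Slurm, and "partial.dat" stands in for whatever files your program actually produces.

```shell
#!/bin/bash
# Fallbacks so the sketch runs outside a Slurm job (not part of the pattern)
: "${LOCALSCRATCH:=/tmp}"
: "${SLURM_JOB_ID:=demo}"

# Per-job directory on the node-local disk, named after the job id
SCRATCHDIR="$LOCALSCRATCH/$SLURM_JOB_ID"
mkdir -p "$SCRATCHDIR"
cd "$SCRATCHDIR" || exit 1

# ... heavy disk I/O happens here, on the node-local disk ...
echo "42" > partial.dat

# Copy only the results worth keeping back to the home directory; the
# scratch directory is invisible from other nodes and may be wiped as
# soon as the job ends, so clean it up explicitly
cp partial.dat "$HOME/partial.dat"
cd / && rm -rf "$SCRATCHDIR"
```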
For example, on Hercules the local scratch sizes are:

| Node type | Size |
|---|---|
| Dell M610 | 600 GB |
| Dell M610x | 600 GB |
| HP DL360 | 600 GB |
| HP SL230 | 1.2 TB |

## CECI’s Common filesystem¶

Detailed information about the Global Home and Transfer filesystems is available in the common filesystem section.

## Long-term storage file system¶

Long-term storage, or archive, is also built on stable technologies, but quotas there are less restrictive. The downside is that long-term storage is often not directly connected to the cluster, so it can be quite slow. It may also not be free of charge. CÉCI does not provide long-term storage. Ask the systems administrators at your institution which kind of long-term storage they offer. You can then transfer your data as explained in the file transfer section.

## Final remarks¶

As scratch spaces are not meant to store data in the long term, you should expect them to be cleaned automatically after some time. This does not mean that you should not do it yourself, though: always clean up after your job. This is especially important after a job crashes before the cleaning operations in the submission script had a chance to run.

You need to be careful about file naming when using a parallel job, to avoid two different parts (threads or processes) of the program overwriting each other’s files, especially on global filesystems.

Even if your data is small and the quota allows you to work in your home directory, you should consider using scratch spaces, as they are in general much faster. The global filesystems are faster because they handle requests and store data in parallel, while the local filesystems are faster because they do not need to access the network and are used by far fewer jobs at a time.

## Quota¶

Quotas are set up on your home directory and, on some clusters, on your workdir space. The quota system is based on a soft quota, a hard quota, and a grace period. A soft quota is the disk space level below which nothing happens for the user.
When you reach the soft quota, a warning email is sent to you on some systems; on others, you get a message when you connect. This does not prevent you from writing more data until the grace period is reached. The grace period is the amount of time during which you are allowed to stay over the soft limit before the system blocks your ability to write more data. The hard quota is the disk space level above which the system immediately blocks your ability to write more data. However, you will still be able to read your data. If your hard quota is exceeded, or if you exceed the grace period for your soft quota, you will be blocked from submitting jobs until you clean your space.

The table below presents the storage quotas for CÉCI user accounts on each cluster. Use the command ceci-quota to see all your current quota usage.

| Cluster | File system | Env. variable | Soft quota | Hard quota | Grace |
|---|---|---|---|---|---|
| NIC5 | home | $HOME | 100 GB | 110 GB | |
| | workdir | $GLOBALSCRATCH | 5 TB | 5 TB | |
| Dragon2 | home / workdir | $HOME, $GLOBALSCRATCH | 40 GB | 44 GB | |
| Dragon1 [1] | home | $HOME | 20 GB | 23 GB | |
| | workdir | $GLOBALSCRATCH | 20 GB | 23 GB | |
| Hercules | home | $HOME | 200 GB | 1 TB | 9 weeks |
| | workdir | $GLOBALSCRATCH | 400 GB | 4 TB | 1 week |
| Lemaitre3 | home | $HOME | 100 GB | 100 GB | n/a |
| | workdir | $GLOBALSCRATCH | none | none | n/a |
| Nic4 | home | $HOME | 20 GB | 25 GB | |
| | workdir | $GLOBALSCRATCH | none | none | |
| All clusters | Global Home | $CECIHOME | 100 GB | 120 GB | 10 days |
| | Transfer | $CECITRSF | 1 TB | 10 TB | 10 days |

[1] Clusters where $HOME and $GLOBALSCRATCH are the same filesystem, so the quota applies to both.
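The ceci-quota command mentioned above is the definitive source for your quota figures. As a generic fallback that works on any Linux system, the standard du and df tools give a rough picture of how much space you are using; this is only a sketch, and the numbers they report are filesystem usage, not your quota limits.

```shell
# Total size of everything under your home directory
# (2>/dev/null hides permission-denied noise)
du -sh "$HOME" 2>/dev/null

# Usage and free space of the filesystem holding your home
df -h "$HOME"
```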