Disk space

Each cluster is equipped with several file systems that can be used to store files. These disk spaces have different properties, and each of them is designed to best fit a different usage intent. They are listed in the table below.

Disk space     Scope     Environment variables (depends on cluster)
-------------  --------  -------------------------------------------
Home           cluster   $HOME
Workdir        cluster   $GLOBALSCRATCH
Local scratch  node      $TMPDIR, $LOCALSCRATCH
Global Home    CÉCI      $CECIHOME
Transfer       CÉCI      $CECITRSF
Long-term      external  (none)

The ‘Scope’ column indicates from where the disk space is accessible. A scope of ‘cluster’ means all compute nodes and frontends in the cluster share the same filesystem. By contrast, a scope of ‘node’ refers to storage space that is distinct for each node; it is local to that node and cannot be accessed from outside it. The ‘CÉCI’ scope means the filesystem is accessible from all compute nodes and all frontends of all CÉCI clusters. Finally, a scope of ‘external’ refers to machines that are outside the perimeter of the CÉCI consortium and are managed by the universities.

[Image: ../../_images/storage.png]

Danger

There is no backup of the data stored on any cluster. Any removed file is lost forever. It is the user’s responsibility to keep a copy of the contents of their home in a safe place.

Home file system

Upon login on a login node or front-end, you will end up in your home directory. This directory is available on the front-end and all compute nodes of a cluster. Its full path can be shown with echo $HOME, and you can return there with a simple cd command.

The home filesystem is dedicated to source code (programs, scripts), configuration files, and small datasets (such as input files).

Do not use this area for your main working activities; use your workdir directory instead.

Note

Quotas and time limits are enforced on the use of space on the HOME directories. See section “Quota” for details.

Workdir file system

The workdir or globalscratch is a high-performance shared disk space common to all compute nodes and to the front-end of a cluster. Its full path can be shown with echo $GLOBALSCRATCH. The path differs from one cluster to another. The globalscratch is often built using a fast parallel filesystem such as Lustre, GPFS, or FraunhoferFS.

The workdir should be used to store the files generated by your batch jobs. You can also copy large input files there if required. It is then customary in the job script to create a subdirectory named after the job id, write all temporary data there, and clean that directory when the job finishes, after copying the results to another location: either on the same filesystem, if the results are to be consumed by a later job, or on another filesystem such as the home or a remote long-term storage.
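The pattern just described can be sketched as follows. The fallback values (after ":-") and the results location are illustrative assumptions that let the sketch run outside a job; on a cluster, $GLOBALSCRATCH is set by the system and $SLURM_JOB_ID by Slurm:

```shell
#!/bin/bash
# Sketch of the workdir workflow described above. The fallback values
# are for illustration only; in a real job these variables are set
# automatically.
GLOBALSCRATCH="${GLOBALSCRATCH:-/tmp/globalscratch-demo}"
SLURM_JOB_ID="${SLURM_JOB_ID:-12345}"

# Create a per-job subdirectory on the workdir and work inside it.
WORKDIR="$GLOBALSCRATCH/$SLURM_JOB_ID"
mkdir -p "$WORKDIR"
cd "$WORKDIR"

# ... the actual computation runs here, writing temporary data ...
echo "result" > result.txt

# Copy the results to a safe location (here: the home directory) ...
mkdir -p "$HOME/results-$SLURM_JOB_ID"
cp result.txt "$HOME/results-$SLURM_JOB_ID/"

# ... and clean the per-job subdirectory when the job finishes.
cd ..
rm -rf "$WORKDIR"
```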

To use the globalscratch, you might first need to create a directory yourself; it is common to name it after your login. On some clusters, it might already have been created for you. In any case, it is always safe to try to create it on any CÉCI cluster with the following command:

mkdir -p $GLOBALSCRATCH

Note

Some clusters have a quota on the use of the workdir directories. See section “Quota” for details.

Warning

The data in the globalscratch directory can be removed at any time, especially during maintenance periods.

Local Scratch file system

The local scratch is the temporary disk space available on each compute node and is only visible from within the node it belongs to. On the CÉCI clusters, it is available through the $LOCALSCRATCH or $TMPDIR environment variables. It is often built on top of a fast but non-redundant RAID-0 array.

To use the local scratch, you will need to create a directory for your job in the local scratch space. It is customary to name it after the job id:

mkdir -p $LOCALSCRATCH/$SLURM_JOB_ID

There you can write and read temporary results during your job, copy the results of interest back to the home directory, and delete the directory at the end of the job script. An example submission script can be found in the F.A.Q. Q11 section.
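A minimal sketch of that workflow follows; the fallback values (after ":-") are assumptions that let the script run outside a Slurm job, where the scheduler would set them:

```shell
#!/bin/bash
# Sketch of a job script using the local scratch. The fallback values
# are assumptions for running this outside a real job.
LOCALSCRATCH="${LOCALSCRATCH:-/tmp}"
SLURM_JOB_ID="${SLURM_JOB_ID:-12345}"

# Create a per-job directory on the local scratch, named after the job id.
SCRATCHDIR="$LOCALSCRATCH/$SLURM_JOB_ID"
mkdir -p "$SCRATCHDIR"
cd "$SCRATCHDIR"

# ... write and read temporary results here during the job ...
echo "output" > output.dat

# Copy the results of interest back to the home directory ...
cp output.dat "$HOME/"

# ... and delete the scratch directory at the end of the job script.
cd /
rm -rf "$SCRATCHDIR"
```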

Note

Hercules

Files stored in the scratch directory of each node are removed immediately after the job terminates; you will not be able to access them after your job has completed. Furthermore, files in the scratch directory are not accessible from any other node (compute or login). Therefore, all files you want to keep must be copied from the scratch directory to your home directory as part of your job.

Using the scratch directory when running a batch job is often more efficient than using the home or workdir. If your job performs a lot of disk I/O on files that do not need to be shared between nodes, then please use this directory. This relieves the load on the central disk servers and, most of the time, also makes your job run faster.

Warning

There is no quota limit on the local scratch. Users have to be careful not to fill up the space; otherwise the job will probably crash.

The scratch size depends on the node type. For example, on Hercules:

Node type   Size
----------  ------
Dell M610   600 GB
Dell M610x  600 GB
HP DL360    600 GB
HP SL230    1.2 TB

CÉCI’s common filesystems

Detailed information about the Global Home and Transfer filesystems is available in the common filesystem section.

Long-term storage file system

Long-term storage, or archive, is also built on stable technologies, but quotas there are larger. The downside is that long-term storage is often not directly connected to the cluster, so it can be quite slow. It may also not be free of charge. CÉCI does not provide long-term storage; ask the system administrators at your institution what kind of long-term storage they offer. You can then transfer your data as explained in the file transfer section.

Final remarks

As scratch spaces are not meant to store data in the long term, you should expect them to be cleaned automatically after some time. This does not mean that you should not do it yourself, though: always clean up after your jobs. This is especially important when a job crashes before the cleaning operations in the submission script have had a chance to run.
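One way to make the cleanup more robust, sketched here under the assumption of a Bash job script and with a hypothetical scratch path, is to register the cleanup with trap so it also runs when the script exits early:

```shell
#!/bin/bash
# Demonstration with a hypothetical path: the subshell simulates a job
# script that fails early, yet the trap still removes the scratch
# directory on exit.
SCRATCHDIR="/tmp/demo-job-scratch"

(
    mkdir -p "$SCRATCHDIR"
    # Register the cleanup so it runs on any exit, normal or not.
    trap 'rm -rf "$SCRATCHDIR"' EXIT
    echo "temporary data" > "$SCRATCHDIR/tmp.dat"
    exit 1   # simulate a command failure ending the script early
) || true
```

Note that a trap cannot run when the process is killed with SIGKILL, for example when the job exceeds its time limit, which is one reason the automatic cleaning mentioned above still matters.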

You need to be careful about file naming when running a parallel job, to avoid two different parts (threads or processes) of the program overwriting each other's files, especially on global filesystems.
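One common convention, assuming a Slurm parallel job, is to include the task id in each output filename; the fallback value and the output location here are assumptions so the sketch runs outside a job:

```shell
#!/bin/bash
# Each task of a parallel job writes to its own file, named after its
# Slurm task id, so no two processes overwrite each other's output.
# The fallback value 0 is an assumption for running outside a job.
SLURM_PROCID="${SLURM_PROCID:-0}"

OUTFILE="${TMPDIR:-/tmp}/output_rank${SLURM_PROCID}.dat"
echo "data from task $SLURM_PROCID" > "$OUTFILE"
```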

Even if your data is small and the quotas allow you to work in your home directory, you should consider using the scratch spaces, as they are in general much faster. The global filesystems are faster because they handle requests and store data in parallel, while the local filesystems are faster because they do not need to access the network and are used by far fewer jobs at a time.

Quota

Quotas are set up on your home directory and, on some clusters, on your workdir space.

The quota system is based on a soft quota, a hard quota, and a grace period.

The soft quota is the disk space level below which nothing happens. When you reach the soft quota, a warning email is sent to you on some systems; on others, you get a message when you connect. This does not prevent you from writing more data until the grace period expires.

The grace period is the amount of time during which you are allowed to go over the soft limit before the system blocks your ability to write more data.

The hard quota is the disk space level above which the system immediately blocks your ability to write more data. You will, however, still be able to read your data. If you exceed your hard quota, or if you exceed the grace period for your soft quota, you will be blocked from submitting jobs until you clean your space.
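Independently of the cluster-specific quota command, you can always check your current usage with the standard du utility (this traverses the whole directory tree, so it may take a while on a large home):

```shell
# Show the total size of your home directory in human-readable form.
# The cluster-specific quota command (see the table below) additionally
# reports your usage against the soft and hard limits.
du -sh "$HOME"
```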

The table below presents the storage quotas for CÉCI user accounts on each cluster and the command to check them:

Cluster       File system  Soft quota  Hard quota  Grace     Command
------------  -----------  ----------  ----------  --------  ---------------------------
Dragon1 [1]   home         20 GB       23 GB                 quota -s
              workdir      20 GB       23 GB
Hercules      home         200 GB      1 TB        8 weeks   hc_diskquota
              workdir      400 GB      4 TB        2 weeks
Hmem          home         48 GB       50 GB       1 week    quota -s
              workdir      none        none
Lemaitre2     home         48 GB       50 GB       1 week    quota -s
              workdir      none        none
Lemaitre3     home         100 GB      100 GB      n/a       quota
              workdir      none        none
Nic4          home         20 GB       25 GB                 quota -s
              workdir      none        none
Vega [1]      home         5 TB        6 TB                  /usr/lpp/mmfs/bin/mmlsquota
              workdir      5 TB        6 TB
All clusters  Global Home  100 GB      120 GB                ls $CECIHOME; quota -s
All clusters  Transfer     1 TB        10 TB       10 days

[1] Clusters where $HOME and $GLOBALSCRATCH are the same filesystem, so the quota applies to both.