Compiling and installing software from source¶
You will most probably, at some point, face a situation where you need additional software installed. One option is to ask the System Administrator to install the software globally on the cluster, which makes sense mainly for popular software used by many users.
Another option is to install the software locally in your home directory. Most of the time, this requires installing the software from the sources. If you do not have access to the sources, as is the case with many commercial software packages, you will need to check the installation documentation to learn how to install the software in a custom directory.
The whole process¶
Installing from source has an additional benefit: you can then tailor the compilation process to the hardware of the clusters.
It is customary to create a directory named .local in the home directory to host the self-installed software. You can create it with
$ mkdir -p ~/.local/{bin,lib,src,include}
1. Download or upload the source code archive to your home directory on the cluster.¶
To do that, use the wget or the curl command. For instance, if the web link to the archive is https://www.repository.com/project/soft-2.3.tar.gz, issue the following command:
$ wget https://www.repository.com/project/soft-2.3.tar.gz
Alternatively, you can download the archive on your personal computer and then upload it to the cluster.
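For instance, with scp (mylogin and cluster1 are placeholders for your own login and cluster name):
$ scp soft-2.3.tar.gz mylogin@cluster1:~/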
2. Unpack the archive¶
First go to the destination directory, for instance
$ cd ~/.local/src
and unpack the archive. If it is a .tar.gz or .tgz, simply run
$ tar -xzf soft-2.3.tar.gz
You might also encounter other archive types, such as .zip. Refer to this Wikipedia page to choose the right tool for unpacking.
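For instance, for a .zip or a .tar.bz2 archive (the file names are only illustrations):
$ unzip soft-2.3.zip
$ tar -xjf soft-2.3.tar.bz2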
Once the files are extracted, go to the newly-created directory:
$ cd soft-2.3
3. Load compiler suite¶
Before compiling your code, you need to choose the compiler toolchain that suits your needs. You can check the available versions of the foss or intel toolchains.
For example, for the GNU Compiler Collection:
$ module spider foss
------------------------------------------------------------------------------------------------------------------------------
foss:
------------------------------------------------------------------------------------------------------------------------------
Description:
GNU Compiler Collection (GCC) based compiler toolchain, including OpenMPI for MPI support, OpenBLAS (BLAS and LAPACK
support), FFTW and ScaLAPACK.
Versions:
foss/2014b
foss/2015a
foss/2015b
...
Or for the Intel compiler toolchain:
$ ml spider intel
------------------------------------------------------------------------------------------------------------------------------
intel:
------------------------------------------------------------------------------------------------------------------------------
Description:
Compiler toolchain including Intel compilers, Intel MPI and Intel Math Kernel Library (MKL).
Versions:
intel/2014b
intel/2015a
intel/2015b
Once you have chosen the toolchain, you need to load it, for example:
$ module purge
The following modules were not unloaded:
(Use "module --force purge" to unload all):
1) databases 2) releases/2021b 3) tis/2017.01
$ module load releases/2021a intel/2021a
The following have been reloaded with a version change:
1) releases/2021b => releases/2021a
These two module commands must later be included in the submission script. This allows your executable to find the shared libraries on the compute nodes.
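For instance, assuming the cluster uses Slurm, a minimal submission script could look like the following sketch (the job parameters and the executable name soft are placeholders):
#!/bin/bash
#SBATCH --job-name=soft-run
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# load the same toolchain that was used to compile the software
module purge
module load releases/2021a intel/2021a

~/.local/bin/soft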
4. Run the ./configure script¶
First of all, read any README or INSTALL file that you can find in the directory.
Then, run the ./configure script. That script will analyse the available software (compilers, GNU tools, etc.) needed for the compilation process, and prepare the subsequent compilation scripts.
At this stage, you will choose the directory to which the software must be installed. That is done with the --prefix option of the configure script. For example:
$ ./configure --prefix ~/.local/
You can also choose the compiler and compiler options with environment variables passed to ./configure, e.g. to use the Intel compiler: CC=icc ./configure. Other interesting variables include CFLAGS, CPPFLAGS, LDFLAGS. Run ./configure --help for a detailed list.
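For instance, a possible invocation combining these options could be (the compiler and flags are only an illustration):
$ CC=icc CFLAGS="-O2" ./configure --prefix=$HOME/.local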
Check the output of the ./configure script; it may report missing dependencies, which often leads to the deactivation of some features of the program.
5. Run make and make install¶
Once the ./configure script has run, you will be able to build the software with the make command. This step may take several minutes, depending on the complexity of the software. To speed things up, use the -j n option of make to run the build process in parallel on n CPUs. For instance:
$ make -j 4
The next step is to install the software in the destination directory.
$ make install
Now, you can optionally remove the source directory and the archive file to save disk space as the binaries will be copied to ~/.local/bin and the libraries will go to ~/.local/lib.
If you are following instructions on a web page that tell you to run sudo make install, resist the urge to do so. Simply run make install.
Final steps¶
You will need to make sure your environment is properly configured to use the programs you just installed.
Once the software is properly installed, make sure to include the .local directories in the corresponding paths:
export PATH=~/.local/bin:$PATH
export LD_LIBRARY_PATH=~/.local/lib:$LD_LIBRARY_PATH
export LIBRARY_PATH=~/.local/lib:$LIBRARY_PATH
export CPATH=~/.local/include:$CPATH
It is advisable to put those lines in your .bash_profile file in your home directory on the cluster, which is sourced at each SSH login, so that you do not have to type them at the start of each session.
To make sure your paths are correctly set, you can use the which command to see specifically which binary is called when you issue a given command, and ldd to see which dynamic libraries the binary is using.
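For instance, assuming the package installed a binary named soft (the name is only an illustration):
$ which soft
$ ldd ~/.local/bin/soft
The first command should print a path under ~/.local/bin, and the second lists the shared libraries the binary will load at runtime.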
Compiling for multiple CPU architectures¶
By default, compilers will tune the binary for the CPU of the machine they run on. So if you compile on, say, cluster1, which is equipped with Intel’s Westmere processors, and then run on the newer SandyBridge processors of cluster2, the binaries of your program will not use the advanced features the SandyBridge processors offer. The two most common reasons for performance losses are reduced or inactive vectorisation (for instance not using the AVX SIMD instructions, which allow performing 8 double-precision floating-point operations per clock cycle), and inefficient code scheduling (scheduling compute and I/O operations based on cache sizes and latencies that are not the correct ones).
What’s even worse, if you compile on a SandyBridge processor and then run on a Westmere processor, the program might crash because it would try to use the AVX units, which are absent on the latter processors.
With GCC¶
With GCC, you can mitigate this problem using the -march and -mtune arguments.
With the -march argument, you can prevent software crashes by telling the compiler to use only features that are present in earlier CPUs. If you specify -march=core2, then the resulting binary is guaranteed to work on every compute node in the CÉCI clusters, but with degraded performance on modern ones. If you specify -march=westmere, it will work everywhere except on Hmem (decommissioned). With -march=sandybridge, it will work only on Hercules and Dragon1. To target the AMD Epyc CPUs of Hercules and NIC5, you can specify -march=znver1 (Hercules) or -march=znver2 (NIC5). The latter options require GCC version 9 or later.
The complete list of valid parameter values can be found in the GCC documentation.
With the -mtune argument, you can ask the compiler to optimize the binaries for a specific architecture, while remaining within the limits imposed by the -march option. It will mainly work on optimizing the instruction scheduling with respect to the CPU architecture. The -mtune argument accepts the same values as -march.
A safe option is thus to set -march to the CPU architecture of the oldest cluster you plan on using, and -mtune to the CPU architecture of the cluster you plan on using the most. The CPU architecture of each cluster can be found on the cluster page.
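For instance, assuming the oldest cluster you target has SandyBridge CPUs and the cluster you use the most has AMD Zen 2 CPUs (adapt the values to your own case; myprog.c is only a placeholder), you could compile with:
$ gcc -O2 -march=sandybridge -mtune=znver2 -o myprog myprog.c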
In your code, if you have functions that use CPU intrinsics or optimization features that are CPU specific, you can use a feature of GCC named Function multi-versioning. The idea is to write several versions of the same function, with a specific __attribute__ to tell GCC which version goes with which CPU. Then, at runtime, the correct version of the function will be called based on the CPU on which it is running.
See the following example:
#include <iostream>
using namespace std;

__attribute__ ((target ("default")))
int foo ()
{
    // The default version of foo.
    return 0;
}

__attribute__ ((target ("sse4.2")))
int foo ()
{
    // foo version for SSE4.2
    return 1;
}

__attribute__ ((target ("avx")))
int foo ()
{
    // foo version for AVX
    return 2;
}

int main ()
{
    cout << "The function returned: " << foo() << "\n";
    return 0;
}
When compiled with g++ version 4.9 or above, it outputs 0 on Hmem, 1 on Lemaitre2 and 2 on NIC4.
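For instance, assuming the example above is saved in a file named foo.cpp (the name is arbitrary), it can be built and run with:
$ g++ -o foo foo.cpp
$ ./foo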
With the Intel compiler¶
The Intel compiler has a very interesting feature called ‘Multiple code paths’.
With the -x option, you can specify which CPU features the compiler is allowed to use, for instance -xSSE2 for CPUs that have the SSE2 feature, namely every current Intel or AMD CPU (beware that -xSSE2 might lead to binaries that crash on AMD processors even if they do have that feature). If you specify -xSSE4.2, it will work everywhere except on Hmem. With -xAVX, it will work only on Hercules, Dragon1, Vega and NIC4.
With the -ax option, you can specify additional so-called ‘code paths’ that the compiler will add to the binary, using additional sets of features. The ‘base path’ is specified with the -x parameter, and the -ax parameter allows compiling the code several times, each time for a different CPU architecture, and packing it all in a single binary. The Intel compiler runtime will then decide, when the program runs, which portion of the binary to use based on the CPU of the machine it is running on. So you could specify -axSSE4.2,AVX to build a binary optimized for all the CÉCI computers.
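For instance, a compilation command producing such a multi-path binary could look like this (the source file name is only an illustration):
$ icc -O2 -axSSE4.2,AVX -o myprog myprog.c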
More information on these options is available on the Intel compiler website, where the list of valid options is given. The option names are CPU features, rather than CPU architecture names as for GCC. The features a CPU offers can be found by looking at the contents of the /proc/cpuinfo file.
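For instance, to list the feature flags of the CPU of the machine you are logged in on:
$ grep flags /proc/cpuinfo | head -1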
Note
The Intel compiler is developed and targeted for Intel hardware, and hence it has some minor issues when used with AMD hardware. Code compiled with the Intel compiler might crash when executed on the AMD CPUs of Hercules and NIC5. The recommended flag to use the AVX2 instructions supported by the AMD EPYC CPUs is -march=core-avx2.
In some cases, when compiling with the -xavx, -xavx2 or -xcore-avx2 flags, your application will abort with a message that the processor is not compatible:
Please verify that both the operating system and the processor support Intel(R) X87,
CMOV, MMX, FXSAVE, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, MOVBE, POPCNT, AVX, F16C, FMA,
BMI, LZCNT and AVX2 instructions.
To solve this problem, avoid using the above-mentioned flags for the compilation of the main() function. You may also try avoiding interprocedural optimization (remove the -ipo flag).
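For instance, a compilation command for the AMD nodes could look like this (the source file name is only an illustration):
$ icc -O2 -march=core-avx2 -o myprog myprog.c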
The Intel compiler has a feature similar to GCC’s function multi-versioning; it is called Manual processor dispatch:
#include <stdio.h>

// need to create specific function versions for the following processors:
__declspec(cpu_dispatch(generic, core_i7_sse4_2, core_2nd_gen_avx))
// stub that will call the appropriate specific function version
int foo() {};

__declspec(cpu_specific(generic))
int foo() {
    return 0;
}

__declspec(cpu_specific(core_i7_sse4_2))
int foo() {
    return 1;
}

__declspec(cpu_specific(core_2nd_gen_avx))
int foo() {
    return 2;
}

int main() {
    foo();
    printf("Function returned: %d\n", foo());
    return 0;
}
The above program, when compiled with the Intel compiler, behaves like the GCC function multi-versioning example above.