Compiling and installing software from source

At some point, you will most likely face a situation where you need additional software installed. One option is to ask the system administrator to install the software globally on the cluster, which makes sense mainly for popular software used by many users.

Another option is to install the software locally in your home directory. Most of the time, this requires installing the software from source. If you do not have access to the source code, as is the case with many commercial software packages, you will need to check the installation documentation to find out how to install the software in a custom directory.

The whole process

Installing from source has an additional benefit: you can then tailor the compilation process to the hardware of the clusters.

It is customary to create a directory named .local in your home directory to host the self-installed software. You can create it with

$ mkdir -p ~/.local/{bin,lib,src,include}

1. Download or upload the source code archive to your home directory on the cluster.

To do that, use the wget or the curl command. For instance, if the web link to the archive is https://www.repository.com/project/soft-2.3.tar.gz, issue the following command:

$ wget https://www.repository.com/project/soft-2.3.tar.gz

Alternatively, you can download the archive on your personal computer and then upload it to the cluster.
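For reference, the same download can be done with curl, and the upload from your own machine with scp. In the sketch below, mycluster is only a placeholder for your cluster's login node:

$ curl -O https://www.repository.com/project/soft-2.3.tar.gz
$ scp soft-2.3.tar.gz mycluster:~/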

2. Unpack the archive

First go to the destination directory, for instance

$ cd ~/.local/src

and unpack the archive. If it is a .tar.gz or .tgz, simply run

$ tar -xzf soft-2.3.tar.gz

You might also encounter other archive types, such as .zip. Refer to this Wikipedia page to choose the right tool for unpacking.
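For instance, assuming similarly-named archives, a .zip or a .tar.bz2 archive could be unpacked with:

$ unzip soft-2.3.zip
$ tar -xjf soft-2.3.tar.bz2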

Once the files are extracted, go to the newly-created directory:

$ cd soft-2.3

3. Load a compiler suite

Before compiling your code, you need to choose the compiler toolchain that suits your needs. You can check the available versions of the foss or intel toolchains.

For example, for the GNU Compiler Collection:

$ module spider foss

------------------------------------------------------------------------------------------------------------------------------
  foss:
------------------------------------------------------------------------------------------------------------------------------
    Description:
      GNU Compiler Collection (GCC) based compiler toolchain, including OpenMPI for MPI support, OpenBLAS (BLAS and LAPACK
      support), FFTW and ScaLAPACK.

     Versions:
        foss/2014b
        foss/2015a
        foss/2015b
        ...

Or for the intel compiler toolchain:

$ ml spider intel

------------------------------------------------------------------------------------------------------------------------------
  intel:
------------------------------------------------------------------------------------------------------------------------------
    Description:
      Compiler toolchain including Intel compilers, Intel MPI and Intel Math Kernel Library (MKL).

     Versions:
        intel/2014b
        intel/2015a
        intel/2015b

Once you have chosen the toolchain, load it, for instance:

$ module purge
The following modules were not unloaded:
  (Use "module --force purge" to unload all):

  1) databases   2) releases/2021b   3) tis/2017.01
$ module load releases/2021a intel/2021a

The following have been reloaded with a version change:
  1) releases/2021b => releases/2021a

These two module commands must later be included in your submission script as well; this allows your executable to find the shared libraries on the compute nodes.
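As a minimal sketch of what this looks like in practice (the Slurm options and the executable name soft are placeholders), the relevant part of a submission script could be:

#!/bin/bash
#SBATCH --job-name=soft
#SBATCH --time=01:00:00

module purge
module load releases/2021a intel/2021a

~/.local/bin/soft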

4. Run the ./configure script

First of all, read any README or INSTALL file that you can find in the directory.

Then, run the ./configure script. That script analyses the available software (compilers, GNU tools, etc.) needed for the compilation process and prepares the subsequent compilation scripts.

At this stage, you will choose the directory in which the software will be installed. That is done with the --prefix option of the configure script. For example:

$ ./configure --prefix ~/.local/

You can also choose the compiler and compiler options with environment variables passed to ./configure, e.g. to use the Intel compiler: CC=icc ./configure. Other interesting variables include CFLAGS, CPPFLAGS, LDFLAGS. Run ./configure --help for a detailed list.
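For instance, a configure invocation selecting the Intel compiler together with some optimization flags (the flags shown here are only illustrative) could look like:

$ CC=icc CFLAGS="-O2 -xAVX" ./configure --prefix ~/.local/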

Check the output of the ./configure script; it may report missing dependencies, which often lead to deactivation of some features of the program.

5. Run make and make install

Once the ./configure script has run, you will be able to build the software with the make command. This step may take several minutes, depending on the complexity of the software. To speed things up, use the -j n option of make to run the build process in parallel on n CPUs. For instance:

$ make -j 4

The next step is to install the software in the destination directory.

$ make install

Now, you can optionally remove the source directory and the archive file to save disk space as the binaries will be copied to ~/.local/bin and the libraries will go to ~/.local/lib.
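For example, assuming the archive was downloaded to your home directory and unpacked in ~/.local/src as above:

$ rm -rf ~/.local/src/soft-2.3
$ rm ~/soft-2.3.tar.gz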

If you are following instructions on a web page that tell you to run sudo make install, resist the urge to do so. Simply run make install.

Final steps

You will need to make sure your environment is properly configured to use the programs you just installed.

Once the software is properly installed, make sure to include the .local directories in the corresponding paths:

export PATH=~/.local/bin:$PATH
export LD_LIBRARY_PATH=~/.local/lib:$LD_LIBRARY_PATH
export LIBRARY_PATH=~/.local/lib:$LIBRARY_PATH
export CPATH=~/.local/include:$CPATH

It is advisable to put those lines in the .bash_profile file in your cluster home directory, which is sourced at each SSH login, so that you do not have to type them at the start of every session.

To make sure your paths are correctly set, you can use the which command to see specifically which binary is called when you issue a given command, and ldd to see which dynamic libraries the binary is using.
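For instance, for a hypothetical executable named soft installed as above:

$ which soft
$ ldd ~/.local/bin/soft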

Compiling for multiple CPU architectures

By default, compilers will tune the binary for the CPU of the machine they run on. So if you compile on, say, cluster1, which is equipped with Intel’s Westmere processors, and then run on the newer SandyBridge processors of cluster2, the binaries of your program will not use the advanced features the SandyBridge processors offer. The two most common reasons for performance losses are reduced or inactive vectorisation (for instance not using the AVX SIMD instructions, which allow performing 8 double-precision floating-point operations per clock cycle), and inefficient code scheduling (scheduling compute and I/O operations based on cache sizes and latencies that are not the correct ones).

Even worse, if you compile on a SandyBridge processor and then run on a Westmere processor, the program might crash because it would be trying to use the AVX units, which are absent from the latter processors.

With GCC

With GCC, you can mitigate this problem using the -march and -mtune arguments.

With the -march argument, you can prevent software crashes by telling the compiler to use only features that are present in earlier CPUs. If you specify -march=core2, the resulting binary is guaranteed to work on every compute node in the CÉCI clusters, but with degraded performance on more recent ones. If you specify -march=westmere, it will work everywhere except on Hmem (decommissioned). With -march=sandybridge, it will work only on Hercules and Dragon1. To target the AMD Epyc CPUs of Hercules and NIC5, you can specify -march=znver1 (Hercules) or -march=znver2 (NIC5). The latter options require GCC version 9 or later.

The complete list of valid parameter values can be found in the GCC documentation.

With the -mtune argument, you can ask the compiler to optimize the binaries for a specific architecture, while remaining in the limits imposed by the -march option. It will mainly work on optimizing the instruction scheduling with respect to the CPU architecture. The -mtune argument accepts the same values as -march.

A safe option is thus to set -march to the CPU architecture of the oldest cluster you plan on using, and -mtune to the CPU architecture of the cluster you plan on using the most. The CPU architecture of each cluster can be found on the cluster page.
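For example, following that advice to compile for the older Westmere nodes while tuning for SandyBridge (soft.c is just a placeholder source file), the compilation line could be:

$ gcc -O2 -march=westmere -mtune=sandybridge -o soft soft.c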

In your code, if you have functions that use CPU intrinsics or optimization features that are CPU specific, you can use a feature of GCC named Function multi-versioning. The idea is to write several versions of the same function, with a specific __attribute__ to tell GCC which version goes for which CPU. Then, at runtime, the correct version of the function will be called based on the CPU on which it is running.

See the following example:

#include<iostream>
using namespace std;

__attribute__ ((target ("default")))
int foo ()
{
  // The default version of foo.
  return 0;
}

__attribute__ ((target ("sse4.2")))
int foo ()
{
  // foo version for SSE4.2
  return 1;
}

__attribute__ ((target ("avx")))
int foo ()
{
  // foo version for AVX
  return 2;
}

int main ()
{
  cout << "The function returned: " << foo() << "\n";
  return 0;
}

When compiled with g++ version 4.9 or above, it outputs 0 on Hmem, 1 on Lemaitre2 and 2 on Nic4.
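Assuming the example above is saved as foo.cpp, it can be compiled and run without any special flags:

$ g++ -O2 foo.cpp -o foo
$ ./foo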

With the Intel compiler

The Intel compiler has a very interesting feature called ‘Multiple code paths’. With the -x option, you can specify which CPU features the compiler is allowed to use, for instance -xSSE2 for CPUs that have the SSE2 feature, namely every current Intel or AMD CPU (beware that -xSSE2 might lead to binaries that crash on AMD processors even if they have the SSE2 feature). If you specify -xSSE4.2, it will work everywhere except on Hmem. With -xAVX, it will work only on Hercules, Dragon1, Vega and NIC4.

With the -ax option, you can specify additional so-called ‘code paths’ that the compiler will add to the binary, using additional sets of features. The ‘base path’ is specified with the -x parameter, and the -ax parameter allows compiling the code several times, each time for a different CPU architecture, and packing it all in a single binary. The Intel compiler runtime will then decide, when the program runs, which portion of the binary to use based on the CPU of the machine it is running on. So you could specify -axSSE4.2,AVX to build a binary optimized for all the CÉCI computers.
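As an illustrative sketch (soft.c is just a placeholder source file), a build combining a conservative base path with two additional code paths could be:

$ icc -O2 -xSSE2 -axSSE4.2,AVX -o soft soft.c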

More information about these options is available on the Intel compiler website, where the list of valid options is given. The option names are CPU features, rather than CPU architecture names as for GCC. The features a CPU offers can be found by looking at the contents of the /proc/cpuinfo file.

Note

The Intel compiler is developed and targeted for Intel hardware and hence has some minor issues when used with AMD hardware. Code compiled with the Intel compiler might crash when executed on the AMD CPUs of Hercules and NIC5. The recommended flag to use the AVX2 instructions supported by the AMD EPYC CPUs is -march=core-avx2.
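For instance, compiling for those AMD nodes could then look like (soft.c is just a placeholder source file):

$ icc -O2 -march=core-avx2 -o soft soft.c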

In some cases, when compiling with the -xavx, -xavx2 or -xcore-avx2 flags, your application will abort with a message stating that the processor is not compatible:

Please verify that both the operating system and the processor support Intel(R) X87,
CMOV, MMX, FXSAVE, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, MOVBE, POPCNT, AVX, F16C, FMA,
BMI, LZCNT and AVX2 instructions.

To solve this problem, avoid using the above-mentioned flags when compiling the main() function. You can also try disabling Interprocedural Optimization (removing the -ipo flag).

The Intel compiler has a feature similar to GCC’s function multi-versioning, called Manual processor dispatch:

#include <stdio.h>

// need to create specific function versions for the following processors:
__declspec(cpu_dispatch(generic, core_i7_sse4_2, core_2nd_gen_avx))
// stub that will call the appropriate specific function version
int foo() {};

__declspec(cpu_specific(generic))
int foo() {
        return 0;
}

__declspec(cpu_specific(core_i7_sse4_2))
int foo() {
        return 1;
}

__declspec(cpu_specific(core_2nd_gen_avx))
int foo() {
        return 2;
}

int main() {
        printf("Function returned: %d\n", foo());
        return 0;
}

The above program, when compiled with the Intel compiler, will behave as the one with function multiversioning.
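Assuming the code above is saved as dispatch.c, a possible compilation and run with the Intel compiler is:

$ icc dispatch.c -o dispatch
$ ./dispatch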