# Compiling and installing software from source

Sooner or later you will probably need software that is not yet installed. One option is to ask the system administrators to install it globally on the cluster, which makes sense mainly for popular software used by many users.

Another option is to install the software locally in your home directory. Most of the time, this requires installing the software from the sources. If you do not have access to the sources, as is the case with many commercial software packages, you will need to check the documentation of the installation procedure to know how to install the software in a custom directory.

## The whole process

Installing from source has an additional benefit: you then can tailor the compilation process to the hardware of the clusters.

It is customary to create a directory named .local in the home directory to host the self-installed software. You can create it with the mkdir command.
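As a minimal sketch, the directory can be created as follows. The src subdirectory matches the ~/.local/src location used in the unpacking step; the PREFIX variable is only a hypothetical convenience, not something the build tools require.

```shell
# Assumption: sources are kept under ~/.local/src, as in the steps below.
# PREFIX is a hypothetical helper variable; override it to use another location.
PREFIX="${PREFIX:-$HOME/.local}"

# -p creates missing parent directories and does not fail if they already exist.
mkdir -p "$PREFIX/src"
```

The bin, lib and include subdirectories are normally created by make install, so they do not need to be created by hand.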



### 2. Unpack the archive

First go to the destination directory, for instance

    $ cd ~/.local/src

and unpack the archive. If it is a .tar.gz or .tgz, simply run

    $ tar -xzf soft-2.3.tar.gz


You might also encounter other archive types, such as .zip. Refer to this Wikipedia page to choose the right tool for unpacking.
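For the most common archive types, the choice of tool can be sketched as a small shell function. The function name unpack is hypothetical, and the list of extensions is only a selection; it assumes tar (with gzip, bzip2 and xz support) and unzip are available.

```shell
# Hypothetical helper: pick the unpacking command from the file extension.
unpack() {
  case "$1" in
    *.tar.gz|*.tgz)   tar -xzf "$1" ;;   # gzip-compressed tarball
    *.tar.bz2|*.tbz2) tar -xjf "$1" ;;   # bzip2-compressed tarball
    *.tar.xz)         tar -xJf "$1" ;;   # xz-compressed tarball
    *.zip)            unzip -q "$1" ;;   # zip archive
    *) echo "unknown archive type: $1" >&2; return 1 ;;
  esac
}
```

It would be called as unpack soft-2.3.tar.gz from the directory that holds the archive.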

Once the files are extracted, go to the newly-created directory:

    $ cd soft-2.3

### 3. Run the ./configure script

First of all, read any README or INSTALL file that you find in the directory. Then, run the ./configure script. That script checks that the software needed for the compilation process (compilers, GNU tools, etc.) is available, and prepares the subsequent compilation scripts.

At this stage, you choose the directory to which the software will be installed. That is done with the --prefix option of the configure script. For example:

    $ ./configure --prefix ~/.local/


You can also choose the compiler and compiler options with environment variables passed to ./configure, e.g. to use the Intel compiler: CC=icc ./configure. Other interesting variables include CFLAGS, CPPFLAGS and LDFLAGS. Run ./configure --help for a detailed list.
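Passing the compiler and its flags together with the prefix can be sketched as below. The wrapper name configure_local and the default values (gcc, -O2) are assumptions chosen for illustration; any extra arguments are forwarded to ./configure unchanged.

```shell
# Hypothetical wrapper around ./configure.
# CC and CFLAGS are taken from the environment if set; the defaults are assumptions.
configure_local() {
  CC="${CC:-gcc}" CFLAGS="${CFLAGS:--O2}" \
    ./configure --prefix "$HOME/.local" "$@"
}
```

For instance, CC=icc configure_local would select the Intel compiler while keeping the same installation prefix.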

Check the output of the ./configure script; it may report missing dependencies, which often lead to deactivation of some features of the program.

### 4. Run make and make install

Once the ./configure script has run, you will be able to build the software with the make command. This step may take several minutes, depending on the complexity of the software. To speed things up, use the -j n option of make to run the build process in parallel on n CPUs. For instance:

    $ make -j 4

The next step is to install the software in the destination directory.

    $ make install


Now, you can optionally remove the source directory and the archive file to save disk space as the binaries will be copied to ~/.local/bin and the libraries will go to ~/.local/lib.
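The build and install steps can be combined in a short sketch that picks the number of parallel jobs automatically. It assumes the nproc command from GNU coreutils, which reports the number of processing units available to the current process; JOBS is a hypothetical variable name. On a shared login node, a smaller fixed value may be more considerate.

```shell
# Number of parallel jobs: use nproc if available, otherwise fall back to 1.
JOBS="$(nproc 2>/dev/null || echo 1)"

if [ -f Makefile ]; then
  # Install only if the build succeeded.
  make -j "$JOBS" && make install
else
  echo "no Makefile here; run ./configure first" >&2
fi
```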

If you are following instructions on a web page and the instructions are to sudo make install, resist the urge to do so. Simply run make install: with a --prefix inside your home directory, no administrator rights are needed.

### Final steps

You will need to make sure your environment is properly configured to use the programs you just installed.

Once the software is properly installed, make sure to include the .local directories in the corresponding paths:

    export PATH=~/.local/bin:$PATH
    export LD_LIBRARY_PATH=~/.local/lib:$LD_LIBRARY_PATH
    export LIBRARY_PATH=~/.local/lib:$LIBRARY_PATH
    export CPATH=~/.local/include:$CPATH


It is advisable to put those lines in the .bash_profile file in your cluster home directory, which is sourced at each SSH login, so that you do not have to type them at the start of each session.
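A slightly more defensive variant of these lines, suitable for .bash_profile, is sketched below. It relies on the ${VAR:+...} parameter expansion so that no trailing colon is left behind when a variable was previously empty; an empty entry in these search paths is interpreted as the current directory, which is rarely intended.

```shell
# Append the previous value only if it was set and non-empty,
# so that no empty (current-directory) entry ends up in the search path.
export PATH="$HOME/.local/bin${PATH:+:$PATH}"
export LD_LIBRARY_PATH="$HOME/.local/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LIBRARY_PATH="$HOME/.local/lib${LIBRARY_PATH:+:$LIBRARY_PATH}"
export CPATH="$HOME/.local/include${CPATH:+:$CPATH}"
```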

To make sure your paths are correctly set, you can use the which command to see specifically which binary is called when you issue a given command, and ldd to see which dynamic libraries the binary is using.
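Such a check might look like the sketch below. Here tar is only a stand-in for a program you installed yourself, and command -v is used as the shell built-in counterpart of which; ldd is skipped if it is not available on the machine.

```shell
prog="tar"   # stand-in: replace with the name of the program you installed

# Show which binary the shell would run for this command name.
path="$(command -v "$prog")"
echo "$prog resolves to: $path"

# List the dynamic libraries the binary is linked against, if ldd exists.
# ldd returns a non-zero status for static binaries, hence the "|| true".
if command -v ldd >/dev/null 2>&1; then
  ldd "$path" || true
fi
```

If the reported path does not start with ~/.local/bin, or a library is resolved from a system directory instead of ~/.local/lib, the corresponding variable is probably not set as intended.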

## Compiling for multiple CPU architectures

Compilers can tune the binary for the CPU of the machine they run on; GCC does so with -march=native and the Intel compiler with -xHost, and some build scripts enable such options. So if you compile on, say, Lemaitre2, which is equipped with Intel’s Westmere processors, and then run on the newer SandyBridge processors of NIC4, your program will not use the advanced features the SandyBridge processors offer. The two most common reasons for performance losses are reduced or inactive vectorisation (for instance not using the AVX SIMD instructions, which allow performing 8 double-precision floating-point operations per clock cycle), and inefficient code scheduling (scheduling compute and I/O operations based on cache sizes and latencies that are not the correct ones).

Even worse, if you compile on a SandyBridge processor and then run on a Westmere processor, the program might crash because it would try to use the AVX units, which are absent on the latter processors.

### With GCC

With GCC, you can mitigate this problem using the -march and -mtune arguments.

With the -march argument, you can prevent software crashes by telling the compiler to use only features that are present in earlier CPUs. If you specify -march=core2 then the resulting binary is guaranteed to work on every computer in the CÉCI clusters. If you specify -march=westmere, it will work everywhere except on Hmem. With -march=sandybridge, it will work only on Hercules, Dragon1, Vega and NIC4. With -march=bdver1, it will only work on Vega.

The complete list of valid parameter values can be found in the GCC documentation.

With the -mtune argument, you can ask the compiler to optimize the binaries for a specific architecture, while remaining in the limits imposed by the -march option. It will mainly work on optimizing the instruction scheduling with respect to the CPU architecture. The -mtune argument accepts the same values as -march.

A safe option is thus to set -march to the CPU architecture of the oldest cluster you plan on using, and -mtune to the CPU architecture of the cluster you plan on using the most. The CPU architecture of each cluster can be found on the cluster page.
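As an illustration of this rule, the sketch below compiles a trivial C program with -march=core2 (the value described above as working on every cluster) and -mtune=sandybridge (assuming the SandyBridge clusters are the ones used most). The file names, the ARCH_FLAGS variable and the temporary directory are hypothetical, and the sandybridge name requires a reasonably recent GCC on an x86 machine.

```shell
# Oldest architecture to support in -march, most-used architecture in -mtune.
ARCH_FLAGS="-march=core2 -mtune=sandybridge"

workdir="$(mktemp -d)"
cat > "$workdir/hello.c" <<'EOF'
#include <stdio.h>
int main(void) { puts("hello"); return 0; }
EOF

if command -v gcc >/dev/null 2>&1; then
  # ARCH_FLAGS is deliberately unquoted so that it splits into two options.
  gcc -O2 $ARCH_FLAGS -o "$workdir/hello" "$workdir/hello.c" && "$workdir/hello" \
    || echo "build or run failed (non-x86 host or older GCC?)" >&2
fi
```

With a ./configure-based build, the same flags would typically be passed through the CFLAGS variable.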

In your code, if you have functions that use CPU intrinsics or optimization features that are CPU specific, you can use a feature of GCC named Function multi-versioning. The idea is to write several versions of the same function, with a specific __attribute__ to tell GCC which version goes for which CPU. Then, at runtime, the correct version of the function will be called based on the CPU on which it is running.

See the following example:

    #include <iostream>
    using namespace std;

    __attribute__ ((target ("default")))
    int foo ()
    {
        // The default version of foo.
        return 0;
    }

    __attribute__ ((target ("sse4.2")))
    int foo ()
    {
        // foo version for SSE4.2
        return 1;
    }

    __attribute__ ((target ("avx")))
    int foo ()
    {
        // foo version for AVX
        return 2;
    }

    int main ()
    {
        cout << "The function returned: " << foo() << "\n";
        return 0;
    }


When compiled with g++ version 4.9 or above, the program reports 0 on Hmem, 1 on Lemaitre2 and 2 on NIC4.

### With the Intel compiler

The Intel compiler has a very interesting feature called ‘Multiple code paths’. With the -x option, you can specify which CPU features the compiler is allowed to use, for instance -xSSE2 for CPUs that have the SSE2 feature, namely every current Intel or AMD CPU (beware that the -xSSE2 option might lead to binaries that crash on AMD processors even if they have the SSE2 feature). If you specify -xSSE4.2, the binary will work everywhere except on Hmem. With -xAVX, it will work only on Hercules, Dragon1, Vega and NIC4.

With the -ax option, you can specify additional so-called ‘code paths’ that the compiler will add to the binary, using additional sets of features. The ‘base path’ is specified with the -x parameter, and the -ax parameter allows compiling the code several times, each time for a different CPU architecture, and packing it all in a single binary. The Intel compiler runtime will then decide, when the program runs, which portion of the binary to use based on the CPU of the machine it is running on. So you could specify -axSSE4.2,AVX to build a binary optimized for all the CÉCI computers.

More information on these options is available on the Intel compiler website, where the list of valid options is given. The option names are CPU features, rather than CPU architecture names as for GCC. The features a CPU offers can be found by looking at the contents of the /proc/cpuinfo file.
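Looking up features in /proc/cpuinfo can be sketched as follows. The function name cpu_has is hypothetical; the flag names sse2, sse4_2 and avx are the spellings used in the flags line on Linux, corresponding to the SSE2, SSE4.2 and AVX features discussed above. On a system without /proc/cpuinfo every feature is simply reported as absent.

```shell
# Hypothetical helper: succeed if the given flag appears as a whole word
# in /proc/cpuinfo (a simple text search, not a full parser).
cpu_has() {
  grep -qw -- "$1" /proc/cpuinfo 2>/dev/null
}

for feat in sse2 sse4_2 avx; do
  if cpu_has "$feat"; then
    echo "$feat: yes"
  else
    echo "$feat: no"
  fi
done
```

Run on a compute node rather than on the login node, since the two may have different processors.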

The Intel compiler has a feature similar to GCC’s function multi-versioning, called Manual processor dispatch:

    #include <stdio.h>

    // need to create specific function versions for the following processors:
    __declspec(cpu_dispatch(generic, core_i7_sse4_2, core_2nd_gen_avx))
    // stub that will call the appropriate specific function version
    int foo() {};

    __declspec(cpu_specific(generic))
    int foo() {
        return 0;
    }

    __declspec(cpu_specific(core_i7_sse4_2))
    int foo() {
        return 1;
    }

    __declspec(cpu_specific(core_2nd_gen_avx))
    int foo() {
        return 2;
    }

    int main() {
        foo();
        printf("Function returned: %d\n", foo());
        return 0;
    }


The above program, when compiled with the Intel compiler, behaves like the GCC function multi-versioning example: the value it reports depends on the CPU it runs on.