Installing software by yourself

Installing language extensions

With scripting languages, even if the interpreter is installed globally, you can install additional packages locally in your home directory.

Python

First of all, we discourage the use of Conda on the CÉCI clusters. See Conda.

For Python, pass the --user option to pip install. For example:

$ pip install --user mymodule

will install the fictitious mymodule package under $HOME/.local/ in your home directory. Once that is done, make sure that:

  • $HOME/.local/bin is in your PATH variable and
  • $HOME/.local/lib/pythonx.y/site-packages/ is in your PYTHONPATH.

Make sure to replace the x.y part with the actual version of Python you are using. For instance:

$ export PATH=$HOME/.local/bin:$PATH
$ export PYTHONPATH=$HOME/.local/lib/python2.7/site-packages/:$PYTHONPATH
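To find out which x.y applies and whether the two variables are already set, you can run a short check with the interpreter you intend to use. The following is a minimal sketch, not part of the cluster tooling: the helper names user_site_dir and missing_entries are made up for illustration, the directory layout is the Linux one described above, and the code assumes Python 3.6 or later.

```python
import os
import sys


def user_site_dir(home, version=None):
    """Return the per-user site-packages path described above (Linux layout)."""
    if version is None:
        version = sys.version_info[:2]
    major, minor = version[0], version[1]
    return os.path.join(home, ".local", "lib", f"python{major}.{minor}", "site-packages")


def missing_entries(env, home, version=None):
    """Return the variables (PATH, PYTHONPATH) that lack the expected directory."""
    expected = {
        "PATH": os.path.join(home, ".local", "bin"),
        "PYTHONPATH": user_site_dir(home, version),
    }
    missing = {}
    for name, directory in expected.items():
        # Split on the path separator; ignore empty entries and trailing slashes.
        entries = [e.rstrip("/") for e in env.get(name, "").split(os.pathsep) if e]
        if directory not in entries:
            missing[name] = directory
    return missing


if __name__ == "__main__":
    home = os.path.expanduser("~")
    for name, directory in missing_entries(os.environ, home).items():
        print(f"{name} does not contain {directory}")
```

If the script prints nothing, both variables already contain the expected directories for the Python version that ran it.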

When you install a package, it is important that you load the correct Python module and, whenever possible, use the pip option --no-binary :all: to recompile from source rather than install pre-compiled binaries. See the pip documentation for more information. You can use GCC optimisation flags when doing so. Example:

CFLAGS='-O2 -pipe -march=sandybridge' pip install --no-binary :all: PACKAGE

The above example builds PACKAGE with optimisation options that are compatible with most clusters, and that are hence suboptimal on recent ones. See With GCC for more information.
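When the same build has to be repeated for several packages, the command above can be wrapped in a small script. The sketch below is an illustration only: the function names are hypothetical, and it assumes that pip is available for the interpreter of the Python module you loaded. It calls pip as "python -m pip" so that the package is built for the interpreter that runs the script.

```python
import os
import subprocess
import sys


def source_build_command(package, user=False):
    """Build the argument list for a from-source pip install of `package`."""
    command = [sys.executable, "-m", "pip", "install", "--no-binary", ":all:"]
    if user:
        # Install under $HOME/.local/ instead of the current environment.
        command.append("--user")
    command.append(package)
    return command


def source_build_env(march="sandybridge", base_env=None):
    """Return a copy of the environment with the CFLAGS used in the example above."""
    env = dict(os.environ if base_env is None else base_env)
    env["CFLAGS"] = f"-O2 -pipe -march={march}"
    return env


def install_from_source(package, march="sandybridge", user=False):
    """Run pip; raises subprocess.CalledProcessError if the build fails."""
    subprocess.run(
        source_build_command(package, user=user),
        env=source_build_env(march),
        check=True,
    )
```

Calling install_from_source("PACKAGE") reproduces the command line shown above, with PACKAGE standing for the name of the package to build.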

Virtualenvs

If you are already used to creating Python virtualenvs to manage your custom module installations (if you are not, it is a good idea to learn about them), bear in mind that, besides several core Python versions, the clusters provide bundles of Python modules compatible with each of them. If you need a Python module that is not available in the environment after, for example, module load Python/3.6.6-foss-2018b, always check with the module avail command, or in the list of installed software, whether a specific installation is provided for that Python version.

If you need mymodule and it is not available, check carefully what its dependencies are and verify whether some of them are available. Let’s imagine that mymodule requires numpy, matplotlib, h5py and Keras. On some clusters there are specific installations of those, so you can load them:

module load Python/3.6.6-foss-2018b
module load matplotlib/3.0.0-foss-2018b-Python-3.6.6
module load h5py/2.8.0-foss-2018b-Python-3.6.6
module load Keras/2.2.4-foss-2018b-Python-3.6.6

(numpy is already included in the main Python module.) You can then create a virtualenv and install mymodule with:

mkdir ~/my_venv
virtualenv --system-site-packages ~/my_venv
source ~/my_venv/bin/activate
pip install mymodule

The --system-site-packages flag makes the Python modules that are already loaded available inside the virtualenv. When you then run pip install mymodule, you avoid pulling the wheels for all of those dependencies and you use the optimally compiled versions we provide.
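To confirm that the virtualenv picks up the modules provided on the cluster rather than freshly downloaded copies, you can ask Python where each dependency would be imported from. This is a minimal sketch with hypothetical helper names; it relies only on the standard library (importlib.util.find_spec and sys.prefix) and assumes that the import names of the example dependencies are numpy, matplotlib, h5py and keras.

```python
import importlib.util
import sys


def in_virtualenv():
    """True when the interpreter runs inside a virtualenv or venv."""
    # Old virtualenv releases set sys.real_prefix; newer ones (and venv)
    # make sys.prefix differ from sys.base_prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


def module_origin(name):
    """Return the file a top-level module would be loaded from.

    Returns None when the module is not installed (namespace packages
    also report None, since they have no single origin file).
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


def report(names):
    """Map each module name to its origin; None means it is not available."""
    return {name: module_origin(name) for name in names}


if __name__ == "__main__":
    print("inside a virtualenv:", in_virtualenv())
    for name, origin in report(["numpy", "matplotlib", "h5py", "keras"]).items():
        print(f"{name}: {origin or 'not available'}")
```

A path under the software tree of the loaded modules indicates that the cluster-provided build is used; a path under ~/my_venv indicates a copy installed by pip inside the virtualenv.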

R

If you run a script that depends on specific libraries/packages, those libraries need to be installed. A command like

library(doParallel)

will load the doParallel library if it is installed. If that library is not installed, R will fail with the following explicit error message: Error in library(doParallel) : there is no package called 'doParallel'

If you write:

install.packages("doParallel")
library(doParallel)

then R will try to install the doParallel library before loading it.

On an HPC cluster, you will find that some libraries have been installed beforehand by the system administrators. Note that libraries depend on the version of R, so on the same cluster a library may be available for one version and not for another. For instance, on Hmem, the doParallel library is not installed in R 2.13.1:

dfr@hmem00:~ $ R

R version 2.13.1 (2011-07-08)
[...]
> library(doParallel)
Error in library(doParallel) : there is no package called 'doParallel'
>

but it is in R 3.3.1:

dfr@hmem00:~ $ R

R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
[...]
> library(doParallel)
Loading required package: foreach
foreach: simple, scalable parallel programming from Revolution Analytics
Use Revolution R for scalability, fault tolerance and more.
http://www.revolutionanalytics.com
Loading required package: iterators
Loading required package: parallel
>

Remember that you can choose the R version through environment modules and the module command.

If the library you need is not installed, you can either ask the system administrators to install it globally or install it by yourself for your account.

If you run the install.packages() command yourself, R will notice that you are not the administrator and ask whether it should create a private library where additional packages will be installed:

> install.packages('doParallel')
Warning in install.packages("doParallel") :
  'lib = "/opt/cecisw/arch/easybuild/2018b/software/R/3.5.1-foss-2018b/lib64/R/library"' is not writable   # This is because you do not have administrator access to the cluster
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
~/R/x86_64-pc-linux-gnu-library/3.5   # So R will create a directory in your home directory, one per R version.
to install packages into? (yes/No/cancel) yes
--- Please select a CRAN mirror for use in this session ---
Secure CRAN mirrors

 1: 0-Cloud [https]                   2: Algeria [https]
 3: Australia (Canberra) [https]      4: Australia (Melbourne 1) [https]
 5: Australia (Melbourne 2) [https]   6: Australia (Perth) [https]
 7: Austria [https]                   8: Belgium (Ghent) [https]
[...]
** testing if installed package can be loaded
* DONE (doParallel)

Once this is done, you will be able to load the library on all compute nodes of that cluster for the same R version as the one that was used to install the package.

Note that this process is interactive, as R asks the user some questions, and it might fail when run inside a batch job. You should therefore run it interactively once, while connected to the frontend (the login node), before you submit your jobs.

If you want to include the install.packages() command in your scripts, you should specify at least the target directory with the lib parameter and the mirror to use with the repos parameter:

install.packages('doParallel', lib='~/R/x86_64-pc-linux-gnu-library/3.5', repos='https://lib.ugent.be/CRAN')

so that R already knows the answers to those questions.

As a final note, please be aware that on heterogeneous clusters (which have compute nodes with different generations of processors), the R installation is performed once per generation (to be optimised for the processors of that generation) and it could be that one library is missing for one R version on one generation of CPU.

Octave

With Octave, you can use the pkg prefix command to choose the directory in which packages are installed.

Julia

With Julia, you can use the Pkg package:

julia> using Pkg
julia> Pkg.add("DataFrames")

Perl

For Perl, you can use the local::lib module and the cpanm tool.

For cpanm to install modules locally, you need to set up the environment according to the output of the perl -Mlocal::lib command. You can set it interactively with

eval $(perl -Mlocal::lib)

and/or set it once and for all with

perl -Mlocal::lib >> ~/.bash_profile

Then you can simply run

cpanm Algorithm::Numerical::Shuffle

to install the Algorithm::Numerical::Shuffle module.

By default, this will install the modules in the ~/perl5 directory. Should you want to install them to another place, give that path as argument to the local::lib module. For instance:

perl -Mlocal::lib=mylibs/perl >> ~/.bash_profile

to install in the mylibs/perl directory.

Installing with Yum or Aptitude

Installing with Yum (RedHat, Fedora, etc.), Aptitude (Ubuntu, Debian, etc.) or any other package manager is not possible for users. Commands like sudo apt-get install <name of package> will fail because all clusters run the CentOS distribution, which does not use the Aptitude package manager, and because users are not allowed to use sudo (see below).

If your program can only be installed with apt-get, then you will need to use Singularity.

Use of the sudo command

Do not try to use the sudo command; it will fail. Only local system administrators are able to gain root-level privileges. Regular users are not allowed to, simply because they would continuously break each other’s configuration, or potentially destroy the whole system. There is therefore no way root-level privileges will ever be granted to users.