Installing software by yourself

Installing language extensions

With scripting languages, even if the interpreter is installed globally, you can install additional packages locally in your home directory.

Python

First of all, we discourage the use of Conda on the CÉCI clusters. See Conda.

For Python, add the --user option to pip install, e.g.:

$ pip install --user mymodule

will install the fictitious mymodule package in your home directory $HOME/.local/. Once that is done, you will need to make sure $HOME/.local/bin is in your PATH variable.

For instance:

$ export PATH=$HOME/.local/bin:$PATH
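
If that export line is placed in a startup file that gets sourced several times, the same directory ends up listed repeatedly in PATH. The following is a minimal sketch of one way to avoid that. path_prepend is a hypothetical helper name, not a standard command, and the snippet only assumes a POSIX-compatible shell such as Bash.

```shell
# Prepend a directory to PATH only if it is not already listed.
# path_prepend is a hypothetical helper name, not a standard command.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present: leave PATH unchanged
        *) PATH="$1:$PATH" ;;     # otherwise put the directory in front
    esac
    export PATH
}

path_prepend "$HOME/.local/bin"
```

Calling the function a second time with the same directory leaves PATH untouched, so it is safe to keep in a shell startup file.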

When you install a package, it is important to load the correct Python module first and to use the pip option --no-binary :all: so that packages are recompiled from source rather than installed from pre-compiled binaries whenever possible. See the pip documentation for more information. You can use GCC optimisation flags when doing so. Example:

CFLAGS='-O2 -pipe -march=sandybridge' pip install --no-binary :all: PACKAGE

The above example builds PACKAGE with optimisation options that are compatible with most clusters, and that are hence suboptimal on recent ones. See With GCC for more information.

Virtualenvs

If you are already used to creating Python virtualenvs to manage your custom module installations (if you are not, it is a good idea to learn about them), bear in mind that the clusters provide, in addition to several core Python versions, bundles of Python modules compatible with them. If you need a specific Python module that is not available in the environment after, e.g., module load Python/3.6.6-foss-2018b, always check with the module avail command or in the list of installed software whether a specific installation is provided for that Python installation.

If you need mymodule and it is not available, check carefully what its dependencies are and verify whether some of them are available. Let’s imagine that mymodule requires numpy, matplotlib, h5py and Keras. On some clusters there are specific installations for those, so you can load them:

module load Python/3.6.6-foss-2018b
module load matplotlib/3.0.0-foss-2018b-Python-3.6.6
module load h5py/2.8.0-foss-2018b-Python-3.6.6
module load Keras/2.2.4-foss-2018b-Python-3.6.6

As for numpy, it is already included in the main Python module. You can then create a virtualenv and install mymodule by doing:

mkdir ~/my_venv
virtualenv --system-site-packages ~/my_venv
source ~/my_venv/bin/activate
pip install mymodule

The --system-site-packages flag makes the Python modules that are already loaded available inside the virtualenv. Then, when you pip install mymodule, you avoid pulling the wheels for all of those dependencies and you use the optimally compiled versions we provide.
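
To check afterwards whether an existing virtualenv was created with --system-site-packages, one option is to look at its pyvenv.cfg file. This is only a sketch: venv_uses_system_site is a hypothetical helper, and it assumes a tool that writes pyvenv.cfg (the standard venv module and recent virtualenv releases do, older virtualenv releases do not).

```shell
# Report whether a virtual environment was created with --system-site-packages.
# venv_uses_system_site is a hypothetical helper. It only reads pyvenv.cfg,
# which older virtualenv releases do not write (assumption: a recent tool).
# Return codes: 0 = enabled, 1 = disabled, 2 = no pyvenv.cfg found.
venv_uses_system_site() {
    cfg="$1/pyvenv.cfg"
    [ -f "$cfg" ] || return 2
    grep -Eq '^include-system-site-packages[[:space:]]*=[[:space:]]*true[[:space:]]*$' "$cfg"
}

if venv_uses_system_site "$HOME/my_venv"; then
    echo "my_venv was created with --system-site-packages"
else
    echo "my_venv is isolated, or has no pyvenv.cfg"
fi
```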

R

If you run a script that depends on specific libraries/packages, those libraries need to be installed. A command like

library(doParallel)

will load the doParallel library if it is installed. If that library is not installed, then R will fail with the following explicit error message: Error in library(doParallel) : there is no package called 'doParallel'

If you write:

install.packages("doParallel")
library(doParallel)

then R will try to install the doParallel library before loading it.

On an HPC cluster, you will find that some libraries are installed beforehand by the system administrators. Note that libraries depend on the version of R, so it could be that, on the same cluster, a library is available for one version and not for another. For instance, on Hmem, the doParallel library is not installed in R 2.13.1:

ceciuser@cecicluster:~ $ R

R version 2.13.1 (2011-07-08)
[...]
> library(doParallel)
Error in library(doParallel) : there is no package called 'doParallel'
>

but it is in R 3.3.1:

ceciuser@cecicluster:~ $ R

R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
[...]
> library(doParallel)
Loading required package: foreach
foreach: simple, scalable parallel programming from Revolution Analytics
Use Revolution R for scalability, fault tolerance and more.
http://www.revolutionanalytics.com
Loading required package: iterators
Loading required package: parallel
>

Remember that you can choose the R version through environment modules and the module command.

If the library you need is not installed, you can either ask the system administrators to install it globally or install it by yourself for your account.

If you run the install.packages() command by yourself, R will note that you are not the administrator and ask whether it should create a private library where additional packages will be installed:

> install.packages('doParallel')
Warning in install.packages("doParallel") :
  'lib = "/opt/cecisw/arch/easybuild/2018b/software/R/3.5.1-foss-2018b/lib64/R/library"' is not writable   # This is because you do not have administrator access to the cluster
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
~/R/x86_64-pc-linux-gnu-library/3.5
to install packages into? (yes/No/cancel) yes   # So R will create a directory in your home directory, one per R version.
--- Please select a CRAN mirror for use in this session ---
Secure CRAN mirrors

 1: 0-Cloud [https]                   2: Algeria [https]
 3: Australia (Canberra) [https]      4: Australia (Melbourne 1) [https]
 5: Australia (Melbourne 2) [https]   6: Australia (Perth) [https]
 7: Austria [https]                   8: Belgium (Ghent) [https]
[...]
** testing if installed package can be loaded
* DONE (doParallel)

Once this is done, you will be able to load the library on all compute nodes of that cluster for the same R version as the one that was used to install the package.

Note that this process is interactive, as R asks the user some questions, and it might therefore fail when running inside a batch job. It is best to run it interactively once, while connected to the frontend (the login node), before you submit your jobs.

If you want to include the install.packages() command in your scripts, you should specify at least the target directory with the lib parameter and the mirror to use with the repos parameter:

install.packages('doParallel', lib='~/R/x86_64-pc-linux-gnu-library/3.5', repos='https://lib.ugent.be/CRAN')

so that R already knows the answers to those questions.
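
The personal library path follows the pattern shown in the transcript above: one directory per platform and per major.minor R version. As a sketch, the path can be computed in the shell before calling R from a job script. r_user_lib is a hypothetical helper, and the default platform string is simply the one from the example, so adjust it if your R installation reports another one.

```shell
# Build the personal library path for a given R version.
# r_user_lib is a hypothetical helper. The default platform string is taken
# from the transcript above and is an assumption for other machines.
r_user_lib() {
    version="$1"                               # e.g. 3.5.1
    platform="${2:-x86_64-pc-linux-gnu}"
    major_minor=$(printf '%s\n' "$version" | cut -d. -f1,2)
    printf '%s/R/%s-library/%s\n' "$HOME" "$platform" "$major_minor"
}

lib=$(r_user_lib 3.5.1)
echo "$lib"

# Possible use in a job script, with the matching R module loaded:
#   mkdir -p "$lib"
#   Rscript -e "install.packages('doParallel', lib='$lib', repos='https://lib.ugent.be/CRAN')"
```

Creating the directory beforehand matters because, to our knowledge, a non-interactive install.packages() call stops when the lib directory does not exist or is not writable.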

As a final note, please be aware that on heterogeneous clusters (which have compute nodes with different generations of processors), the R installation is performed once per generation (to be optimised for the processors of that generation) and it could be that one library is missing for one R version on one generation of CPU.

Octave

With Octave you can use the pkg command to install additional packages, and the pkg prefix command to decide where to install them. Here is an example of how to install a package by yourself on the cluster. The objective of this mini tutorial is to have the azimuth function available on a specific version of Octave available as a system module. We will assume that the relevant system module is loaded.

First you need to find the name of the package which offers that function. According to its documentation, https://octave.sourceforge.io/mapping/function/azimuth.html, it is named “mapping”. The easy way would be using the Octave Forge from the Octave command line with

octave:1> pkg install -forge "mapping"

but unfortunately, in this example, the version available from the Octave Forge requires a newer version of Octave than we have currently active.

octave:1> pkg install -forge "mapping"
error: the following dependencies were unsatisfied:
   mapping needs octave >= 5.2.0
   mapping needs geometry >= 4.0.0
octave:2> version
ans = 4.4.1

We will then install the package and its dependencies manually. As the latest version does not work with this version of Octave, you will need to download a previous version from the project page on SourceForge: https://sourceforge.net/projects/octave/files/Octave%20Forge%20Packages/Individual%20Package%20Releases/

Download the file mapping-1.4.0.tar.gz and copy it to your home directory.

Note

Whenever installing packages in Octave, load the texinfo module before starting Octave to be able to generate the documentation inside Octave.

As stated in the documentation, mapping requires some dependencies. When you try to install the package, with pkg install <path to file>, it complains:

octave:1> pkg install mapping-1.4.0.tar.gz
error: the following dependencies were unsatisfied:
   mapping needs geometry >= 4.0.0
octave:2> pkg install geometry-4.0.0.tar.gz
error: the following dependencies were unsatisfied:
   geometry needs matgeom >= 1.0.0

therefore, download the following files:

  • geometry-4.0.0.tar.gz
  • matgeom-1.2.3.tar.gz

from the same location and proceed to install matgeom:

octave:1> pkg install matgeom-1.2.3.tar.gz
For information about changes from previous versions of the matgeom package, run 'news matgeom'.

and then work back up the dependency chain:

octave:2> pkg install geometry-4.0.0.tar.gz
warning: doc_cache_create: unusable help text found in file 'clipper'
For information about changes from previous versions of the geometry package, run 'news geometry'.
octave:3> pkg install mapping-1.4.0.tar.gz
configure: WARNING: GDAL library not found.  Reading of raster files will be disabled.
For information about changes from previous versions of the mapping package, run 'news mapping'.

Here you notice the warning about the GDAL library not being found, which leads to reduced functionality. You can therefore remove the package and install it again with the GDAL module loaded:

octave:4> pkg uninstall mapping
octave:5> exit
[dfr@lemaitre3 ~]$ ml GDAL
[dfr@lemaitre3 ~]$ octave
octave: X11 DISPLAY environment variable not set
[...]

octave:1> pkg install mapping-1.4.0.tar.gz
For information about changes from previous versions of the mapping package, run 'news mapping'.

The warning has disappeared and the package is available.

octave:2> pkg list
Package Name  | Version | Installation directory
--------------+---------+-----------------------
     control  |   3.1.0 | .../easybuild/2018b/software/Octave/4.4.1-foss-2018b/share/octave/packages/control-3.1.0
    geometry  |   4.0.0 | /home/ucl/pan/dfr/octave/geometry-4.0.0
          io  |  2.4.12 | .../arch/easybuild/2018b/software/Octave/4.4.1-foss-2018b/share/octave/packages/io-2.4.12
     mapping  |   1.4.0 | /home/ucl/pan/dfr/octave/mapping-1.4.0
     matgeom  |   1.2.3 | /home/ucl/pan/dfr/octave/matgeom-1.2.3
      signal  |   1.4.0 | .../easybuild/2018b/software/Octave/4.4.1-foss-2018b/share/octave/packages/signal-1.4.0
  statistics  |   1.4.0 | .../2018b/software/Octave/4.4.1-foss-2018b/share/octave/packages/statistics-1.4.0

Once the package is loaded, the azimuth function is found and behaves as in the examples of its documentation.

octave:3> pkg load mapping
octave:4> which azimuth
'azimuth' is a function from the file /home/ucl/pan/dfr/octave/mapping-1.4.0/azimuth.m
octave:5> azimuth([10,10], [10,40])
ans =  87.336
octave:6> azimuth([0,10], [0,40])
ans =  90
octave:7> azimuth(pi/4,0,pi/4,-pi/2,"radians")
ans =  5.3279
octave:8>
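
The manual steps of this mini tutorial can also be scripted from the shell. The sketch below assumes that the three tarballs are in the current directory and that the relevant modules (Octave, texinfo, GDAL) are already loaded. install_octave_chain and the DRY_RUN variable are hypothetical names introduced here for illustration.

```shell
# Install Octave package tarballs in the order given (dependencies first).
# install_octave_chain is a hypothetical helper. With DRY_RUN=1 it only prints
# the commands, which is handy for checking the order before running them.
install_octave_chain() {
    for tarball in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            printf 'octave --eval "pkg install %s"\n' "$tarball"
        else
            [ -f "$tarball" ] || { echo "missing: $tarball" >&2; return 1; }
            octave --eval "pkg install $tarball" || return 1
        fi
    done
}

# Same order as in the walkthrough: matgeom, then geometry, then mapping.
DRY_RUN=1 install_octave_chain matgeom-1.2.3.tar.gz geometry-4.0.0.tar.gz mapping-1.4.0.tar.gz
```

Without DRY_RUN=1 the function stops at the first missing file or failed installation, so a broken dependency is not silently skipped.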

Julia

With Julia you can use the Pkg package:

julia> using Pkg
julia> Pkg.add("DataFrames")

Perl

For Perl, you can use the local::lib module and the cpanm tool.

For cpanm to install modules locally, you need to set up the environment according to the output of the perl -Mlocal::lib command. You can set it interactively with

eval $(perl -Mlocal::lib)

and/or set it once and for all with

perl -Mlocal::lib >> ~/.bash_profile

Then you can simply run

cpanm Algorithm::Numerical::Shuffle

to install the Algorithm::Numerical::Shuffle module.

By default, this will install the modules in the ~/perl5 directory. Should you want to install them in another place, give that path as an argument to the local::lib module. For instance:

perl -Mlocal::lib=mylibs/perl >> ~/.bash_profile

to install in the mylibs/perl directory.
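
Note that running a perl -Mlocal::lib >> ~/.bash_profile command more than once appends the same settings again each time. A minimal sketch of a guard against that is shown below. append_once is a hypothetical helper name, and the commented usage line assumes that a perl with local::lib is available at login, which may not hold if Perl comes from a module.

```shell
# Append a line to a file only if that exact line is not already there.
# append_once is a hypothetical helper name.
append_once() {
    line="$1"
    file="$2"
    touch "$file"
    # -x: match whole lines, -F: fixed string, -q: quiet
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Assumed usage: let the profile evaluate the local::lib settings at each login.
#   append_once 'eval "$(perl -Mlocal::lib)"' "$HOME/.bash_profile"
```

Storing the eval line rather than the generated settings also means the environment is recomputed at each login instead of being frozen at the time of the first run.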

Installing with Yum or Aptitude

Installing with Yum (RedHat, Fedora, etc.) or Aptitude (Ubuntu, Debian, etc.) or any other package manager is not possible for users. Commands like sudo apt-get install <name of package> will fail because all clusters run the CentOS distribution, which does not use the Aptitude package manager, and because users are not allowed to use sudo (see below).

If your program can only be installed with apt-get, then you will need to use Singularity.

Use of the sudo command

Do not try to use the sudo command; it will fail. Only local system administrators are able to gain root-level privileges. Regular users are not allowed to, simply because they would continuously break each other’s configuration, or potentially destroy the whole system. There is therefore no way root-level privileges will ever be granted to users.