Conda

The general guideline for Anaconda is: please do not use it on the clusters. It is meant for easy installation on a single-user laptop and is not well suited to multi-user clusters. It sometimes makes wrong assumptions about the OS configuration and generates a huge number of files (~100,000). Conda is great for quickly testing the latest versions of some packages, but it is not designed to keep several versions of the same package side by side, which is needed for scientific reproducibility and provenance. You must therefore create one environment per version, which adds even more (duplicate) files to the system.

More importantly, Anaconda distributes pre-compiled binaries in many cases, and otherwise uses pre-configured compilation options that are often not optimal for the CPUs we have on the clusters. Even if the situation has improved now that Conda ships with MKL, very few pre-built packages are optimised. Furthermore, pre-compiled binaries might need versions of the system libraries more recent than what the clusters can offer, which can lead to the typical ‘Symbol not found’ error at runtime.
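
To get a feel for the number of files involved, the sketch below counts the regular files under a directory. The default location `$HOME/miniconda3` and the `CONDA_DIR` variable are assumptions made for illustration; adjust them to wherever Conda was actually installed.

```shell
# Count the regular files below a directory (for instance a Conda install).
count_files() {
    find "$1" -type f | wc -l | tr -d '[:space:]'
}

# Hypothetical location and variable name; adjust to your own install.
conda_dir="${CONDA_DIR:-$HOME/miniconda3}"
if [ -d "$conda_dir" ]; then
    echo "$conda_dir holds $(count_files "$conda_dir") files"
fi
```

The same function can be pointed at a sandbox directory to compare it with the single file of a .sif image.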

More information:

Conda in a container

If you are in a position where Conda is the only option, then it is best to package it into a container. This way, it will

  • be stored as a single large file, which is better handled by the cluster filesystem
  • be transferable to another cluster or user or to archive
  • be possible to have multiple versions of the same conda-installed software
  • be packaged with the necessary libraries, possibly more recent than the cluster can offer.

Note

To speed up image building, you can set environment variables that instruct Apptainer to use a fast in-memory filesystem for its intermediate files:

$ export APPTAINER_TMPDIR=$XDG_RUNTIME_DIR
$ export APPTAINER_CACHEDIR=$XDG_RUNTIME_DIR
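
A slightly more defensive variant is sketched below. It assumes that XDG_RUNTIME_DIR may be unset or not writable in some sessions (worth checking on your cluster) and then falls back to TMPDIR or /tmp; the `scratch` variable is only a helper name chosen here.

```shell
# Use the in-memory runtime directory when it is defined and writable,
# otherwise fall back to the usual temporary directory.
scratch="${XDG_RUNTIME_DIR:-${TMPDIR:-/tmp}}"
[ -w "$scratch" ] || scratch="${TMPDIR:-/tmp}"

export APPTAINER_TMPDIR="$scratch"
export APPTAINER_CACHEDIR="$scratch"
```

Keep in mind that an in-memory filesystem is limited in size, so large images may still need a disk-backed directory.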

Pre-built images

One option is to use a pre-built image from a repository. This is the easiest solution but:

  • you need to make sure it contains all the packages you need, as pre-built images are read-only and cannot be modified;
  • you need to make sure the source from which you obtain the image is reputable and does not contain malware.

So it really is an option only if the package you need to use is distributed through an Apptainer image in addition to a Conda package.

It could be possible to use the conda command inside the container to install Conda packages outside of the container, but that would defeat the purpose of using a container.

Command

For instance, with the docker://continuumio/miniconda3 image from the Docker image repository:

$ apptainer pull docker://continuumio/miniconda3
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob 9548983a4b0b done   |
Copying blob 04e7578caeaa done   |
Copying blob c893f9326bb9 done   |
Copying config 1f702ef68a done   |
Writing manifest to image destination
2024/04/29 15:52:55  info unpack layer: sha256:04e7578caeaa5a84ad5d534aabbb307a37c85f9444b94949d37544a1c69c8f15
2024/04/29 15:52:56  info unpack layer: sha256:c893f9326bb9508af1010564ed7e77b867db6c8e519903bfe99877ff8e376da9
2024/04/29 15:52:57  info unpack layer: sha256:9548983a4b0b9f7022bd4c20cd827257acb430f31e5297d9bfa986fab178c7cc
INFO:    Creating SIF file...
$ apptainer exec miniconda3_latest.sif python
Python 3.12.2 | packaged by Anaconda, Inc. | (main, Feb 27 2024, 17:35:02) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

The above gives you a Python interpreter with the Conda packages installed in the image.
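
The name miniconda3_latest.sif comes from the image reference: when no output name is given, apptainer pull names the file after the last component of the reference and its tag. The helper below is a sketch of that naming rule; `sif_name` is a hypothetical function written for this page, and it does not handle references pinned by digest.

```shell
# Derive the default .sif file name from a docker:// reference.
sif_name() {
    local ref name tag
    ref="${1##*/}"          # last path component, e.g. miniconda3 or ubuntu:22.04
    name="${ref%%:*}"       # part before the tag
    case "$ref" in
        *:*) tag="${ref#*:}" ;;
        *)   tag="latest" ;;   # no tag given
    esac
    echo "${name}_${tag}.sif"
}

sif_name docker://continuumio/miniconda3   # prints miniconda3_latest.sif
```

If you prefer another name, apptainer pull also accepts the output file name as an argument placed before the image reference.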

How to adapt an image by hand

There are two main ways to adapt an image to add missing packages:

  • either create a new image via a writable sandbox; or
  • create a writable overlay.

They will be demonstrated with the numpy package. Note that this package is used as an example because it has few dependencies; it is already installed, and optimised, on all the CÉCI clusters as part of the SciPy-bundle package.

Option 1: Create a writable sandbox

Step 1. Create a sandbox directory from the image with apptainer build.

$ apptainer build --sandbox miniconda3.sandbox miniconda3_latest.sif
INFO:    Starting build...
INFO:    Verifying bootstrap image miniconda3_latest.sif
INFO:    Creating sandbox directory...
INFO:    Build complete: miniconda3.sandbox

This gives us a directory with all the files of the image. This directory is writable and can be used directly as an Apptainer image in place of a .sif file. So we can modify that sandbox directory to install Numpy by starting a shell in it and then running the appropriate conda command.

$ apptainer shell --writable miniconda3.sandbox
Apptainer> conda install numpy
Channels:
- defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

environment location: /opt/conda

added / updated specs:
    - numpy


The following NEW packages will be INSTALLED:

blas               pkgs/main/linux-64::blas-1.0-mkl
intel-openmp       pkgs/main/linux-64::intel-openmp-2023.1.0-hdb19cb5_46306
mkl                pkgs/main/linux-64::mkl-2023.1.0-h213fc3f_46344
mkl-service        pkgs/main/linux-64::mkl-service-2.4.0-py312h5eee18b_1
mkl_fft            pkgs/main/linux-64::mkl_fft-1.3.8-py312h5eee18b_0
mkl_random         pkgs/main/linux-64::mkl_random-1.2.4-py312hdb19cb5_0
numpy              pkgs/main/linux-64::numpy-1.26.4-py312hc5e2394_0
numpy-base         pkgs/main/linux-64::numpy-base-1.26.4-py312h0da6c21_0
tbb                pkgs/main/linux-64::tbb-2021.8.0-hdb19cb5_0


Proceed ([y]/n)? y


Downloading and Extracting Packages:

Preparing transaction: done
Verifying transaction: done
Executing transaction: done

Now that we have Numpy installed in the directory, we build a new Apptainer image from that sandbox directory and delete the sandbox:

$ apptainer build numpy.sif miniconda3.sandbox
INFO:    Starting build...
INFO:    Creating SIF file...
INFO:    Build complete: numpy.sif
$ rm -rf miniconda3.sandbox

Note that the base image miniconda3_latest.sif was left unmodified.

We then can run the Python interpreter and check that Numpy is available in it.

$ apptainer exec numpy.sif python
Python 3.12.2 | packaged by Anaconda, Inc. | (main, Feb 27 2024, 17:35:02) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>>
$

The other option is to use an overlay.

Option 2: Add an overlay

A persistent overlay is a writable image that can be “combined” with the base image. When you create new files in the container, the overlay will store the changes.

First create an overlay like this (for a 10 GB file; the size is given in MiB):

$ apptainer overlay create --size 10240 overlay.img
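
Since --size is expressed in MiB, the figure can be computed from a size in GiB. The snippet below is a small sketch that only prints the resulting command rather than running it; the variable names are chosen here for illustration.

```shell
# --size is given in MiB, so convert from the GiB figure you have in mind.
size_gib=10
size_mib=$((size_gib * 1024))
overlay=overlay.img   # file name used in the commands that follow

echo "apptainer overlay create --size $size_mib $overlay"
# prints: apptainer overlay create --size 10240 overlay.img
```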

Attach the overlay with the --overlay option of apptainer and install Numpy in it:

$ apptainer exec --overlay overlay.img:rw miniconda3_latest.sif bash
Apptainer> conda install numpy
Retrieving notices: ...working... done
Channels:
- defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

environment location: /opt/conda

added / updated specs:
    - numpy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    mkl-service-2.4.0          |  py312h5eee18b_1          66 KB
    mkl_fft-1.3.8              |  py312h5eee18b_0         204 KB
    mkl_random-1.2.4           |  py312hdb19cb5_0         284 KB
    numpy-1.26.4               |  py312hc5e2394_0          11 KB
    numpy-base-1.26.4          |  py312h0da6c21_0         7.7 MB
    ------------------------------------------------------------
                                        Total:         8.2 MB

The following NEW packages will be INSTALLED:

blas               pkgs/main/linux-64::blas-1.0-mkl
intel-openmp       pkgs/main/linux-64::intel-openmp-2023.1.0-hdb19cb5_46306
mkl                pkgs/main/linux-64::mkl-2023.1.0-h213fc3f_46344
mkl-service        pkgs/main/linux-64::mkl-service-2.4.0-py312h5eee18b_1
mkl_fft            pkgs/main/linux-64::mkl_fft-1.3.8-py312h5eee18b_0
mkl_random         pkgs/main/linux-64::mkl_random-1.2.4-py312hdb19cb5_0
numpy              pkgs/main/linux-64::numpy-1.26.4-py312hc5e2394_0
numpy-base         pkgs/main/linux-64::numpy-base-1.26.4-py312h0da6c21_0
tbb                pkgs/main/linux-64::tbb-2021.8.0-hdb19cb5_0


Proceed ([y]/n)? y


Downloading and Extracting Packages:

Preparing transaction: done
Verifying transaction: done
Executing transaction: done

Now you can start Python using both container images, the base image and the overlay:

$ apptainer exec --overlay overlay.img:ro miniconda3_latest.sif python
Python 3.12.2 | packaged by Anaconda, Inc. | (main, Feb 27 2024, 17:35:02) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>>
$

The base image miniconda3_latest.sif was left unmodified.

Use an automation tool: cotainr

The cotainr tool was created to “have an easy way to build […] containers in user space, i.e. without root or fakeroot that is generally required for building a container”.

It requires fewer steps than the procedures outlined above, but it needs a Conda environment definition file such as:

$ cat numpy.yml
name: base
channels:
- defaults
dependencies:
- numpy

The above file is a minimal Conda environment definition file that installs Numpy. Note that this file can be generated from an existing Conda environment whose packages were installed manually over time; see the next section for details.
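
For reproducibility, the version can be pinned in the definition file. The sketch below writes such a file with a here-document; the file name numpy-pinned.yml is hypothetical, and the version is simply the one that appeared in the install logs earlier on this page.

```shell
# Write an environment file that pins Numpy to an explicit version.
cat > numpy-pinned.yml <<'EOF'
name: base
channels:
- defaults
dependencies:
- numpy=1.26.4
EOF
```

Building one image per pinned file gives several versions of the same software side by side, each stored as a single file.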

Here, make sure to start from a base image that does not already contain Conda, as cotainr begins by installing Conda itself in the image. The example below uses the docker://ubuntu:22.04 image from the Docker repository.

[dfr@lm4-f001 ~]$ cotainr-2023.11.0/bin/cotainr build --accept-licenses --base-image docker://ubuntu:22.04 --conda-env numpy.yml numpy2.sif
Cotainr:-: Creating Singularity Sandbox
Cotainr:-: Installing Conda environment: /home/users/d/f/dfr/numpy.yml
CondaInstall.err:-: You have accepted the Miniforge installer license via the command line option '--accept-licenses'.
Cotainr:-: Cleaning up unused Conda files
Cotainr:-: Finished installing conda environment: /home/users/d/f/dfr/numpy.yml
Cotainr:-: Adding metadata to container
Cotainr:-: Building container image
Cotainr:-: Finished building /home/users/d/f/dfr/numpy2.sif in 00:00:32

Then start the Python interpreter

$ apptainer exec numpy2.sif python
Python 3.12.3 | packaged by Anaconda, Inc. | (main, Apr 19 2024, 16:50:38) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>>
$

and check that Numpy is available in that image.

Create an environment definition file

Conda packages can be installed with environment definition files (see the Conda documentation about such files):

channels:
  - defaults
  - conda-forge
dependencies:
  - matplotlib
  - python=3.9
  - pip
  - pip:
    - vivarium

Such files can be created from an existing environment with the env export function from Conda:

$ conda activate your_env
$ conda env export > environment.yml
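
An exported file usually ends with a prefix: line holding the path of the environment on the machine where the export was made. A minimal sketch to drop that line before reusing the file elsewhere is shown below; `strip_prefix` and the output file name are hypothetical names chosen for this example.

```shell
# Drop the machine-specific "prefix:" line from an exported environment file.
strip_prefix() {
    grep -v '^prefix:' "$1"
}

# Usage (assumes environment.yml was produced by `conda env export`):
#   strip_prefix environment.yml > environment.portable.yml
```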

How to use the image

Once you are happy with the Conda environment inside the image, you can use it to start either the Python interpreter or any other executable that was installed with Conda.

This is done with the apptainer exec command, without the need to “activate” the environment as you would with a regular Conda environment. So if your submission script looked like this:

#!/bin/bash
#
#SBATCH --some-option
#SBATCH --some-other-option

# Activation of the Conda env
conda activate mycondaenv

python mypythonscript.py

it should now look like this, assuming you created an image named mycondaenv.sif:

#!/bin/bash
#
#SBATCH --some-option
#SBATCH --some-other-option

apptainer exec mycondaenv.sif python mypythonscript.py
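
If you want the job to fail early with a clear message when the image is missing, the call can be wrapped in a small function. This is only a sketch: `run_in_image` and the `IMAGE` override variable are hypothetical names, not part of Apptainer.

```shell
# Image to use; IMAGE is a hypothetical override variable.
image="${IMAGE:-mycondaenv.sif}"

# Run a command inside the image, failing early if the image is missing.
run_in_image() {
    if [ ! -f "$image" ]; then
        echo "Image $image not found" >&2
        return 1
    fi
    apptainer exec "$image" "$@"
}

# Example call, equivalent to the last line of the script above:
#   run_in_image python mypythonscript.py
```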

At this point, if everything is working, you can delete your Conda install on the cluster.