7 Software Management
- Use the
module
tool to search for and load pre-installed software. - Understand what a package manager is, and how it can be used to manage software instalation on a HPC environment.
- Install the Conda package manager. (optional)
- Create a software environment and install software using Conda. (optional)
7.1 Using Pre-installed Software
It is very often the case that HPC admins have pre-installed several software packages that are regularly used by their users. Because there can be a large number of packages (and often different versions of the same program), you need to load the programs you want to use in your script using the module
tool.
The following table summarises the most common commands for this tool:
Command | Description |
---|---|
module avail |
List all available packages. |
module avail -a -i "pattern" or module avail 2>&1 | grep -i "pattern" |
Search the available package list that matches “pattern”. Note the second option is given as some versions of module do not support case-insensitive search (-i option). |
module load <program> |
Load the program and make it available for use. |
module unload <program> |
Unload the program (removes it from your PATH). |
If a package is not available through the module
command, you can contact the HPC admin and ask them to install it for you. Alternatively, you can use a package manager as we show in the next section.
The packages you have available will depend on the group you’re in. Your group affects your $MODULEPATH
which in turn gives you access to various programme-specific packages.
For the rest of the course we’ll be indexing and aligning a small drosophila genome, which requires some genomic tools. The Cancer Ageing and Somatic Mutation programme (CASM) has the tool we need already loaded, so we need to temporarily add their module directory to our module paths. To do so, simply run the following command from your hpc_workshop
folder. Note this is just a temporary workaround, once you are added to your programme’s group, you’ll automatically have access to relevant modules which you can check with module avail
.
source configure_modules
You can check your module path by entering echo $MODULEPATH
to see if the Cancer Genome Project directory has been added to your module path. Or, you can run module avail
to see which module groups you now have access to.
If you log off the farm during this course, you’ll need to rerun this export
one-liner again!
7.1.1 Exercise
7.2 Example: sequence read alignment
Once your sequencing data have been processed by SequenceSpace and/or your programme’s IT team, you can then begin to analyze them in lustre. First, you need to transfer them from the “read-only” NFS or iRODS sections of the Farm to your lustre workspace.
We won’t be doing this today (this is often a programme-specific process that requires unique permissions and some extra training), but it’s important to keep this larger structure of data workflows in mind. When discussing what you need to get started in your group, be sure to ask about how to get added to the group’s permissions list and how they usually access sequencing data (iRODS, Canapps, NFS, etc.)
For our genome alignment Exercise 2, we’ll pretend that you have already staged the raw data files from long term storage (iRODS or NFS) into your lustre storage location in a folder called data
within hpc_workshop
.
7.2.1 Exercise
7.3 Package managers
Often you may want to use software packages that are not be installed by default on the HPC. There are several ways you could manage your own software installation, one of the most popular ones being the use of the package manager Conda or its newer implementation Mamba.
Covering Conda/Mamba is out of the scope of these materials, but check out our course Reproducible and scalable bioinformatics: managing software and pipelines to learn more about this topic.
If you are familiar with Conda/Mamba, you may know that to activate a software environment you use the command mamba activate
. However, to load environments in a shell script that is being submitted to LSF you need to first source a configuration file from Conda/Mamba. For example, let’s say we had an environment called datasci
; to activate it in our LSF script, we would need the following syntax:
# Always add these two commands to your scripts
eval "$(conda shell.bash hook)"
source $CONDA_PREFIX/etc/profile.d/mamba.sh
# then you can activate the environment
mamba activate datasci
This is because when we submit jobs to LSF the jobs will start in a non-interactive shell, and mamba
doesn’t get automatically set. Running the source
command shown will ensure mamba activate
becomes available.
Although Mamba is a great tool to manage your own software installation, the disadvantage is that the software is not compiled specifically taking into account the hardware of the HPC. This is a slightly technical topic, but the main practical consequence is that software installed by HPC admins and made available through the module
system may sometimes run faster than software installed via conda
. This means you will use fewer resources and your jobs will complete faster.
7.4 Summary
- The
module
tool can be used to search for and load pre-installed software packages on a HPC. This tool may not always be available on your HPC.module avail
is used to list the available software.module load PACKAGE_NAME
is used to load the package.
- To install your own software, you can use the Conda package manager.
- Conda allows you to have separate “software environments”, where multiple package versions can co-exist on your system.
- Use
conda env create <ENV>
to create a new software environment andconda install -n <ENV> <PROGRAM>
to install a program on that environment. - Use
conda activate <ENV>
to “activate” the software environment and make all the programs installed there available.- When submitting jobs to
bsub
, always remember to includesource $CONDA_PREFIX/etc/profile.d/conda.sh
at the start of the shell script, followed by theconda activate
command.
- When submitting jobs to
- Always remember to include either
module load
orconda activate
in your submission script.