2 Data & Setup
Data
The data used in these materials is provided as a zip file. Download and unzip the folder to your Desktop to follow along with the materials.
Setup
To run the analysis covered in this workshop, you will broadly need two things:
- R/RStudio for all the downstream analysis (i.e. after peak calling using the nf-core/chipseq workflow). These analyses can typically be run on your local computer and on any OS (macOS, Windows, Linux).
- A Linux environment to run the pre-processing steps and peak calling (i.e. running the nf-core/chipseq workflow). We highly recommend using a dedicated server (typically an HPC) for this step. Technically, you can also run this workflow on Windows via WSL2 (we provide instructions below), but we do not recommend it for production runs.
R and RStudio
- Go to the R installation folder and look at the instructions for your distribution.
- Download the RStudio installer for your distribution and install it using your package manager.
R Packages
Open RStudio and run the following:
# install BiocManager if not installed already
if (!require("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
# Install all packages used
BiocManager::install(c("GenomicRanges", "rtracklayer", "plyranges", "ChIPseeker", "profileplyr", "ggplot2", "DiffBind"))
Conda/Mamba
For the command-line tools covered in the course you will need a Linux machine (or WSL2, if you are on Windows - see Section 2.2.6).
If you are an experienced Linux user, you can install/compile each tool individually using your preferred method. Otherwise, we recommend doing it via the Mamba package manager. If you already use Conda/Mamba you can skip this step.
To make a fresh install of Mamba, you can run:
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
And follow the instructions on the terminal, accepting the defaults. Make sure to restart your terminal after the installation completes.
These instructions also work if you’re using an HPC server.
Nextflow
We recommend having a dedicated environment for Nextflow, which you can reuse for any pipelines you run in the future. Assuming you’ve already installed Conda/Mamba, open your terminal and run:
mamba create --name nextflow nextflow
Whenever you want to use Nextflow, you need to activate your environment first:
conda activate nextflow
ChIP-seq tools
For other command-line tools that we covered in the workshop, you can install them in their own conda environment:
mamba create --name chipseq
mamba install --name chipseq idr deeptools meme homer
When you want to use any of them, make sure to activate your environment first:
conda activate chipseq
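As a quick sanity check after activating the environment, you could confirm that the tools are visible on your PATH. The sketch below is only illustrative: `check_tools` is a hypothetical helper (not part of any of these packages), and the executable names `bamCoverage` (deepTools) and `findMotifsGenome.pl` (HOMER) are our assumption of one representative command per package.

```shell
# Hypothetical helper: report whether each named command is on the PATH.
# Returns non-zero if at least one command is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed executable names, one per package installed above.
check_tools idr bamCoverage meme findMotifsGenome.pl \
    || echo "Some tools were not found: is the chipseq environment activated?"
```

The same function can be used with `check_tools nextflow` after activating the nextflow environment.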
Windows WSL2
The Windows Subsystem for Linux (WSL2) runs a compiled version of Ubuntu natively on Windows. There are detailed instructions on how to install WSL on the Microsoft documentation page. Briefly:
- Press the Windows key and search for Windows PowerShell, open it and run the command: wsl --install
- Restart your computer.
- Press the Windows key and search for Ubuntu, which should open a new terminal.
- Follow the instructions to create a username and password (you can use the same username and password that you have on Windows, or a different one - it’s your choice).
- You should now have access to a Ubuntu Linux terminal. This (mostly) behaves like a regular Ubuntu terminal, and you can install apps using the sudo apt install command as usual.
Setup directories
After WSL is installed, it is useful to create shortcuts to your files on Windows. Your C:\ drive is located in /mnt/c/ (equally, other drives will be available based on their letter). For example, your desktop will be located in: /mnt/c/Users/<WINDOWS USERNAME>/Desktop/. It may be convenient to set shortcuts to commonly-used directories, which you can do using symbolic links, for example:
- Documents:
ln -s /mnt/c/Users/<WINDOWS USERNAME>/Documents/ ~/Documents
- If you use OneDrive to save your documents, use:
ln -s /mnt/c/Users/<WINDOWS USERNAME>/OneDrive/Documents/ ~/Documents
- Desktop:
ln -s /mnt/c/Users/<WINDOWS USERNAME>/Desktop/ ~/Desktop
- Downloads:
ln -s /mnt/c/Users/<WINDOWS USERNAME>/Downloads/ ~/Downloads
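If you prefer to create these links in one go, the commands above can be wrapped in a small loop. This is a minimal sketch under a few assumptions: `link_windows_dir` is a hypothetical helper name, `WIN_USER` is a placeholder variable you would set to your own Windows username, and the folder names are the standard (non-OneDrive) ones listed above. It skips any folder that does not exist and any link name that is already taken, so it should be safe to re-run.

```shell
# Hypothetical helper: link a Windows folder (target) to a path in the
# WSL home directory (link), skipping missing targets and existing links.
link_windows_dir() {
    target="$1"
    link="$2"
    if [ ! -d "$target" ]; then
        echo "skip: $target does not exist"
    elif [ -e "$link" ] || [ -L "$link" ]; then
        echo "skip: $link already exists"
    else
        ln -s "$target" "$link" && echo "linked: $link -> $target"
    fi
}

# Placeholder: replace with your own Windows username.
WIN_USER="${WIN_USER:-your-windows-username}"

for dir in Documents Desktop Downloads; do
    link_windows_dir "/mnt/c/Users/$WIN_USER/$dir" "$HOME/$dir"
done
```

You would run it as, for example, `WIN_USER=alice bash link_dirs.sh`, where the script name is again only an example.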
Docker for WSL
We’ve experienced issues in the past when running Nextflow pipelines from WSL2 with -profile singularity. As an alternative, you can instead use Docker, which is another software containerisation solution. To set this up, you can follow the instructions given on the Microsoft Documentation: Get started with Docker remote containers on WSL 2.
Once you have Docker installed and set up, you can then use -profile docker when running your Nextflow command.
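Before launching a pipeline with the Docker profile, it can help to confirm that the Docker daemon is actually reachable from your WSL2 terminal. The following is a rough sketch; `docker_status` is a hypothetical helper name, and it relies only on the standard `docker info` command, which fails when the daemon cannot be contacted.

```shell
# Hypothetical helper: report whether the docker client is installed
# and can talk to a running Docker daemon.
docker_status() {
    if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
        echo "docker: reachable"
    else
        echo "docker: not reachable"
    fi
}

docker_status
```

If this reports that Docker is not reachable, check that Docker Desktop is running and that WSL integration is enabled for your Ubuntu distribution.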
Singularity
Singularity is software for running a virtual operating system locally (known as a container), and it is popular for complex bioinformatic pipelines. Nextflow supports the use of Singularity for managing its software and we recommend using it on HPC servers. Singularity is typically installed by your HPC admins; if it is not, request that they do so.
However, if you want to run the analysis locally on your computer (again, we do not recommend doing so), then you can install Singularity following the instructions below.
You can use Singularity from the Windows Subsystem for Linux (see Section 2.2.6). Once you set up WSL, you can follow the instructions for Linux.
Singularity is not available for macOS.
These instructions are for Ubuntu or Debian-based distributions [1].
sudo apt update && sudo apt upgrade && sudo apt install runc
# The release tag and the package filename must use the same version (3.11.4 assumed here)
version=3.11.4
codename=$(lsb_release -cs)
wget -O singularity.deb "https://github.com/sylabs/singularity/releases/download/v${version}/singularity-ce_${version}-${codename}_amd64.deb"
sudo dpkg -i singularity.deb
rm singularity.deb
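The download link in these commands is assembled from a Singularity version and your distribution codename. If you want to preview which package would be fetched on your machine before running them, something like the sketch below may help. `singularity_deb_url` is a hypothetical helper, the version number is only an example, and the URL pattern is assumed to follow the one used above; check the releases page to confirm that a package exists for your codename.

```shell
# Hypothetical helper: build the .deb download URL for a given
# Singularity CE version and Ubuntu/Debian codename.
singularity_deb_url() {
    version="$1"
    codename="$2"
    echo "https://github.com/sylabs/singularity/releases/download/v${version}/singularity-ce_${version}-${codename}_amd64.deb"
}

# Detect the codename of this system (falls back to "unknown" if
# lsb_release is not installed), then print the URL without downloading.
codename=$(lsb_release -cs 2>/dev/null || echo "unknown")
singularity_deb_url 3.11.4 "$codename"
```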
[1] See the Singularity documentation page for other distributions.