Analysis of ChIP-seq Data

Author

Hugo Tavares

Published

June 17, 2024

Overview

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is a method used to identify, across the genome, the binding sites of transcription factors and other DNA-associated proteins, as well as the locations of histone modifications. These materials cover the fundamentals of ChIP-seq data analysis, from raw data processing to downstream applications.
We will start with an introduction to ChIP-seq methods, including important considerations when designing your experiments. We will then go through the bioinformatic steps in a standard ChIP-seq analysis workflow: raw data quality control, trimming/filtering, mapping, duplicate removal, post-mapping quality control, peak calling and peak annotation. We will discuss metrics used to assess the quality of the called peaks when multiple replicates are available, as well as the analysis of differential binding across sample groups. Finally, we will cover tools and packages that can be used for visualising and exploring your results.

Learning Objectives

Describe how ChIP-seq data is generated and what information it provides about the (epi)genome
Recall the experimental design considerations that are needed when performing ChIP-seq experiments
Understand the bioinformatic steps involved in processing ChIP-seq data
Interpret and assess the quality of your data and results
Perform differential binding analysis to compare different groups of samples

Target Audience

This course is aimed at researchers with no prior experience in the analysis of ChIP-seq data who would like to get started with processing their data using a standardised pipeline, and with downstream analysis and visualisation of their results.

Prerequisites

Basic understanding of high-throughput sequencing technologies.
- Watch this iBiology video for an excellent overview.
A working knowledge of the UNIX command line (course registration page).
- If you are not able to attend this prerequisite course, please work through our Unix command line materials ahead of the course (up to section 7).
A working knowledge of R (course registration page).
- If you are not able to attend this prerequisite course, please work through our R materials ahead of the course.

Authors

About the authors (alphabetical by surname):

Sandra Cortijo
Affiliation: Centre National de la Recherche Scientifique: Montpellier
Roles: writing; conceptualisation; coding
Sergio Martinez Cuesta
Affiliation: AstraZeneca, Cambridge
Roles: writing; conceptualisation; coding
Sankari Nagarajan
Affiliation: University of Manchester
Roles: writing; conceptualisation
Ashley Sawle
Affiliation: Cancer Research UK, Cambridge Institute
Roles: writing; conceptualisation; coding
Denis Seyres
Affiliation: Universitätsspital Basel: Basel
Roles: writing; conceptualisation; coding
Hugo Tavares
Affiliation: Bioinformatics Training Facility, University of Cambridge
Roles: writing; conceptualisation; coding

Citation

Please cite these materials if:

You adapted or used any of them in your own teaching.
These materials were useful for your research work. For example, you can cite us in the methods section of your paper: “We carried out our analyses based on the recommendations in Cortijo S et al. (2023).”

You can cite these materials as:

Cortijo S, Martinez Cuesta S, Nagarajan S, Sawle A, Seyres D, Tavares H (2023) “cambiotraining/chipseq: Analysis of ChIP-seq Data”, https://cambiotraining.github.io/chipseq/

Or in BibTeX format:

@Misc{,
  author = {Cortijo, Sandra AND Martinez Cuesta, Sergio AND Nagarajan, Sankari AND Sawle, Ashley AND Seyres, Denis AND Tavares, Hugo},
  title = {cambiotraining/chipseq: Analysis of ChIP-seq Data},
  month = {July},
  year = {2023},
  url = {https://cambiotraining.github.io/chipseq/}
}

Acknowledgements

There are many online resources that inspired our own materials (e.g. package vignettes) and we cite them where relevant.

We also recommend the following training materials:

Understanding chromatin biology using high throughput sequencing from the Harvard Chan Bioinformatics Core
Introduction to ChIPseq using HPC from the Harvard Chan Bioinformatics Core
ChIP-seq analysis from the Babraham Institute