1
About the course
1.1
Overview
1.2
Registration
1.3
Prerequisites
1.4
GitHub
1.5
License
1.6
Contact
1.7
Colophon
2
Introduction
2.1
What is machine learning?
2.2
Aspects of ML
2.3
What actually happened under the hood
3
Dimensionality reduction
3.1
Linear Dimensionality Reduction
3.1.1
Interpreting the Principal Component Axes
3.1.2
Horseshoe effect
3.1.3
PCA of mammalian development
3.2
Exercise 2.3.
3.3
Nonlinear Dimensionality Reduction
3.3.1
Stochasticity
3.3.2
Analysis of mammalian development
3.4
Other dimensionality reduction techniques
4
Clustering
4.1
Introduction
4.2
Distance metrics
4.3
Hierarchic agglomerative
4.3.1
Linkage algorithms
4.4
K-means
4.4.1
Algorithm
4.4.2
Choosing initial cluster centres
4.4.3
Choosing k
4.5
DBSCAN
4.5.1
Algorithm
4.5.2
Implementation in R
4.5.3
Choosing parameters
4.6
Example: clustering synthetic data sets
4.6.1
Hierarchic agglomerative
4.6.2
K-means
4.6.3
DBSCAN
4.7
Evaluating cluster quality
4.7.1
Silhouette method
4.7.2
Example - k-means clustering of blobs data set
4.7.3
Example - DBSCAN clustering of noisy moons
4.8
Example: gene expression profiling of human tissues
4.8.1
Hierarchic agglomerative
4.8.2
K-means
4.8.3
DBSCAN
4.9
Exercises
4.9.1
Exercise 1
5
Nearest neighbours
5.1
Introduction
5.2
Classification: simulated data
5.2.1
knn function
5.2.2
Plotting decision boundaries
5.2.3
Bias-variance tradeoff
5.2.4
Choosing k
5.3
Example on the Iris dataset
5.4
Classification: cell segmentation
5.4.1
Cell segmentation data set
5.4.2
Data splitting
5.4.3
Identification of data quality issues
5.4.4
Fit model
5.5
Regression
5.5.1
Partition data
5.5.2
Data pre-processing
5.5.3
Search for optimum k
5.5.4
Use model to make predictions
5.6
Exercises
5.6.1
Exercise 1
6
Support vector machines
6.1
Introduction
6.1.1
Maximum margin classifier
6.2
Support vector classifier
6.3
Support Vector Machine
6.4
Example - training a classifier
6.4.1
Setup environment
6.4.2
Partition data
6.4.3
Visualize training data
6.4.4
Define a custom model
6.4.5
Model cross-validation and tuning
6.4.6
Prediction performance measures
6.4.7
Plot decision boundary
6.5
Example - regression
6.6
Further reading
6.7
Exercises
6.7.1
Exercise 1
7
Decision trees and random forests
7.1
Decision Trees
7.2
Random Forest
7.3
Exercises
8
Use case 1
8.1
Introduction
8.2
Problem: automated detection of malaria
8.3
Challenges
8.4
Getting started
8.4.1
Load data
8.4.2
Model comparison
8.5
Solutions
9
Linear models and matrix algebra
9.1
Linear models
9.2
Matrix algebra
10
Linear regression and logistic regression
10.1
Regression
10.1.1
Linear regression
10.1.2
Polynomial regression
10.1.3
Distributions of fits
10.2
Classification
10.2.1
Logistic regression
10.3
Resources
11
Artificial neural networks
11.1
Neural Networks
12
Use case 2
13
Deep Learning
13.1
Multilayer Neural Networks
13.1.1
Reading in images
13.1.2
Constructing layers in kerasR
13.1.3
Rick and Morty classifier using Deep Learning
13.2
Convolutional neural networks
13.2.1
Checking the models
13.2.2
Asking more precise questions
13.2.3
More complex networks
13.3
Further reading
Appendix
A
Resources
A.1
Python
A.2
Machine learning data set repositories
A.2.1
MLDATA
A.2.2
UCI Machine Learning Repository
B
Solutions ch. 2 - Dimensionality reduction
B.1
Exercise 2.5.
B.2
Exercise 2.6.
C
Solutions ch. 4 - Clustering
C.1
Exercise 1
D
Solutions ch. 7 - Nearest neighbours
D.1
Exercise 1
E
Solutions ch. 6 - Support vector machines
E.1
Exercise 1
F
Solutions ch. 9 - Decision trees and random forests
F.1
Exercise 1
G
Solutions chapter 8 - use case 1
G.1
Preparation
G.1.1
Load required libraries
G.1.2
Define SVM model
G.1.3
Setup parallel processing
G.1.4
Load data
G.2
Assess data quality
G.2.1
Zero and near-zero variance predictors
G.2.2
Are all predictors on the same scale?
G.2.3
Redundancy from correlated variables
G.2.4
Skewness
G.3
Infection status (two-class problem)
G.3.1
Model training and parameter tuning
G.3.2
KNN
G.3.3
SVM
G.3.4
Decision tree
G.3.5
Random forest
G.3.6
Compare models
G.3.7
Predict test set using our best model
G.3.8
ROC curve
G.4
Discrimination of infective stages (multi-class problem)
G.4.1
Define cross-validation procedure
G.4.2
KNN
G.4.3
SVM
G.4.4
Decision tree
G.4.5
Random forest
G.4.6
Compare models
G.4.7
Predict test set using our best model
H
Solutions ch. 3 - Linear models and matrix algebra
H.1
Example 2
H.2
Example 2
I
Solutions ch. 4 - Linear and non-linear (logistic) regression
J
Solutions ch. 10 - Artificial neural networks
J.1
Exercise 1
K
Solutions for use case 2
An Introduction to Machine Learning
12
Use case 2