3  Background

All traditional statistical tests make use of various named distributions or require certain assumptions about the parent distribution in order to work properly (for example, non-parametric tests like the Wilcoxon test assume the distribution is symmetric). For parametric tests like the t-test or ANOVA, the assumed distribution is the normal distribution. If these assumptions are met then traditional statistical tests are fine, but what can we do when we can’t assume normality, or if the distribution of the data is just weird?

Resampling techniques are the tools that work here. They allow us to test hypotheses about our data using only the data itself, without appealing to any assumptions about the shape or form of the parent distribution. They are in some ways a much simpler approach to statistics, but because they rely on the ability to generate tens of thousands of random numbers very quickly, they simply weren’t considered practical back in the day. Even now, they aren’t widely used because they require the user (you, in case you’d forgotten what’s going on at this time of day) to do more than click a button in a stats package or know the name of the test. These techniques require a mix of statistical knowledge and programming: a combination of skills that isn’t all that common! There are three broad areas of resampling methods (although they are all quite closely related):

  1. Permutation methods
  2. Bootstrapping
  3. Cross-validation

3.1 Permutation methods

Permutation methods resample the original data, assuming the null hypothesis. They allow us to perform an exact statistical hypothesis test. For example, if we have data comparing the response to a drug between a control group and a treatment group, our null hypothesis would be that there is no difference between control and treatment.

What that means is that, if the null hypothesis is true, we should be able to randomly reshuffle the control and treatment labels across the measurements without it mattering. If there really isn’t a difference between the two groups, then the group differences we calculate from these reshuffled data sets should be similar to the observed difference in response.

We can do this many, many times. Let’s say we do this 1,000 times. We can then calculate how often the permuted differences are at least as large as the observed difference, which gives us a p-value: the probability of observing a result at least as extreme as our data, given the null hypothesis!
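The procedure above can be sketched in a few lines of Python. The control and treatment measurements here are made-up numbers purely for illustration; only the reshuffling logic matters.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical measurements for a control and a treatment group
control = np.array([4.2, 5.1, 3.8, 4.9, 5.4, 4.4, 4.7, 5.0])
treatment = np.array([5.9, 6.3, 5.2, 6.8, 5.7, 6.1, 5.5, 6.4])

observed = treatment.mean() - control.mean()

pooled = np.concatenate([control, treatment])
n_control = len(control)
n_perm = 1000

count = 0
for _ in range(n_perm):
    # Reshuffle the pooled data: equivalent to randomly reassigning labels
    shuffled = rng.permutation(pooled)
    perm_diff = shuffled[n_control:].mean() - shuffled[:n_control].mean()
    if abs(perm_diff) >= abs(observed):  # as or more extreme (two-sided)
        count += 1

p_value = count / n_perm
print(f"observed difference: {observed:.2f}, p = {p_value:.3f}")
```

A small p-value tells us that a difference as large as the one we observed almost never arises by reshuffling alone, so the null hypothesis looks implausible.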

3.2 Bootstrapping

Bootstrapping is a technique for estimating confidence intervals for parameter estimates. We effectively treat our data as if it were the parent distribution, draw samples from it with replacement, and calculate the statistic of choice (usually the mean) for each of these resamples. If we repeat this process many times, we can construct a distribution for our sample statistic, which can then be used to give us a confidence interval for that statistic.
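A minimal sketch of this idea, using a made-up sample and the simple percentile method for the confidence interval:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sample; in practice this would be your observed data
data = np.array([12.1, 9.8, 11.4, 10.2, 13.0, 9.5, 10.8, 11.9, 10.4, 12.5])

n_boot = 10_000
boot_means = np.empty(n_boot)
for i in range(n_boot):
    # Resample with replacement, treating the data as the parent distribution
    resample = rng.choice(data, size=len(data), replace=True)
    boot_means[i] = resample.mean()

# 95% percentile confidence interval for the mean
lower, upper = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {data.mean():.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Note that the same recipe works for any statistic, not just the mean: swap `resample.mean()` for the median, a standard deviation, a correlation coefficient, and so on.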

3.3 Cross-validation

Cross-validation is at the heart of modern machine learning approaches, but existed long before this technique became sexy/fashionable. You divide your data up into two sets: a training set that you use to fit your model and a testing set that you use to evaluate your model. This allows your model accuracy to be empirically measured. There are several variants of this technique (holdout, k-fold cross-validation, leave-one-out cross-validation (LOOCV), leave-p-out cross-validation (LpOCV), etc.), all of which do essentially the same thing; the main difference between them is a trade-off between how long the method takes to run and how reliable it is.
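As a taste of what is coming in a later practical, here is a sketch of the k-fold variant: the data are split into k folds, and each fold takes a turn as the testing set while the remaining folds form the training set. The data are simulated (a noisy straight line) and the "model" is just a least-squares line fit, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a noisy linear relationship y = 2x + 1 + noise
x = np.linspace(0, 10, 40)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

k = 5
indices = rng.permutation(x.size)   # shuffle before splitting into folds
folds = np.array_split(indices, k)

errors = []
for i in range(k):
    test_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    # Fit a straight line on the training folds only
    slope, intercept = np.polyfit(x[train_idx], y[train_idx], 1)
    pred = slope * x[test_idx] + intercept
    # Score the fit on the held-out testing fold
    errors.append(np.mean((y[test_idx] - pred) ** 2))

print(f"5-fold cross-validated MSE: {np.mean(errors):.2f}")
```

Setting k equal to the number of data points gives LOOCV: maximally thorough, but the model has to be refitted once per data point.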

We’ll start with permutation techniques and cover bootstrapping and cross-validation in subsequent practicals.