7  Statistical Power

Published

May 12, 2023

This section of the course revisits the concept of statistical power, and talks more about the practicalities of pilot experiments.

7.1 What is statistical power and power analysis?

Power is defined as the probability of detecting an effect, given that the effect does truly exist in the underlying population of interest. As a general rule, we prefer to see a power of at least 0.8 or 80% in our experiments (and even higher than that is, of course, better).

Power is mathematically linked with 1) significance/alpha threshold, 2) effect size and 3) sample size. In fact, given three of these, it’s possible to calculate the fourth; this is the underlying principle behind a power analysis. Using the pwr package in R, it’s possible to perform both an a priori power analysis (calculating required sample size, for a given effect size and significance level) and a posteriori power analysis (calculating the overall power of a dataset you’ve already collected).

I won’t go into any more detail about the nitty-gritty maths behind errors and power here, since there is already an excellent set of course materials that covers it very comprehensively, linked below in the drop-down box:

To learn more about performing power analysis for yourself, and a more detailed explanation of effect size, you can view course materials here.

7.2 Why do we care about power?

It is a truth universally acknowledged, that a researcher who’s studying a real, scientific effect must be in want of enough statistical power to detect it.

On the face of it, it seems like there’s a very simple solution to make sure that you have enough power: why not just have a massive sample every time? Then, we’re probably going to detect even the smallest effect sizes, and we don’t ever need to worry.

Well, it’s not quite so simple as that.

If you’ve done experiments of your own, you’re probably well aware that they’re expensive and time consuming, and it can be unrealistic to constantly be scaling experiments up (especially without any certainty that the effect of interest is real). In some contexts, such as in animal research, an overly large sample can also be a negative and wasteful thing; or perhaps it simply isn’t feasible, such as if you’re studying a rare disease or condition. A power analysis can help you balance the trade-off between statistical power and feasibility in your research. From the persepective of a funder, or supervisor, it also demonstrates that you’ve thought ahead and planned the experiment carefully, including the resources you’ll need for it.

7.3 Other factors to consider

Aside from a small effect size, there are other reasons why your sample size may need to be increased, depending on your planned analysis.

You’re expecting a high attrition or exclusion rate

This is likely to be outside of your control as a researcher, but is important to consider and account for when planning experiments. Attrition and exclusion are both ways in which your sample can be decreased, once you’ve started collecting data.

Exclusion refers to the deliberate choice of an experimenter to remove observations from the sample, perhaps because the participant fails to meet certain criteria, or the quality of the data is not high enough. For instance, in neuroimaging studies, participants’ data may be removed from the study if they moved too much during the functional scan and caused motion artefacts. (Crucially, these exclusions shouldn’t be related to the outcome variable - i.e., there should be no systematic bias in which observations are excluded from the sample, otherwise your dataset is no longer representative, and then we have an entirely new problem besides reduced power!)

The term attrition is used to refer to all other reductions in sample size that happen throughout a study, which generally aren’t due to the experimenter. This can include scenarios such as human participants choosing to drop out of a study, a number of plants in the greenhouse dying, or cell cultures failing. Attrition is a particular problem in clinical studies as they recruit from small populations of sometimes vulnerable or unwell participants; further, many clinical studies also have matched samples, which means that one participant dropping out means their match in other experimental group(s) must also be excluded, compounding the difficulty. As a general role, an attrition rate of >20% is considered problematic in a clinical setting.

The best way to deal with attrition or exclusion in a sample is to prepare for it in advance, by building in a “buffer” to your sample size. It’s likely that you’ll already know in advance that the type of experiment you’re designing is likely to have a high attrition or exclusion rate; hopefully, you will also have some indication of roughly what that rate might be, based on similar studies conducted by you or colleagues. You should increase your desired final sample size by this amount, so that if the sample is reduced during data collection or analysis, there will still be sufficient power.

You’re planning to separately analyse subsets of the dataset, or make multiple comparisons

It’s absolutely acceptable to plan to do either of these two things in your analysis. The trick is to plan for them.

If you’re intending to analyse a smaller subset of the data, this means that there needs to be sufficient power within that subset in order to detect the effect(s) of interest. It’s not enough for your overall sample size to be sufficient.

Also - analysing multiple subsets of the data will almost always constitute making multiple comparisons; it may be one of the most common ways, in fact, that multiple comparisons are introduced to an analysis pipeline. When making multiple comparisons, it’s typically recommended that you should adjust your significance threshold (or your p-values, which is mathematically equivalent) to reduce the chance of making a type I error. Increasing your significance threshold necessarily will reduce power, however, because of MathsTM. So, when making lots of multiple comparisons, you need a larger sample to boost power.

You know (or suspect) that there will be lots of parameters in your model

This includes, broadly, two sets of scenarios: 1) you have lots of predictors of interest, and/or 2) you have lots of uncontrolled variables that can affect your outcome unpredictably, that you plan to include as covariates of no interest. The overall impact on power is the same in both cases. Regardless of whether a variable is a predictor of interest, its inclusion in the model will still decrease the degrees of freedom, as more parameters need to be estimated, and increase the model’s complexity.

With any luck, you’ll be able to perform model comparison and simplify your model somewhat after building it (as a reminder: we are always looking for the simplest model that does a good job of explaining the variance in the dataset, in the goodness-of-fit vs complexity trade-off). But it’s best to make sure that you account for all possible parameters when performing a priori power analyses.

7.4 Summary

Key Points
  • Error, significance, effect size and power are all related to one another, such that you can calculate desired sample size a priori
  • Multiple factors can decrease power, including making corrections for multiple comparisons and including a large number of parameters in your model
  • It’s also important to be aware of decreases in sample size from attrition and exclusion, which will reduce power