23  Pilot studies

Published

May 12, 2023

If you’re planning an experiment but still have some important open questions about the protocol, desired sample size or planned analysis, then you may wish to run a pilot study before conducting the experiment. Its exact nature can differ depending on what you’re trying to get out of it, but in most cases the pilot study will be a small-scale, preliminary study that is conducted ahead of the main experiment, in order to inform the final experimental design in some way. Essentially, it serves as a “trial run” of the final experiment.

23.1 Purpose(s) of a pilot study

There are different reasons to run a pilot study, and they might depend on what field of research you’re in. These reasons can include (but are not necessarily limited to!):

Testing and refining your protocol or methodology

This may include things like calibrating equipment or testing out a new behavioural task or set of stimuli. It’s also likely to involve spotting errors or problems that you would otherwise have needed to deal with during the main experiment - perhaps your instructions to participants or fellow researchers are unclear and need refining, or perhaps the lab stocks of the media you need are too low.

Note that this does not mean that you should go into your pilot study without a protocol or plan. You should have already worked out as many of the details as possible, such as how you plan to operationalise your variables, whether there are any known factors to control, and an idea of the type of analysis you might run. The more detailed your plan is before you pilot, the easier it will be to make refinements!

Training researchers

This ties in closely with the point above. The experiment may be the first time that you or other researchers have used certain protocols or equipment, and research involves lots of skills that take practice - whether that involves interacting with patients, learning how to safely use potentially dangerous equipment, or complicated/precise bench work.

Demonstrating feasibility & exploring possible results

Sometimes, a pilot can serve as a low-risk way to check whether something is likely to work at all. For instance: is it possible to get high enough resolution images of this phenomenon? Is the rare patient population willing to take part in this sort of study? Will this plant species grow at all in our greenhouses? Are we likely to have a manageable exclusion or attrition rate?

If you’re lucky, your pilot study may also give you some idea as to the likely results that a larger study will give you; more on this in the point below. But it’s worth emphasising at this stage - this does not mean you should use a pilot study to perform hypothesis testing of your overall research question! (For instance, if you get a significant result in your pilot sample, this doesn’t mean that you don’t still need to perform the main experiment.)

Calculating an effect size for power analysis

Given a pilot dataset, it’s possible to calculate an effect size (for example, from the difference between group means for a t-test, or from the R² value for a linear model) for use in a power analysis - even if/when the pilot sample doesn’t yield any significant results.
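As a rough illustration of the two calculations mentioned above, the sketch below computes Cohen's d (the difference between group means divided by the pooled standard deviation) and Cohen's f² (derived from R²). The measurements and the R² value are made up for demonstration, and the function names are hypothetical rather than taken from any particular package.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardised mean difference, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd


def cohens_f2(r_squared):
    """Cohen's f-squared for a linear model, from its R-squared value."""
    return r_squared / (1 - r_squared)


# Hypothetical pilot measurements (invented for illustration only)
treatment = [5.1, 6.3, 5.8, 6.9, 6.0, 5.5]
control = [4.8, 5.2, 5.9, 4.6, 5.4, 5.0]

d = cohens_d(treatment, control)
f2 = cohens_f2(0.20)  # hypothetical R-squared from a pilot linear model
print(f"Cohen's d = {d:.2f}, Cohen's f2 = {f2:.2f}")
```

Either value could then be passed to a power analysis, bearing in mind that an estimate from a sample this small will be imprecise.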

It’s worth noting, however, that doing this can be a bit contentious. Some people will tell you that estimating an effect size from a pilot study is a bad idea, because small pilot samples are very noisy, and the confidence interval for an effect size calculated from a small sample is therefore very wide. Such people will tell you that the better approach is to determine the smallest scientifically meaningful effect size that you’re interested in detecting, and use that for your power analyses instead. This raises the question, of course, of what “scientifically meaningful” means; this is often easier to determine in clinical and drug studies than it is in more basic scientific research. But you can use standard “small”, “medium” and “large” effect sizes to guide you on this, or alternatively, you can look at existing full-scale studies (e.g., existing publications, unpublished data from your group or collaborators) to give you an idea of what sort of effect sizes might be expected for the type of research you plan to conduct. (These course materials cover effect size in a little more detail, if you’re curious.)
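To give a feel for how the choice of effect size drives the required sample size, here is a minimal sketch for a two-group comparison. It assumes the conventional Cohen benchmarks for d (0.2, 0.5 and 0.8), a two-sided significance level of 0.05 and a target power of 0.80. It uses a normal approximation rather than an exact t-test calculation, so dedicated power analysis software will usually return slightly larger numbers. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample comparison.

    Normal approximation: n = 2 * (z_(1 - alpha/2) + z_(power))^2 / d^2
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size_d ** 2)


# Conventional benchmark values for Cohen's d
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    print(f"{label} (d = {d}): about {n_per_group(d)} per group")
```

Because the required sample size scales with 1/d², halving the effect size you want to detect roughly quadruples the sample you need, which is why the choice of target effect size matters so much.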

As a compromise between these approaches, you might use the estimated effect size from a pilot study to supplement an estimate from a previous full-scale study, to determine whether your planned experiment is likely to yield similar effect sizes to comparable research.

23.2 Pilot sample

Deciding how large your pilot sample should be is often a difficult exercise - you’re trying to balance gathering enough information against effectively launching into a full experiment.

If you plug the question “how big should my pilot sample be?” into Google, you may see a couple of common rules-of-thumb: “10-20% of the final desired sample size”, “12 observations in each group”, and so on. These aren’t bad starting points, but they do assume that you’re doing a certain type of research (often, these are written for clinical researchers). There are situations where the size of your pilot sample might be guided by other factors.

If you’re hoping to use your pilot to get the best possible effect size estimate, then you will likely want to push your pilot to be as large as possible, to get the narrowest confidence interval for the effect size that you possibly can.
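The link between pilot size and the width of the confidence interval can be sketched as follows. This uses a common large-sample approximation for the standard error of Cohen's d with equal group sizes, so the intervals are only indicative for very small samples. The observed effect size of 0.5 is an arbitrary assumption, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def d_confidence_interval(d, n_per_group, level=0.95):
    """Approximate confidence interval for Cohen's d with equal group sizes.

    Uses the large-sample approximation SE(d) = sqrt(2/n + d^2 / (4n)).
    """
    n = n_per_group
    standard_error = sqrt(2 / n + d ** 2 / (4 * n))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return d - z * standard_error, d + z * standard_error


observed_d = 0.5  # assumed pilot estimate, for illustration only
for n in (6, 12, 24, 48):
    lower, upper = d_confidence_interval(observed_d, n)
    print(f"n = {n:2d} per group: 95% CI roughly ({lower:.2f}, {upper:.2f})")
```

The interval narrows only in proportion to the square root of the sample size, so a pilot needs to be considerably larger before the estimate becomes noticeably more precise.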

If, conversely, you’re using the pilot simply to test feasibility (e.g., can we image this phenomenon? Will this reaction occur under these conditions?) then you may go for the minimum sample needed to confirm feasibility, and stop once you’ve achieved that.

If your intention is to make iterative changes and improvements to your protocol, then you should consider very carefully whether you truly have a single pilot study, or whether it would be better described as a series of pilots. In any case, you should ensure that the final version of the experiment that you pilot has a sufficient sample size in itself for you to be confident that the changes and revisions you’ve made have had the effect you wanted.

An important note about keeping your pilot sample separate

It’s quite common to see researchers including their pilot sample in the main dataset, once they conduct the larger experiment. This is an easy temptation to fall for, but it’s often best to avoid doing so.

Firstly, your pilot study will almost certainly be imperfect in some way. If you have been refining or updating your protocol or training researchers as part of your pilot, then the data quality may be lower in the pilot dataset. Worse still, differences between the pilot and main protocol might introduce a confounding variable, if those differences have a meaningful impact on your response variable.

From a purely statistical point of view, there is also the possibility of increasing error by folding your pilot sample into your main sample, particularly if you hadn’t planned to do so in advance. Analysing your pilot sample, and then re-analysing these data as part of the experimental sample, also constitutes making multiple comparisons, which can increase your chance of a type I error (false positive).
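The multiple comparisons point can be illustrated with a small simulation, offered here as a sketch under simplifying assumptions rather than a definitive analysis. Both groups are drawn from the same distribution, so there is no true effect. The pilot is tested on its own, and then the pilot data are pooled with the main sample and tested again. A simple z-test with a known standard deviation of 1 is assumed, and the sample sizes, seed and function name are all arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist


def two_look_false_positive_rate(n_pilot=10, n_main=50, n_sims=5000,
                                 alpha=0.05, seed=1):
    """Share of null simulations where the pilot OR the pooled test is significant."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    def significant(a, b):
        # z-test for a difference in means, treating the SD as known (= 1)
        difference = sum(a) / len(a) - sum(b) / len(b)
        z = difference / sqrt(1 / len(a) + 1 / len(b))
        return abs(z) > z_crit

    false_positives = 0
    for _ in range(n_sims):
        # Both groups come from the same distribution: any "effect" is spurious
        a = [rng.gauss(0, 1) for _ in range(n_pilot)]
        b = [rng.gauss(0, 1) for _ in range(n_pilot)]
        if significant(a, b):  # first look: pilot sample alone
            false_positives += 1
            continue
        # Second look: pilot data folded into the main sample
        a += [rng.gauss(0, 1) for _ in range(n_main)]
        b += [rng.gauss(0, 1) for _ in range(n_main)]
        if significant(a, b):
            false_positives += 1
    return false_positives / n_sims


rate = two_look_false_positive_rate()
print(f"False positive rate across both looks: {rate:.3f} (nominal level: 0.05)")
```

Because the same pilot observations contribute to both tests, the overall chance of at least one false positive ends up above the nominal 5% level, even though each individual test uses that threshold.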

Of course, as with a lot of what these course materials talk about, there are scenarios where these concerns might be outweighed. For instance, if you are working with a very rare population, it might not be feasible to discard any of the data you’ve collected; maintaining as large a sample as possible might be your priority. In these cases, you could choose to make adjustments in your analysis (e.g., including covariates) to account for any differences in protocol or equipment.

23.3 Summary

Key points
  • A pilot study can be considered a “trial run” for your experiment
  • The pilot can be used for testing/refining your protocol, training researchers, demonstrating feasibility and sometimes for estimating effect sizes
  • The required sample size for a pilot will depend on the purpose and field of research