10 Grouped operations

Learning objectives

10.1 Context

So far, every operation we have performed has applied to the entire data set. Often, however, there is structure within the data, such as distinct groups (e.g. genotypes, patient cohorts or geographical areas). In that case we may want information on a group-by-group basis.

10.2 Split-apply-combine

This kind of operation is often referred to as split-apply-combine: we split the data into groups, apply a function to each group and then combine the results.

Let’s illustrate this with an example. Figure 10.1 shows a hypothetical data set, where we have temperature and rainfall measurements for different cities.

Figure 10.1: An example of a table with groups

Suppose we are interested in the average temperature for each city. We would have to do the following:

  1. Split the data by city
  2. Calculate the average temperature for each city
  3. Combine the outcome together in a new table

This is visualised in Figure 10.2.

Figure 10.2: Split-apply-combine
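
The three steps above can also be sketched in code. The example below is a minimal illustration using only the Python standard library; the city names and measurements are invented for this sketch and are not the values shown in Figure 10.1, and the variable names (`records`, `temps_by_city`, `mean_temp`) are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical measurements (invented for illustration):
# (city, temperature in degrees Celsius, rainfall in mm)
records = [
    ("Cambridge", 15.0, 2.0),
    ("Cambridge", 17.0, 0.0),
    ("Lisbon", 22.0, 0.0),
    ("Lisbon", 24.0, 1.0),
    ("Oslo", 9.0, 5.0),
    ("Oslo", 11.0, 3.0),
]

# 1. Split: collect the temperature values separately for each city
temps_by_city = defaultdict(list)
for city, temperature, _rainfall in records:
    temps_by_city[city].append(temperature)

# 2. Apply: calculate the mean temperature within each group
# 3. Combine: gather the results into one table with a single entry per city
mean_temp = {city: mean(temps) for city, temps in temps_by_city.items()}

for city, avg in mean_temp.items():
    print(f"{city}: {avg:.1f}")
```

Data analysis libraries usually wrap these steps in dedicated grouping functions, so in practice the split and combine steps are rarely written out by hand like this.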

10.3 Summary operations

10.3.1 Summarising data

LO: summarising data

10.3.2 Grouped summaries

LO: grouped summaries

10.4 Counting data

10.4.1 Counting

LO: counting

10.4.2 Counting by group

LO: counting data by group

10.4.3 Counting missing values

LO: counting with missing values

10.5 Grouped operations

10.5.1 Grouped filters

LO: grouped filters

10.5.2 Grouped changes

LO: grouped mutate

10.5.3 To ungroup or not ungroup

LO: the importance of ungrouping

10.6 Summary

Key points