10 Grouped operations
10.1 Context
So far, every operation we have performed has been applied to the entire data set. Often, however, the data contain structure, such as distinct groups (e.g. genotypes, patient cohorts or geographical areas). In those cases we may want information on a group-by-group basis.
10.2 Split-apply-combine
This kind of operation is often referred to as split-apply-combine: we split the data into groups, apply a function to each group and then combine the results.
Let’s illustrate this with an example. Figure 10.1 shows a hypothetical data set, where we have temperature and rainfall measurements for different cities.

Let’s assume we were interested in the average temperature for each city. We would have to do the following:
- Split the data by city
- Calculate the average temperature
- Combine the outcome together in a new table
This is visualised in Figure 10.2.
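The three steps can also be sketched in code. The example below is a minimal illustration in plain Python, not the approach this chapter necessarily uses later on. The city names and measurement values are made up for the example and are not taken from Figure 10.1, and the names `records`, `groups` and `summary` are likewise hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical measurements (made-up values, for illustration only)
records = [
    {"city": "Cambridge", "temperature": 10.0, "rainfall": 2.0},
    {"city": "London", "temperature": 15.0, "rainfall": 1.0},
    {"city": "Cambridge", "temperature": 14.0, "rainfall": 4.0},
    {"city": "Edinburgh", "temperature": 8.0, "rainfall": 6.0},
    {"city": "London", "temperature": 17.0, "rainfall": 3.0},
    {"city": "Edinburgh", "temperature": 9.0, "rainfall": 5.0},
]

# Split: collect the temperature values separately for each city
groups = defaultdict(list)
for row in records:
    groups[row["city"]].append(row["temperature"])

# Apply and combine: compute the mean per city and gather
# the results in a new table with one row per city
summary = [
    {"city": city, "mean_temperature": mean(temperatures)}
    for city, temperatures in groups.items()
]

for row in summary:
    print(row)
```

The resulting table has one row per group rather than one row per observation. Data-frame libraries typically wrap these same three steps into a single grouped summary operation.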

10.3 Summary operations
10.3.1 Summarising data
LO: summarising data
10.3.2 Grouped summaries
LO: grouped summaries
10.4 Counting data
10.4.1 Counting
LO: counting
10.4.2 Counting by group
LO: counting data by group
10.4.3 Counting missing values
LO: counting with missing values
10.5 Grouped operations
10.5.1 Grouped filters
LO: grouped filters
10.5.2 Grouped changes
LO: grouped mutate
10.5.3 To ungroup or not ungroup
LO: the importance of ungrouping