gapminder <- read_csv("data/gapminder_clean.csv")
13 Making data stand out
- Learn how to make some data stand out from the crowd
13.1 Libraries and functions
13.1.1 Libraries
13.1.2 Functions
13.2 Purpose and aim
Often we have to deal with large amounts of data. Even more often, we want to draw the reader’s attention to a particular property of the data we’re displaying. In these cases it can be useful to make those data stand out visually.
13.3 Loading data
If you haven’t done so yet, please load the data as follows:
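A minimal sketch of the loading step, assuming the readr package (part of the tidyverse) is installed and that the file sits at the path below, relative to your working directory. The `gapminder_path` variable and the existence check are additions for illustration only:

```r
library(readr)

# Assumed location of the cleaned data, relative to the working directory
gapminder_path <- "data/gapminder_clean.csv"

# Only attempt to read the file if it exists at the assumed location
if (file.exists(gapminder_path)) {
  gapminder <- read_csv(gapminder_path)
} else {
  message("Could not find ", gapminder_path, "; check your working directory.")
}
```

If the message appears, `getwd()` shows where R is currently looking.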
13.4 The popout
To be honest, I’m not entirely sure if this is a word, but it’s a technique that is often used in visualisation. Here we give our data of interest a colour that contrasts strongly with the rest of the data, to make them “pop out”.
Have a look at the following plot, where we are plotting child mortality against the number of children per woman:
ggplot(data = gapminder,
       aes(x = children_per_woman,
           y = child_mortality)) +
  geom_point()
There appears to be a strong positive correlation between the number of children per woman and child mortality: child mortality tends to be higher in countries where women have more children.
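The visual impression can be checked with a correlation coefficient. This is a minimal sketch, assuming the column names used in the plot above. The `toy` data frame holds made-up values purely so that the snippet runs on its own; with the real data you would use the columns of `gapminder` instead.

```r
# Made-up illustrative values, NOT taken from the gapminder data
toy <- data.frame(
  children_per_woman = c(1.5, 2.1, 3.4, 4.8, 5.9, 6.7),
  child_mortality    = c(6, 15, 48, 95, 130, NA)
)

# Pearson correlation; use = "complete.obs" drops rows with missing values
r <- cor(toy$children_per_woman,
         toy$child_mortality,
         use = "complete.obs")
r
```

A value close to 1 indicates a strong positive linear relationship. It says nothing about what causes that relationship.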
In our data set we have a variable called is_oecd, which contains TRUE/FALSE values indicating whether a country is a member of the Organisation for Economic Co-operation and Development (OECD). This organisation stimulates economic progress and world trade and consists primarily of richer, Western countries.
Let’s say we wanted to explore whether this possible correlation has something to do with the economic status of a country. If so, child mortality rates in OECD countries might not be linked to the number of children per woman, whereas they are in non-OECD countries. Perhaps differences in the quality of healthcare lead to better survival rates for children in richer countries, even if a woman has many children.
We can visualise this.
ggplot(data = gapminder,
       aes(x = children_per_woman,
           y = child_mortality,
           colour = is_oecd)) +
  geom_point()
It looks like child mortality rates in non-OECD countries are much more variable than in OECD countries. The OECD data show too little spread to draw conclusions about those countries.
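The difference in spread can also be expressed as numbers. This is a minimal sketch, assuming the dplyr package is installed. The `toy` data frame contains made-up values so that the example is self-contained, and the summary column names are chosen here for illustration. With the real data you would start the pipe from `gapminder`.

```r
library(dplyr)

# Made-up illustrative values, NOT taken from the gapminder data
toy <- data.frame(
  is_oecd         = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
  child_mortality = c(4, 6, 9, 12, 55, 98, 140)
)

# Number of observations and spread of child mortality within each group
spread <- toy %>%
  group_by(is_oecd) %>%
  summarise(n = n(),
            sd_child_mortality  = sd(child_mortality, na.rm = TRUE),
            min_child_mortality = min(child_mortality, na.rm = TRUE),
            max_child_mortality = max(child_mortality, na.rm = TRUE))

spread
```

A much smaller standard deviation and range in one group is the numerical counterpart of the tight cluster of points seen in a plot.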
We’ve previously seen that life expectancy is poor in sub-Saharan Africa. It would therefore be interesting to investigate if that has anything to do with high levels of child mortality.
To visualise this we need to give the data from the sub-Saharan Africa world_region a different colour from the rest of the data.
We can use a technique similar to the one we used for the OECD countries. We first need to create a column that records whether a country is in sub-Saharan Africa, and then colour the data accordingly.
Here we also set the colours manually to make the two groups easier to tell apart: grey for the rest of the data (FALSE) and firebrick for sub-Saharan Africa (TRUE).
gapminder %>%
  # TRUE for countries in sub-Saharan Africa, FALSE otherwise
  mutate(is_sub_saharan_africa = world_region == "sub_saharan_africa") %>%
  ggplot(aes(x = children_per_woman,
             y = child_mortality,
             colour = is_sub_saharan_africa)) +
  geom_point() +
  # colours are assigned in order: FALSE, then TRUE
  scale_colour_manual(values = c("grey", "firebrick"))
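One practical detail with a popout is drawing order: points are drawn in row order, so highlighted points can end up hidden beneath the grey ones. The sketch below is one possible way to handle this, assuming dplyr and ggplot2 are installed. It sorts the rows so the highlighted group is drawn last and names the colour values so the mapping does not depend on their order. The `toy` data frame and all its values, including the region names other than "sub_saharan_africa", are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Made-up illustrative values, NOT taken from the gapminder data
toy <- data.frame(
  world_region       = c("sub_saharan_africa", "region_a", "sub_saharan_africa",
                         "region_b", "region_a"),
  children_per_woman = c(6.1, 1.8, 5.2, 2.4, 3.0),
  child_mortality    = c(120, 8, 95, 20, 35)
)

highlighted <- toy %>%
  mutate(is_sub_saharan_africa = world_region == "sub_saharan_africa") %>%
  # FALSE sorts before TRUE, so the highlighted rows are drawn last (on top)
  arrange(is_sub_saharan_africa)

p <- ggplot(highlighted,
            aes(x = children_per_woman,
                y = child_mortality,
                colour = is_sub_saharan_africa)) +
  geom_point() +
  # named values make the FALSE/TRUE colour assignment explicit
  scale_colour_manual(values = c("FALSE" = "grey", "TRUE" = "firebrick"))

p
```

With only a handful of points the ordering makes no visible difference, but it matters once the points overlap heavily.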
13.4.1 Exercises
13.5 Summary
- We can use a popout to emphasise data