ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm))
Plotting Data
Visualising data is useful not only to explore the data and identify the relationship between different variables, get results and also to communicate the result of the analysis. The ggplot2 package allows us you to generate high quality plots in just a few lines of code. Any ggplot plot will have at least 3 components: the data, a coordinate system and a geometry (the visual representation of the data) and will be built in layers.
Let’s start by making plots!
First layer: the area of the graphic
The main function of ggplot2 is precisely ggplot()
, which allows us to start the graph and also to define the global characteristics. The first argument of this function will be the data we want to visualize, always in a data.frame. In this case we use penguins
.
The second argument is called mapping
because it’s where we define how columns of the data “map” to visual properties of the geometries. This mapping is defined by the aes()
function. In this case we indicate that in the x axis we want to plot the variable bill_length_mm
and in the y-axis the variable bill_depth_mm
.
But this alone is not enough, it only generates the first layer: the area of the graph.
Second layer: geometries
We need to add a new layer to our chart, the geometric elements or “geoms” that will represent the data. To do this we add a geom function, for example if we want to represent the data with points we will use geom_point()
.
ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm)) +
geom_point()
We have our first plot!
You may have noticed that the dots are clustered in groups. Perhaps some other variable explains this behavior.
To include information from other variables in our plot we can take advantage of the aesthetic characteristics of the geometries. In this case, we can “paint” the points according to the penguin species.
ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm)) +
geom_point(aes(color = species))
Again, we use the aes()
function to map a variable in our data to an element of the plot. And aha! Each species of penguins has different characteristics!
Adding geometries
Very often it is not enough to look at the raw data to identify the relationship between variables; it is necessary to use some statistical transformation to highlight those relationships, either by fitting a model or calculating some statistics.
For this, ggplot2 has geoms that calculate some common statistical transformations. Let’s try with geom_smoth()
to fit a linear model to each species.
ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm)) +
geom_point(aes(color = species)) +
geom_smooth(aes(color = species), method = "lm")
By default geom_smooth()
fits the data using the loess method (local linear regression) when there are less than 1000 data. But it is very common that you want to fit a global linear regression. In that case, you have to set method = "lm"
.
Let’s talk about the look of the plot
For now we used the default ggplot look. We could change the look of your plot to match the style of the institution where we work, of the journal where we are going to publish it or simply to make it more eye-catching.
Let’s start with colour. To change the aesthetic appearance of a plot element, we add a new layer with the scale_*
function. In this case we’ll use scale_color_manual()
to choose the colours of the points manually. We could also use previously defined colour palettes like Viridis or Color Brewer.
We’ll need 3 colors for the 3 species, let’s use "darkorange"
, "purple"
and "cyan4"
.
ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm)) +
geom_point(aes(color = species)) +
geom_smooth(aes(color = species), method = "lm") +
scale_color_manual(values = c("darkorange","purple","cyan4"))
We are getting there! Now, let’s add some text elements with a new ggplot layer: labs()
.
ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm)) +
geom_point(aes(color = species)) +
geom_smooth(aes(color = species), method = "lm") +
scale_color_manual(values = c("darkorange","purple","cyan4")) +
labs(title = "Penguin bill dimensions",
subtitle = "Bill length and depth for Adelie, Chinstrap and Gentoo, Penguins at Palmer Station LTER",
x = "Bill length (mm)",
y = "Bill depth (mm)",
color = "Penguin species",
shape = "Penguin species")
Now the axes labels are more legible and we have a title and subtitle that explains what the plot is about.
We could keep changing this endlessly but we’ll finish with the general look of your plot.
The overall look of a plot is defined by its theme. ggplot2 has many themes available and for all tastes. But there are also other packages that extend the possibilities, for example ggthemes. By default ggplot2 uses theme_grey()
, let’s try theme_minimal()
:
ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm)) +
geom_point(aes(color = species)) +
geom_smooth(aes(color = species), method = "lm") +
scale_color_manual(values = c("darkorange","purple","cyan4")) +
labs(title = "Penguin bill dimensions",
subtitle = "Bill length and depth for Adelie, Chinstrap and Gentoo, Penguins at Palmer Station LTER",
x = "Bill length (mm)",
y = "Bill depth (mm)",
color = "Penguin species",
shape = "Penguin species") +
theme_minimal()
Now it’s your turn. Choose a theme you like and try it out. Also, if you can think of a better title, modify it!