Coffee map: making an awesome map in R
Posted On 11 January 2019
Ever thought about where your coffee is produced?
I love to drink coffee. A lot. But I never actually thought about how much coffee each country produces each year, or where it is produced. So I started by looking at my own coffee: where was it produced? After reading the box (obviously), I found out that my coffee is from Brazil. How big is Brazil as a producer compared to other countries, I wondered? So I decided to look for some data online, visualise it and turn it into art. That is why I will show you how to make an awesome coffee map!
So, the main question I want to answer in this post is: where was most of the coffee produced in 2017? A follow-up question is: is that because more area is being cultivated, or because agriculture is more intensive there? This second question will be answered in separate posts.
So, let’s find out!
How to visualise data in R: ggplot2
I like to use ggplot2, which is perfect for generating aesthetic plots and charts. ggplot2 is built on the grammar of graphics: the principle that a plot can be split into data, aesthetics, and geometry. If you want to know more about how this works, I refer you to the following post, which gives a nice summary of what ggplot2 can do.
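To make the data + aesthetics + geometry split concrete, here is a minimal sketch on a made-up three-row data frame (the names toy, country and value are illustrative only, not part of the coffee data):

```r
library(ggplot2)

# Made-up toy data (illustrative values, not coffee statistics)
toy <- data.frame(
  country = c("A", "B", "C"),
  value   = c(3, 7, 5)
)

# data: toy | aesthetics: country -> x, value -> y | geometry: bars
p <- ggplot(data = toy, aes(x = country, y = value)) +
  geom_col()

print(p)
```

Swapping geom_col() for another geom changes only the geometry; the data and the aesthetics stay the same.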
Where can I find the coffee data?
I found data on coffee-producing countries on FAOstat. I selected all countries, elements and years for green coffee. The downloaded data has many columns, not all of which are needed now, but it does not group countries by continent.
So, another dataset is needed that includes the name of the sub-continent in which each country is located. This data can be found here.
That’s it for the data. Now let’s look at the packages I used.
First: Load relevant packages
I’m assuming all packages are already installed on your device; if not, quickly read this post first.
library() loads the packages into R. We will be using:
library(readxl) # to load excel files into R
library(ggplot2) # for data visualisation
library(dplyr) # data wrangling
library(tidyr) # reshaping data
library(rworldmap) # world map background (map_data() below also needs the maps package installed)
Then: Load the data
The Excel data is loaded into R with readxl. I have not changed the FAO data, but I did change the layout of the country-continent data in Excel: in each row, each country now has its corresponding sub-continent name next to it. Like this:
coffee0 <- read_excel("coffee.xlsx")       # FAO green coffee data
countries <- read_excel("countries.xlsx")  # each country with its sub-continent
map.world <- map_data(map = "world")       # world polygons, via ggplot2
Are country names the same in all datasets?
Every organisation that collects data has its own way of writing country names down, as is very common in data collection. So, we need to check whether all three sources use the same names for countries. We do this with:
prod_count <- coffee0 %>% distinct(Area)  # unique FAO country names
anti_join(prod_count, countries, by = "Area")  # names missing from the continent data
anti_join(prod_count, map.world, by = c("Area" = "region"))  # names missing from the world map
The first line makes a data frame of all unique country names (column Area). The second line lists the FAO countries whose names do not appear in the continent data, and the third line does the same for the world map, where the country column is called region. The result is that 17 countries are not written in the same way. Let’s change that.
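How anti_join() finds these mismatches can be seen on a tiny hypothetical example: it keeps the rows of the first table that have no match in the second. The two tables below are stand-ins (prod_count and map_regions are illustrative, with spellings taken from the recode list in this post):

```r
library(dplyr)

# Hypothetical stand-ins for the FAO names and the world map regions
prod_count  <- tibble(Area = c("Brazil", "Cabo Verde", "United States of America"))
map_regions <- tibble(region = c("Brazil", "Cape Verde", "USA"))

# Rows of prod_count whose Area has no matching region in the map data
mismatches <- anti_join(prod_count, map_regions, by = c("Area" = "region"))
print(mismatches)
```

Only "Cabo Verde" and "United States of America" are returned, because "Brazil" is spelled the same in both tables.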
# Rename FAO country names so they match the names used in the world map
coffee0 <- coffee0 %>% mutate(Area = recode(Area,
  'Bolivia (Plurinational State of)' = 'Bolivia',
  'Cabo Verde' = 'Cape Verde',
  'China, mainland' = 'China',
  'China, Taiwan Province of' = 'Taiwan',
  'Congo' = 'Republic of Congo',
  "Côte d'Ivoire" = 'Ivory Coast',
  'Democratic Republic of the Congo' = 'Democratic Republic of the Congo',
  'Ethiopia PDR' = 'Ethiopia',
  "Lao People's Democratic Republic" = 'Laos',
  'Myanmar' = 'Myanmar',
  'Sao Tome and Principe' = 'Sao Tome and Principe',
  'United Republic of Tanzania' = 'Tanzania',
  'United States of America' = 'USA',
  'Venezuela (Bolivarian Republic of)' = 'Venezuela',
  'Saint Vincent and the Grenadines' = 'Saint Vincent',
  'Trinidad and Tobago' = 'Trinidad'
))
The country names should now match across the datasets, so the joins in the next step will not lose any countries.
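To be sure the renaming worked, one option is to rerun the same anti_join() check and stop with an error if anything is still unmatched. A minimal sketch on hypothetical miniature tables (coffee_toy and map_regions are illustrative names, not from the original script):

```r
library(dplyr)

# Hypothetical miniature tables; spellings follow the recode list above
coffee_toy  <- tibble(Area = c("Brazil", "Cabo Verde", "United States of America"))
map_regions <- tibble(region = c("Brazil", "Cape Verde", "USA"))

coffee_toy <- coffee_toy %>%
  mutate(Area = recode(Area,
                       "Cabo Verde" = "Cape Verde",
                       "United States of America" = "USA"))

# After recoding, every FAO name should exist in the map data
remaining <- anti_join(coffee_toy, map_regions, by = c("Area" = "region"))
stopifnot(nrow(remaining) == 0)
```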
Joining the three datasets together to make one complete set
Joining datasets is done based on shared column names. In this case, the continent data is added to the coffee data by the column Area. I only want to see the coffee yield of one year, and the most recent data is from 2017, so I filter this year from all of the coffee data. The coffee data contains three Elements, namely Yield, Production and Area cultivated, so I also need to filter on Yield.
A further join combines the 2017 coffee data with the map data. Notice that the map data does not use Area to name countries, but region. This is no problem for R: the join can pair the two differently named columns, for example with by = c("Area" = "region").
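Since the join code itself is not shown here, the following is only a sketch of how these steps could look with dplyr, on hypothetical miniature tables. The values are made up, and the Subcontinent column name is an assumption; the real countries.xlsx may use a different name:

```r
library(dplyr)

# Hypothetical miniature versions of the three datasets (made-up values)
coffee0 <- tibble(
  Area    = c("Brazil", "Brazil", "Colombia", "Colombia"),
  Element = c("Yield", "Production", "Yield", "Yield"),
  Year    = c(2017, 2017, 2017, 2016),
  Value   = c(10, 20, 30, 40)
)
countries <- tibble(
  Area         = c("Brazil", "Colombia"),
  Subcontinent = c("South America", "South America")  # assumed column name
)
# map_data("world") has one row per polygon vertex; three fake vertices here
map.world <- tibble(
  long = c(-50, -45, -74), lat = c(-10, -12, 4),
  group = c(1, 1, 2), order = 1:3,
  region = c("Brazil", "Brazil", "Colombia")
)

# 1. Add the sub-continent to each coffee record, keep only 2017 yields
coffee2017 <- coffee0 %>%
  left_join(countries, by = "Area") %>%
  filter(Year == 2017, Element == "Yield")

# 2. Attach the yields to the map: region (map) is matched to Area (coffee)
map.coffee <- map.world %>%
  left_join(coffee2017, by = c("region" = "Area"))

print(map.coffee)
```

Starting from map.world and using left_join() keeps every map vertex in its original order, so countries without 2017 yield data are still drawn (with an NA Value) and the polygon outlines are not scrambled.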
Making a basic visualisation of the coffee data: a map
Now we have enough to make a basic map. Remember that ggplot2 uses the following setup: Plot = data + Aesthetics + Geometry
This is what it looks like in practice:
ggplot(data=map.coffee, aes( x = long, y = lat, group = group )) +
geom_polygon(aes(fill = Value))
The plot is based on the map.coffee data. The aesthetics (aes in the first line) map the longitude and latitude columns to x and y, and group tells ggplot2 which points belong to the same country outline. The geometry is geom_polygon, which draws the outline of every country and fills it according to Value. These two lines of code result in this world map:
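To try the plotting code without downloading anything, a minimal sketch with two made-up square "countries" can help. It also shows two optional tweaks that are not part of the code above: coord_quickmap(), which keeps the map's aspect ratio roughly right, and na.value, which sets the colour of countries without data:

```r
library(ggplot2)

# Two made-up square "countries"; the second has no data (NA Value)
map.coffee <- data.frame(
  long  = c(0, 1, 1, 0,  2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
  group = rep(c(1, 2), each = 4),
  Value = rep(c(10, NA), each = 4)
)

p <- ggplot(data = map.coffee, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = Value), colour = "white") +
  scale_fill_gradient(na.value = "grey80") +  # colour for countries without data
  coord_quickmap()                            # approximate map aspect ratio

print(p)
```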
Making art of this basic coffee map
The basic map shows the yield as a continuous colour, which makes it difficult to distinguish between countries. The following pieces of code make a more visually appealing map.
The breaks at the beginning of the script are a data frame used to determine the colour steps in the legend. In this example, they are quantiles of the yield data: if all yield values were sorted in a row, the breaks are the values at the 1st, 5th, 20th, 40th, 60th, 80th, 95th and 99th percentiles. The first line filters all the data to keep only the Yield rows, and the second line calculates the quantiles of this data.
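The two lines described above could look roughly like this, assuming the FAO columns Element and Value. Here the coffee data is replaced by 101 made-up values (0 to 100) so the result is easy to check, and the breaks are kept as a named vector rather than a data frame:

```r
library(dplyr)

# Hypothetical stand-in for the coffee data: made-up yields 0, 1, ..., 100
coffee_toy <- tibble(Element = "Yield", Value = 0:100)

# First line: keep only the Yield rows and extract the values
yield <- coffee_toy %>% filter(Element == "Yield") %>% pull(Value)

# Second line: the values at the 1st, 5th, 20th, ..., 99th percentiles
breaks <- quantile(yield, probs = c(0.01, 0.05, 0.20, 0.40, 0.60, 0.80, 0.95, 0.99))
print(breaks)
```

With these stand-in values the breaks come out as 1, 5, 20, 40, 60, 80, 95 and 99.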
scale_fill_gradientn is used for the scaling of colours
labs is used for labelling
theme changes the background (mostly setting all values to 0), and
annotate is used to place text in the image.
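Putting these pieces together, here is a sketch of a styled map on made-up toy polygons. The colours, title and annotation text are placeholders, not the ones used for the final coffee map. The quantile breaks only set the legend ticks here; spreading the colours themselves according to the quantiles would additionally need the values argument of scale_fill_gradientn:

```r
library(ggplot2)

# Made-up toy polygons standing in for map.coffee (not real coffee data)
map.coffee <- data.frame(
  long  = c(0, 1, 1, 0,  2, 3, 3, 2,  4, 5, 5, 4),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1,  0, 0, 1, 1),
  group = rep(1:3, each = 4),
  Value = rep(c(5, 50, 95), each = 4)
)

# Legend breaks at quantiles of the values; unique() drops repeated
# quantiles, which occur when there are only a few distinct values
breaks <- unique(quantile(map.coffee$Value,
                          probs = c(0.01, 0.05, 0.20, 0.40, 0.60, 0.80, 0.95, 0.99)))

p <- ggplot(map.coffee, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = Value), colour = "white") +
  scale_fill_gradientn(
    colours  = c("#F3E5AB", "#C08552", "#3B2314"),  # light to dark coffee tones
    breaks   = breaks,
    na.value = "grey80"
  ) +
  labs(title = "Coffee yield", fill = "Yield") +
  theme(
    panel.background = element_blank(),
    axis.title = element_blank(),
    axis.text  = element_blank(),
    axis.ticks = element_blank()
  ) +
  annotate("text", x = 2.5, y = -0.3, label = "Toy data for illustration") +
  coord_quickmap()

print(p)
```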
This is the final map:
This map shows that a few countries produce a lot of coffee, but that most only produce small quantities. The next step is to compare the production of all of these countries through time and with each other. Is production increasing in some countries because of more area? If we add weather data or look at management, are some countries doing better than others? Not all of these questions are easy to answer, but in the next post we can at least start by looking at trends.
This map was inspired by this post from Sharp Sight, and shows some of the power of visualising data with ggplot2 and R. Comment below if you have any ideas on how to find trends for the next post.