Coffee map: making an awesome map in R

Ever thought about where your coffee is produced?

I love to drink coffee. A lot. But I never actually thought about how much coffee each country produces each year, or where it is grown. So I started by looking at my own coffee. Where was it produced? Brazil, I found out, after reading the box… obviously. How big a producer is Brazil compared to other countries, I wondered? So, I decided to look for some data online, visualise it, and turn it into art, which is why I will show you how to make an awesome coffee map!

So, the main question I want to answer in this post is: where was most of the coffee produced in 2017? A follow-up question is: is it produced there because more area is being cultivated, or because of more intensive agriculture? This second question will be answered in separate posts.

So, let’s find the answer!

How to visualise data in R: ggplot2

I like to use ggplot2, which is perfect for generating aesthetic plots and charts. ggplot2 is built on the grammar of graphics: the principle that a plot can be split into data, aesthetics, and geometry. If you want to know more about how this works, I refer you to the following post, which gives a nice summary of what ggplot2 can do.
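As a minimal sketch of that split, the example below uses R's built-in mtcars dataset. It is not part of the coffee data and is chosen only for illustration; the point is simply to show the three parts side by side.

```r
library(ggplot2)

# data:       the built-in mtcars data frame (illustration only)
# aesthetics: which columns are mapped to x and y
# geometry:   how the mapped values are drawn (here, as points)
p <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point()

print(p)
```

Swapping geom_point() for another geometry changes how the same data and aesthetics are drawn, which is the idea the coffee map relies on later.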

Where can I find the coffee data?

I found data on coffee-producing countries on FAOstat. I selected all countries, elements and years for green coffee. The downloaded data has many columns, not all of which are needed now, but it does not group countries per continent.

So, a second dataset is needed that gives the sub-continent in which each country is located. This data can be found here.

That’s it concerning the data. Now let’s look at the packages I used.

First: Load relevant packages

I’m assuming all packages are already installed on your device. If not, quickly read this post first.
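If you are not sure which packages are already there, a small check like the one below can help. This is only a sketch: the package list is my reading of what the code in this post needs (maps is required by ggplot2's map_data(), and scales is used for the legend labels), and the install line is left commented out on purpose.

```r
# Packages used in this post; maps and scales are assumptions based on
# the functions called later (map_data() and scales::rescale())
pkgs <- c("readxl", "ggplot2", "dplyr", "tidyr", "rworldmap", "maps", "scales")

# requireNamespace() returns FALSE instead of stopping when a package is absent
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install them from CRAN:
  # install.packages(missing_pkgs)
} else {
  message("All packages are installed.")
}
```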

library() loads a package into the R session. We will need:

library(readxl)     # to load excel files into R
library(ggplot2)    # for data visualisation
library(dplyr)      # data wrangling
library(tidyr)      # reshaping data
library(rworldmap)  # world map background

Then: Load the data

The Excel data is loaded into R with readxl. I have not changed the FAO data, but I did change the layout of the country-continent data in Excel. Each row now holds a country with its corresponding sub-continent name next to it, like this:

Continent       Area
South America   Chile

coffee0   <- read_excel("coffee.xlsx")
countries <- read_excel("countries.xlsx")
map.world <- map_data(map="world")
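The map data returned by map_data() is an ordinary data frame, so it can be inspected like any other. The short sketch below (assuming the maps package is installed, which map_data() needs) shows the columns that the plotting code relies on later.

```r
library(ggplot2)

# map_data() needs the maps package to be installed
map.world <- map_data("world")

# One row per polygon vertex: long and lat are the coordinates, group
# identifies each separate polygon, and region holds the country name
names(map.world)
head(map.world, 3)
```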

Are country names the same in all datasets?

Everybody who collects data has their own way of writing it down, and country names are no exception. So, we need to check whether all three sources use the same names for countries. We do this with:

prod_count <- coffee0 %>% distinct(Area)

anti_join(prod_count, countries, by = "Area")

anti_join(prod_count, map.world, by = c("Area" = "region"))

The first statement makes a data frame of all the unique countries in the FAO data (the column is called Area). The second lists the FAO names that have no match in the continent data, and the third does the same for the world map data. The result is that 17 countries are not written in the same way. Let’s change that.
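To see what anti_join() returns, here is a small sketch with made-up stand-ins for the FAO names and the map names. The two toy data frames are hypothetical and only there to show the mechanics.

```r
library(dplyr)

# Toy stand-ins for the FAO names and the map names (illustration only)
fao_names <- data.frame(Area = c("Brazil", "Viet Nam", "United States of America"))
map_names <- data.frame(region = c("Brazil", "Vietnam", "USA"))

# Keep the rows of fao_names whose Area has no matching region in map_names
unmatched <- anti_join(fao_names, map_names, by = c("Area" = "region"))
print(unmatched)
```

Only the names that are spelled differently are returned, which makes the list a convenient to-do list for renaming.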

coffee0 <- coffee0 %>% mutate(Area = recode(Area,
        'Bolivia (Plurinational State of)' = 'Bolivia'
       ,'Cabo Verde' = 'Cape Verde'
       ,'China, mainland' = 'China'
       ,'China, Taiwan Province of' = 'Taiwan'
       ,'Congo' = 'Republic of Congo'
       ,"Côte d'Ivoire" = 'Ivory Coast'
       ,'Democratic Republic of the Congo' = 'Democratic Republic of the Congo'
       ,'Ethiopia PDR' = 'Ethiopia'
       ,"Lao People's Democratic Republic" = 'Laos'
       ,'Myanmar' = 'Myanmar'
       ,'Sao Tome and Principe' = 'Sao Tome and Principe'
       ,'United Republic of Tanzania' = 'Tanzania'
       ,'United States of America' = 'USA'
       ,'Venezuela (Bolivarian Republic of)' = 'Venezuela'
       ,'Viet Nam' = 'Vietnam'
       ,'Saint Vincent and the Grenadines' = 'Saint Vincent'
       ,'Trinidad and Tobago' = 'Trinidad'
))

The country names should now match, meaning the joins in the next step will not drop any of the coffee data.
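A quick way to confirm that the renaming worked is to run the same anti_join() check again and see that nothing is left over. The sketch below does this on made-up toy data; the data frames and values are hypothetical.

```r
library(dplyr)

# Toy stand-ins (illustration only)
fao <- data.frame(Area = c("Brazil", "Viet Nam"), Value = c(1, 2))
map_names <- data.frame(region = c("Brazil", "Vietnam"))

# recode() takes old = new pairs; values that are not listed stay unchanged
fao <- fao %>% mutate(Area = recode(Area, "Viet Nam" = "Vietnam"))

# After recoding, no FAO name should be left without a match
still_unmatched <- anti_join(fao, map_names, by = c("Area" = "region"))
nrow(still_unmatched)
```

If the count is not zero, the remaining rows show which names still need an entry in recode().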

Joining the three datasets together to make one complete set

Datasets are joined on a shared key column. In this case, the continent data is added to the coffee data by the column Area (see the first line). I only want to see the coffee yield of one year, and the most recent data is from 2017, so the second line filters this year out of all of the coffee data. The coffee data contains three Elements, namely Yield, Production and Area cultivated, so I also need to filter on Yield.

The third line joins the coffee data of 2017 with the map data. Notice that the map data does not use Area to name countries, but region. This is no problem for R: by = c('region' = 'Area') tells left_join which column in each dataset to match.

coffee1    <- left_join(coffee0, countries, by = "Area")

coffee2017 <- coffee1 %>% filter(Year == "2017" & Element == "Yield")

map.coffee <- left_join(map.world, coffee2017, by = c('region' = 'Area'))
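The sketch below shows how a left_join() with differently named key columns behaves, using two tiny made-up data frames in place of map.world and coffee2017. All names and numbers in it are hypothetical.

```r
library(dplyr)

# Toy stand-ins for map.world and coffee2017 (illustration only)
toy_map    <- data.frame(region = c("Brazil", "Norway"),
                         long = c(-51, 8), lat = c(-14, 60))
toy_coffee <- data.frame(Area = "Brazil", Value = 100)

# by = c("region" = "Area") matches toy_map$region against toy_coffee$Area;
# the result keeps the left-hand column name, region
toy_joined <- left_join(toy_map, toy_coffee, by = c("region" = "Area"))
print(toy_joined)
```

Every row of the map is kept, and countries without coffee data get NA in Value. With a continuous fill scale, ggplot2 draws those NA countries in grey by default.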

Making a basic visualisation of the coffee data: a map

Right now, we have enough to make a basic map. Remember that ggplot2 uses the following setup: Plot = data + Aesthetics + Geometry

This is what it looks like in practice:

ggplot(data=map.coffee, aes( x = long, y = lat, group = group )) +
  geom_polygon(aes(fill = Value))

The plot is based on the map.coffee data, and the aesthetics (aes in the first line) are made up of the longitude and latitude columns, with group keeping the points of each polygon together. The geometry is geom_polygon, which draws every country as a polygon and fills it according to Value. This basic piece of code results in this world map:

[Image: basic coffee map]

Making art of this basic coffee map

The previous image shows the yield on a continuous colour scale, which makes it difficult to distinguish between countries. The following pieces of code make a more visually appealing graph.

breaks <- c(  500 , 1500  ,2800 , 4400 , 6100 , 9000 ,15500, 24500 )

ggplot(map.coffee, aes( x = long, y = lat, group = group )) +
  geom_polygon(aes(fill = Value)) +
  scale_fill_gradientn(colours = c('#461863','#404E88','#2A8A8C','#7FD157','#F9E53F')
                       ,values = scales::rescale(breaks)
                       ,labels = scales::comma
                       ,breaks = breaks
  ) +
  guides(fill = guide_legend(reverse = TRUE)) +
  labs(fill = 'Yield hg/year'
       ,title = 'Coffee yield by Country'
       ,subtitle = 'Yield in hg in 2017'
       ,x = NULL
       ,y = NULL) +
  theme(text = element_text(family = 'Gill Sans', color = '#EEEEEE')
        ,plot.title = element_text(size = 28)
        ,plot.subtitle = element_text(size = 14)
        ,axis.ticks = element_blank()
        ,axis.text = element_blank()
        ,panel.grid = element_blank()
        ,panel.background = element_rect(fill = '#333333')
        ,plot.background = element_rect(fill = '#333333')
        ,legend.position = c(.18,.36)
        ,legend.background = element_blank()
        ,legend.key = element_blank()
  ) +
  annotate(geom = 'text'
           ,label = 'Source: FAOstat'
           ,x = 18, y = -55
           ,size = 3
           ,family = 'Gill Sans'
           ,color = '#CCCCCC'
           ,hjust = 'left'
  )
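One detail in the plot code that is easy to miss is values = scales::rescale(breaks). As a minimal sketch, the lines below print what that call produces for the same breaks vector, so you can see the positions between 0 and 1 that are handed to the colour scale.

```r
breaks <- c(500, 1500, 2800, 4400, 6100, 9000, 15500, 24500)

# rescale() linearly maps the smallest break to 0 and the largest to 1
positions <- scales::rescale(breaks)
round(positions, 3)
```

Because the breaks are unevenly spaced, the positions are too, which is what stretches the colours over the lower yield values.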

The breaks at the beginning of the script are a numeric vector that determines which yield values are labelled in the legend and where the colours change. In this example, these values are based on quantiles of the yield data: the 1st, 5th, 20th, 40th, 60th, 80th, 95th and 99th percentiles. In the code below, the first line filters the data to keep only Yield and the second line gets the quantiles of this data.

Yield <- coffee1 %>% filter(Element == "Yield")
breaks0 <- quantile(Yield$Value, probs = c(0.01, 0.05, 0.2, 0.4, 0.6, 0.8, 0.95, 0.99))  # Value is the column used for fill in the map
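To make the quantile step concrete, here is a small sketch on made-up yield numbers. The values are invented for illustration and are not FAO data, and the rounding at the end is my assumption about how hand-typed breaks could be derived from the quantiles.

```r
# Made-up yield values, for illustration only
toy_yield <- c(1200, 3400, 5600, 7800, 9100, 15000, 22000, 800, 4300, 6100)

# One cut point per requested probability
toy_breaks <- quantile(toy_yield,
                       probs = c(0.01, 0.05, 0.2, 0.4, 0.6, 0.8, 0.95, 0.99))

# quantile() returns a named numeric vector; unname() and round() tidy it up
# into round numbers that are easier to read in a legend
round(unname(toy_breaks), -2)
```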

The rest of the plot code only changes the appearance of the map.

  • scale_fill_gradientn sets the colour scale,
  • guides reverses the order of the legend,
  • labs sets the title, subtitle and legend label,
  • theme changes the background and text, and hides the axes and grid (mostly by setting elements to element_blank()), and
  • annotate is used to place text in the image.

This is the result:

[Image: coffee map with more aesthetics]

This map shows that there are a few countries that produce a lot of coffee, but that most only produce small quantities. The next step is to compare the production of all of these countries through time and with each other. Is production increasing in some countries because of more area? If we add weather data or look at management, are some countries doing better than others? Not all of these questions are equally easy to answer, but in the next post we can at least start by looking at trends.

Conclusion

This map was inspired by this post from Sharp Sight, and shows some of the power of visualising data with ggplot2 and R. Comment below if you have any ideas on how to find trends in the post.


Edit: this is the first of three posts. Click here for the second and here for the third post in this coffee exploring theme.
