Data context: telling stories with numbers

Storytelling with numbers… sounds boring, right? But an exciting and interesting graph can be the difference between getting funding or not. Anyone can make a graph, but making a meaningful one takes some training. In this post I will introduce some concepts that will get you started. The take-home message is: know the context of your audience and your data.

As David McCandless said in his TED talk:

Absolute figures in a connected world don’t give you the whole picture. They’re not as true as they could be. We need relative figures that are connected to other data, so that we can see a better picture and then that can lead to us changing our perspective.

David McCandless

To tell a story we need context… but storytelling with numbers is not our strong point

When starting a project, the first steps usually include some form of data exploration, followed by testing out different styles of presenting this data. Once a form of visualisation is chosen, the difficult part of translating this data to the audience begins. This is where the art of telling a story comes in: data turns into information that can be used to drive faster and more accurate decision making. But to do so you need to know the context of your data and your audience. Otherwise, you might miss the point. So before even opening up your Excel sheet, think about the following tips.

Make exciting stories

If your story is exciting and clear, the audience will stay focussed and you as the presenter will know the information is received the way you envisioned. However, a boring story (which is usually a boring or overcomplicated graph) will mean the audience does not receive the information like you wanted, which could have huge impacts if you need funding for example, or if you’re trying to save the world with data.

what's your story?

So, even if your data is 1000% correct and it contains all the solutions to a problem, if you cannot tell the story of your data in an exciting way, you may as well have not done it (or if you’re lucky, somebody might have picked up some relevant points). Data visualisation is an art, an art of storytelling, but we are not naturally good at storytelling with data.

So, what’s a story then?

In short, a story is a depiction of a journey. It is the telling of a set of observations, facts, or events in such a way that the listener experiences or learns something new. When transferring data to the listener, it is important to understand that a single visualisation will rarely tell the entire story. A story has various parts (think of the opening, a challenge, or the ending), and a single visualisation may show only one of them. That is why you may need a series of figures to tell your story. Here the context of your data comes back again: where in the story are you, and what part of your data do you need to strengthen your point?

Show figures that your audience understand

There are two misconceptions that can lead you to present figures that are the opposite of understandable. The first is that complex visualisations are processed quickly and that their key trends and relationships are understood. The second is that your audience will immediately deduce the points you are trying to make. As the storyteller, you need to help the audience see the patterns you see. There is a story in your data, but your tools (Excel, or R packages like ggplot2) don’t know this story, which means you need to bring that story to life visually and with context.

A lot is happening in this complex graph, but what is it telling us?

Complexity is easier than ever

Software today can make extremely difficult figures. A quick image search on ‘complex figures ggplot’ will show you just how many ways there are to show something. Adding more dimensions to a plot is quite easy in fact, and even tempting. The result may be an impressive figure, but with a complex story, one your audience is unlikely to understand.

‘Less is more’ is a good rule of thumb here, as showing too much data at once may result in no information being received at all. Split a complex figure up into multiple simple figures, but don’t add irrelevant data, which only creates a side story and distracts from the main one.

And what about this one?

But simplicity may seem generic, right?

Simple figures may seem generic, meaning they will not be remembered by your audience. Making visually complex and unique figures will ensure your audience remembers you, but at a potential price: the story is not received the way you intended. At one extreme, your figure could be an exceptional piece of art, which is remembered but not understood. At the other extreme is a graph that is clear and to the point, but extremely boring. That figure will be forgotten by your audience, and the impact you intended will not happen. So you need a balance between the two: an image that is both clear and memorable.

Two languages

Most of us have learned about languages and math somewhere in life. With language we put words together with certain rules to make a story and math teaches us to make sense of numbers. But we hardly learn to make stories with numbers.

Aesthetics

Whenever we visualise data, we take data values and convert them into the visual elements that make up the final picture. We refer to these elements as aesthetics, a term borrowed from the branch of philosophy that deals with the nature of art, beauty and taste, and with the creation and appreciation of beauty.

Aesthetics describe every aspect of a given graphical element. Every graphical element has a position (usually x and y), a shape, a size and a colour. When presenting lines, they have a width and a type (dash, dot). There is also text, which has aspects such as font family, font face, and font size. These aesthetics describe how data relates to each other.

All these different aesthetics fall into two groups: those that represent continuous data and those that do not.

Variables

Continuous data values can take any of an infinite number of values within a given range. For example, time duration is a continuous value: between any two durations, say 2 seconds and 3 seconds, there are arbitrarily many intermediates, such as 2.2 seconds, 2.28 seconds, 2.6505 seconds, and so on. By contrast, the number of persons in a sports team is a discrete value. A team can hold 10 persons or 11, but not 10.8. Of the aesthetics listed in the previous section, position, size, colour, and line width can represent continuous data, while shape and line type can usually only represent discrete data.

There are quite a few types of data you may want to visualise. Usually data is seen as numbers (either discrete or continuous), but there are also other types of data, such as discrete categories, dates or times, and text. Numerical data is often referred to as quantitative, while categorical data is called qualitative. Variables holding qualitative data are called factors, and their categories are called levels. The levels of a factor most commonly have no order (“house”, “garden”, “road”), but factors can also be ordered when there is an intrinsic order among the levels (“small”, “medium”, “big”).
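To make the difference between ordered and unordered factors concrete, here is a minimal Python sketch. The level names come from the text above; the helper `sort_by_level` is hypothetical and only serves as an illustration. The point is that an ordered factor needs its level order stated explicitly, because a plain alphabetical sort does not know that “small” comes before “big”.

```python
# Illustrative only: level names are taken from the text, the helper is hypothetical.
UNORDERED_LEVELS = {"house", "garden", "road"}  # a set: no intrinsic order
ORDERED_LEVELS = ["small", "medium", "big"]     # a list: position defines the order


def sort_by_level(values, levels):
    """Sort the values of an ordered factor by the position of their level."""
    rank = {level: i for i, level in enumerate(levels)}
    return sorted(values, key=lambda value: rank[value])


sizes = ["big", "small", "medium", "small"]

print(sorted(sizes))                         # alphabetical: ['big', 'medium', 'small', 'small']
print(sort_by_level(sizes, ORDERED_LEVELS))  # by level: ['small', 'small', 'medium', 'big']
```

The same idea applies to the axis or legend of a figure: for an ordered factor, the categories should appear in level order rather than in alphabetical order.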

Different variables (Yes/no, Female/male, total_bill, count) on one graph

Summary of the variables

Here is an overview of the six types of variables that are available:

| Variable | Scale | Example |
|---|---|---|
| Quantitative (numbers), continuous | Continuous | 3.8, 9.8 |
| Quantitative (numbers), discrete | Discrete | 1, 9, 6248 |
| Qualitative (categories), unordered | Discrete | House, garden, road |
| Qualitative (categories), ordered | Discrete | Small, medium, big |
| Date or time | Continuous or discrete | 15 April 2019, 11:15 |
| Text | None or discrete | Words and more words |

Working at the same time

The different types of data can all be found in the same dataset. As an example, I will take some KNMI weather data, which shows the daily rainfall amount and evaporation for three weather stations. This table contains five variables: STN (station), YYYYMMDD (date), RH (rainfall), EV24 (evaporation) and STN NAME (station name). STN and STN NAME are both unordered factors, while RH and EV24 are continuous numerical values. Date is an ordered factor.

STN YYYYMMDD RH EV24 STN NAME
391 20120101 105 1 ARCEN
391 20120102 48 3 ARCEN
391 20120103 120 1 ARCEN
370 20120101 154 1 EINDHOVEN
370 20120102 47 3 EINDHOVEN
370 20120103 116 1 EINDHOVEN
310 20120101 107 1 VLISSINGEN
310 20120102 8 3 VLISSINGEN
310 20120103 105 1 VLISSINGEN

This table can be visualised in many ways, but these will be the subject of later posts. For now, it is important to know that there are distinct types of data, different aesthetics to visualise them, and that there is a story to tell with numbers 😊
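The variable types in the KNMI table above can be made explicit when the data is loaded. The following is a minimal sketch in Python, not the way the original analysis was done: the `Observation` class and its field names are assumptions made for illustration, and the values are kept in whatever units the source file uses.

```python
from dataclasses import dataclass
from datetime import date, datetime

# The nine rows of the table above, without the header line.
RAW = """\
391 20120101 105 1 ARCEN
391 20120102 48 3 ARCEN
391 20120103 120 1 ARCEN
370 20120101 154 1 EINDHOVEN
370 20120102 47 3 EINDHOVEN
370 20120103 116 1 EINDHOVEN
310 20120101 107 1 VLISSINGEN
310 20120102 8 3 VLISSINGEN
310 20120103 105 1 VLISSINGEN
"""


@dataclass(frozen=True)
class Observation:
    """Hypothetical record type: one row of the table with explicit types."""
    station: str        # STN: a code, so a factor rather than a quantity
    day: date           # YYYYMMDD parsed into a real date
    rainfall: float     # RH, in the units of the source file
    evaporation: float  # EV24, in the units of the source file
    station_name: str   # STN NAME: unordered factor


def parse_line(line: str) -> Observation:
    stn, yyyymmdd, rh, ev24, name = line.split(maxsplit=4)
    return Observation(
        station=stn,
        day=datetime.strptime(yyyymmdd, "%Y%m%d").date(),
        rainfall=float(rh),
        evaporation=float(ev24),
        station_name=name,
    )


observations = [parse_line(line) for line in RAW.splitlines() if line.strip()]

# The levels of the station-name factor are simply its distinct values.
levels = sorted({obs.station_name for obs in observations})
print(levels)               # ['ARCEN', 'EINDHOVEN', 'VLISSINGEN']
print(observations[0].day)  # 2012-01-01
```

Keeping STN as text rather than a number is a deliberate choice in this sketch: station 391 is not “more” than station 310, so treating the code as a quantity would invite meaningless arithmetic.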

The aesthetics part can be found here, with the same dataset.

Two phases in data visualisation

From being an explorer

Data visualisation starts off with exploration: looking at your data from different angles, trying unusual ways of visualisation, and just trying to understand the key features of your dataset. In this phase you want to try out several types of visualisations (bar, line, scatterplot), on different subsets of data. You will explore more of the data if you can do this in a fast and iterative way, which increases the likelihood that you notice the key features of your data.

Details such as axis labels or legends are less important in this phase, so long as you, the explorer, can evaluate the patterns in the data. It is critical that you can switch quickly between ways of showing the data (boxplot, scatterplot, heatmap). In my experience ggplot can handle this with only a few changes in the code.
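The same habit of fast, iterative switching can be practised before any plot is drawn. The sketch below is an assumption of how that might look in plain Python, not the ggplot workflow described above: a hypothetical `summarise` helper groups the KNMI rows by one column and summarises another, so a different view of the data is a one-argument change.

```python
from collections import defaultdict
from statistics import mean

# Rows copied from the KNMI table; the dictionary keys are illustrative names.
ROWS = [
    {"station": "ARCEN", "date": "20120101", "RH": 105, "EV24": 1},
    {"station": "ARCEN", "date": "20120102", "RH": 48, "EV24": 3},
    {"station": "ARCEN", "date": "20120103", "RH": 120, "EV24": 1},
    {"station": "EINDHOVEN", "date": "20120101", "RH": 154, "EV24": 1},
    {"station": "EINDHOVEN", "date": "20120102", "RH": 47, "EV24": 3},
    {"station": "EINDHOVEN", "date": "20120103", "RH": 116, "EV24": 1},
    {"station": "VLISSINGEN", "date": "20120101", "RH": 107, "EV24": 1},
    {"station": "VLISSINGEN", "date": "20120102", "RH": 8, "EV24": 3},
    {"station": "VLISSINGEN", "date": "20120103", "RH": 105, "EV24": 1},
]


def summarise(rows, by, value):
    """Group rows by one column and return (min, mean, max) of another."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row[value])
    return {
        key: (min(vals), round(mean(vals), 1), max(vals))
        for key, vals in groups.items()
    }


# Three different views of the same data, each a one-argument change.
print(summarise(ROWS, by="station", value="RH"))
print(summarise(ROWS, by="date", value="RH"))
print(summarise(ROWS, by="date", value="EV24"))
```

Even this rough summary hints at a pattern worth plotting: on the second day all three stations have low rainfall and the highest evaporation value.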

To becoming an artist

After determining how to show what data, and especially understanding your data, it is time to make the perfect figures for your audience. Where speed and efficiency were essential in the first phase, preparing a high-quality figure is more important in this phase. You can make the image and then perfect it with some other software, but I am an enthusiastic fan of automation and pipelines. What this means is that the same image can be reproduced the same way if I send the script and raw data to you.

I have often made figures for reports and had to repeat the same steps as new experiments were included, the dataset expanded, or simulations were rerun with slightly altered variables. If I adjusted the images in, for example, Photoshop, I would have to do this manually every time. Also, we often think an image is done at some point, whilst a few days or weeks later we don’t like it anymore. With a pipeline this is easy to adjust, whilst doing it manually is demanding work, as all figures would need to be redrawn.
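As an illustration of what such a pipeline can look like, here is a minimal sketch in Python that turns raw numbers into a figure file using only the standard library. It is an assumption, not my actual workflow: the function names, the SVG layout and the output file name are all made up for the example, and the totals are the RH values of the KNMI table summed per station.

```python
import tempfile
from pathlib import Path
from xml.sax.saxutils import escape

# Sum of RH per station over the three days in the KNMI table.
TOTALS = {"ARCEN": 273, "EINDHOVEN": 317, "VLISSINGEN": 220}
TITLE = "Total rainfall (RH) per station, 1-3 January 2012"


def bar_chart_svg(values, title, width=420, bar_height=24, gap=10, label_width=110):
    """Return a horizontal bar chart as an SVG string; same input, same output."""
    scale = (width - label_width - 50) / max(values.values())
    height = 40 + len(values) * (bar_height + gap)
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        f'<text x="10" y="20" font-size="14">{escape(title)}</text>',
    ]
    for i, (label, value) in enumerate(values.items()):
        y = 35 + i * (bar_height + gap)
        bar = value * scale
        parts.append(f'<text x="10" y="{y + 17}" font-size="12">{escape(label)}</text>')
        parts.append(
            f'<rect x="{label_width}" y="{y}" width="{bar:.1f}" '
            f'height="{bar_height}" fill="steelblue"/>'
        )
        # Print the value next to the bar so no axis is needed.
        parts.append(
            f'<text x="{label_width + bar + 5:.1f}" y="{y + 17}" font-size="12">{value}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


def build_figure(values, out_path):
    """Pipeline step: raw numbers in, figure file out."""
    svg = bar_chart_svg(values, TITLE)
    Path(out_path).write_text(svg, encoding="utf-8")
    return svg


out_file = Path(tempfile.gettempdir()) / "rainfall.svg"
svg = build_figure(TOTALS, out_file)
print(f"wrote {out_file}")
```

When the dataset grows or the colour no longer pleases, only `TOTALS` or one argument changes and the figure is rebuilt by rerunning the script. Nothing has to be redrawn by hand.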

Start to explore, by knowing the context

As I said in the beginning, before diving into the data and playing around with it you need to know the context, because then you know what you need to communicate. To whom are you communicating, what do they need to know and, finally, how can you use data to strengthen your point?

Who is your audience?

Communicating your point will be easier once you’ve thought about who it is that you are communicating to. A decision-maker will need a different form of communication than your parents. They would also need different kinds of information.

What is your message?

What do they need to know or do? Here you think about how you can make your message relevant for your specific audience and why they should care about what you say. If you cannot think of why your audience should know or do something, there might not be a reason to communicate in the first place.

Also think of how you will communicate your message. A live presentation will give you more control over what your audience sees than an email, which would need a very detailed description to guide your audience to your final message. This will help you determine how much detail you need in your visualisations. Will you be present to answer questions, or should it all be clear without you being there to guide the audience?

And what information will help make my point?

You know your audience; you know what you want to say. Now you look for the data to back your story. The data will be supporting evidence of the story you are telling.     

What’s the next step in storytelling?

Understanding the context will help you in the long run, as you now understand what you want and need to do. You can create content without constantly changing it to meet everybody’s needs, which will save time. Now that you understand that you need to know the context of your story, it is time to continue with the visualisations: what types are best for which circumstances? What other best practices can be applied?

