Data context: telling stories with numbers
Posted On 25 April 2019
Storytelling with numbers… sounds boring, right? But an exciting and interesting graph can be the difference between getting funding or not. Anyone can make a graph, but making one that is meaningful takes some training. In this post I will introduce you to some concepts that will get you started. The take-home message is: know the context of your audience and your data.
Absolute figures in a connected world don’t give you the whole picture, and on their own they can be misleading. We need relative figures that are connected to other data, so that we see a fuller picture, which in turn can change our perspective.
To tell a story we need context… but storytelling with numbers is not our strong point
When starting a project, the first steps usually include some form of data exploration, followed by trying out different ways of presenting the data. Once a form of visualisation is chosen, the difficult part begins: translating the data for the audience. This is where the art of storytelling comes in. Data turns into information, and that information can drive faster and more accurate decision-making. But to do so you need to know the context of your data and your audience; otherwise, you might miss the point. So before you even open your Excel sheet, consider the following tips.
Make exciting stories
If your story is exciting and clear, the audience will stay focused and you, as the presenter, will know the information is received the way you envisioned. A boring story, however (which usually means a boring or overcomplicated graph), means the audience does not receive the information as you intended. That can have a huge impact if, for example, you need funding or you are trying to save the world with data.
So, even if your data is 1000% correct and contains all the solutions to a problem, if you cannot tell its story in an exciting way, you may as well not have done the work (or, if you’re lucky, somebody might pick up a few relevant points). Data visualisation is an art, the art of storytelling, but we are not naturally good at storytelling with data.
So, what’s a story then?
In short, a story is a depiction of a journey. It tells a set of observations, facts or events in such a way that the listener experiences or learns something new. When transferring data to the listener, it is important to understand that a single visualisation will hardly ever tell the entire story. A story has several parts, such as an opening, a challenge and an ending. Your visualisation may show only one of these parts, because it is difficult to convey all of them at once. That is why you may need a series of figures to tell your story, and there are some more tips and tricks to this. Here the context of your data comes back again: where in the story are you, and what part of your data do you need to strengthen your point?
Show figures that your audience understands
Two misconceptions can lead you to present figures that are anything but understandable. The first is that your audience will process complex visualisations quickly and grasp the key trends and relationships. The second is that your audience will immediately deduce the points you are trying to make. As the storyteller, you need to help the audience see the patterns you see. There is a story in your data, but your tools (Excel, or R packages like ggplot2) don’t know this story, which means you need to bring it to life visually and with context.
Complexity is easier than ever
The tools available today can make extremely complex figures. A quick image search for ‘complex figures ggplot’ will show you just how many ways there are to show something. Adding more dimensions to a plot is in fact quite easy, and even tempting. The result may be an impressive figure, but one with a complex story that your audience is unlikely to understand.
‘Less is more’ is a good rule of thumb here, as showing too much data at once may result in no information being received at all. Split a complex figure into multiple simple figures, and don’t add irrelevant data, which only leads to a side story instead of the main story.
But simplicity may seem generic
Simple figures may seem generic, which means your audience may not remember them. Making visually complex and unique figures will help your audience remember you, but at a potential price: the story is not received the way you intended. At one extreme, your figure could be an exceptional piece of art that is remembered but not understood. At the other extreme is an extremely boring graph that is clear and to the point but quickly forgotten, so the impact you intended will not happen. You therefore need a balance between the two: an image that is clear yet memorable.
Most of us have learned about language and math somewhere in life. With language we put words together according to certain rules to make a story, and math teaches us to make sense of numbers. But we hardly ever learn to make stories with numbers.
When we visualise data, we take data values and convert them into the visual elements that make up the final picture. We call these features aesthetics, borrowing the term from the branch of philosophy that deals with the nature of art, beauty and taste, and with the creation and appreciation of beauty.
Aesthetics describe every aspect of a given graphical element. Every graphical element has a position (usually x and y), a shape, a size and a colour. Lines also have a width and a type (dashed, dotted). There is also text, which has aspects such as font family, font face and font size. These aesthetics describe how the data relate to each graphical element. The different aesthetics fall into two groups: those that can represent continuous data and those that cannot.
Continuous data values are values for which infinitely many intermediate values are possible within a given range. For example, time duration is continuous: between any two durations, say 2 seconds and 3 seconds, there are arbitrarily many intermediates, such as 2.2 seconds, 2.28 seconds, 2.6505 seconds, and so on. By contrast, the number of people in a sports team is a discrete value: a team can have 10 players or 11, but not 10.8. Of the aesthetics listed above, position, size, colour and line width can represent continuous data, while shape and line type usually represent only discrete data.
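The grouping above can be written down as a small lookup table. This is a minimal Python sketch, not part of ggplot2 or any other plotting library. The names `CAN_SHOW_CONTINUOUS` and `aesthetics_for` are hypothetical. The sketch also assumes that every aesthetic can encode discrete values, which the text does not state explicitly.

```python
from __future__ import annotations

# Aesthetics named in the text, and whether each can represent continuous data.
# Assumption: every aesthetic can also encode discrete values.
CAN_SHOW_CONTINUOUS = {
    "position": True,
    "size": True,
    "colour": True,
    "line width": True,
    "shape": False,      # a fixed set of symbols; no shape lies "between" two others
    "line type": False,  # solid, dashed, dotted: no in-between line types
}


def aesthetics_for(is_continuous: bool) -> list[str]:
    """Return the aesthetics that can represent a variable of the given kind."""
    if is_continuous:
        return [name for name, ok in CAN_SHOW_CONTINUOUS.items() if ok]
    return list(CAN_SHOW_CONTINUOUS)


# Duration in seconds is continuous; the number of players in a team is discrete.
print(aesthetics_for(is_continuous=True))   # ['position', 'size', 'colour', 'line width']
print(aesthetics_for(is_continuous=False))  # all six aesthetics
```

A duration could therefore go on position or colour, while a team size could also go on shape.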
There are quite a few types of data you might want to visualise. Data is usually thought of as numbers (either discrete or continuous), but there are other types too, such as discrete categories, dates or times, and text. Numerical data is often referred to as quantitative, while categorical data is called qualitative. Variables holding qualitative data are called factors, and their categories are called levels. The levels of a factor most commonly have no order (“house”, “garden”, “road”), but factors can also be ordered when there is an intrinsic order among the levels (“small”, “medium”, “big”).
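Here is a minimal Python sketch that makes the difference between ordered and unordered factors concrete. The `Factor` class is hypothetical and uses only the standard library. In R, `factor()` with `ordered = TRUE` plays this role. The point is that the order of the levels matters: sorting the sizes alphabetically gives “big, medium, small”, whereas sorting by level gives the intended order.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Factor:
    """Hypothetical minimal factor: values restricted to a fixed set of levels."""

    values: tuple[str, ...]
    levels: tuple[str, ...]
    ordered: bool = False

    def __post_init__(self) -> None:
        # Every value must be one of the declared levels (categories).
        unknown = set(self.values) - set(self.levels)
        if unknown:
            raise ValueError(f"values not among the levels: {sorted(unknown)}")

    def sorted_values(self) -> list[str]:
        """Sort by level position; only meaningful for an ordered factor."""
        if not self.ordered:
            raise TypeError("an unordered factor has no intrinsic order")
        rank = {level: i for i, level in enumerate(self.levels)}
        return sorted(self.values, key=rank.__getitem__)


sizes = Factor(("big", "small", "medium", "small"),
               levels=("small", "medium", "big"), ordered=True)
places = Factor(("road", "house"), levels=("house", "garden", "road"))

print(sorted(sizes.values))   # ['big', 'medium', 'small', 'small'] (alphabetical)
print(sizes.sorted_values())  # ['small', 'small', 'medium', 'big'] (by level)
```

Calling `places.sorted_values()` raises a `TypeError`, because “house”, “garden” and “road” have no intrinsic order.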
Summary of the variables
Here is an overview of the six types of variables:
Type of variable | Scale | Example
Quantitative (numbers) | Continuous |
Quantitative (numbers) | Discrete | 1, 9, 6248
Qualitative (categories) | Unordered | House, garden, road
Qualitative (categories) | Ordered | Small, medium, big
Date or time | Continuous or discrete | 15 April 2019, 11:15
Text | None or discrete | Words and more words
Working at the same time
These different types of data can all be found in the same dataset. As an example, I will take some KNMI weather data, which shows the daily rainfall amount and evaporation for three weather stations. This table contains 5 variables: STN (station), YYYYMMDD (date), RH (rainfall), EV24 (evaporation) and STN NAME (station name). STN and STN NAME are both unordered factors, while RH and EV24 are continuous numerical values. The date is an ordered factor.
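Before plotting, it can help to write the column types down explicitly. The sketch below records the KNMI columns as described above, using a hypothetical `VarType` enum for the six variable types. It only classifies the columns and does not read any KNMI file. The helper `columns_of` is an illustrative name, not a library function.

```python
from __future__ import annotations

from enum import Enum


class VarType(Enum):
    """The six variable types from the overview above."""

    CONTINUOUS = "quantitative, continuous"
    DISCRETE = "quantitative, discrete"
    UNORDERED = "qualitative, unordered"
    ORDERED = "qualitative, ordered"
    DATE_TIME = "date or time"
    TEXT = "text"


# Column types for the KNMI table, as described in the text.
KNMI_COLUMNS = {
    "STN": VarType.UNORDERED,       # station number, used as a label
    "YYYYMMDD": VarType.ORDERED,    # date, treated as an ordered factor
    "RH": VarType.CONTINUOUS,       # daily rainfall amount
    "EV24": VarType.CONTINUOUS,     # daily evaporation
    "STN NAME": VarType.UNORDERED,  # station name
}


def columns_of(schema: dict[str, VarType], *types: VarType) -> list[str]:
    """Return the names of columns whose type is one of `types`, in table order."""
    return [name for name, kind in schema.items() if kind in types]


print(columns_of(KNMI_COLUMNS, VarType.CONTINUOUS))  # ['RH', 'EV24']
print(columns_of(KNMI_COLUMNS, VarType.UNORDERED))   # ['STN', 'STN NAME']
```

Following the aesthetics section, the continuous columns (RH, EV24) are natural candidates for position, size or colour. The station columns suit shape or line type. This mapping is a suggestion, not a rule.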
This data can be visualised in many ways, which will be the subject of later posts. For now, it is important to know that there are distinct types of data, different aesthetics to visualise them, and a story to tell with your data.
Data visualisation starts off with exploration: looking at your data from different angles, trying unusual ways of visualising it, and simply trying to understand the key features of your dataset. In this phase you want to try out several types of visualisation (bar chart, line chart, scatterplot) on different subsets of the data. You will explore more of the data if you can do this in a fast, iterative way, which increases the likelihood that you notice its key features.
What you include in the figures in this phase, such as axis labels or legends, matters less, as long as you, the explorer, can evaluate the patterns in the data. It is critical that you can switch quickly between ways of showing the data (boxplot, scatterplot, heatmap). In my experience, ggplot2 handles this well, requiring only a few changes to the code.
Towards becoming an artist
After determining how to show which data, and above all understanding your data, it is time to make the perfect figures for your audience. Where speed and efficiency were essential in the first phase, preparing a high-quality figure matters more in this phase. You can make the image and then perfect it in other software, but I am an enthusiastic fan of automation and pipelines. This means that if I send you the script and the raw data, you can reproduce exactly the same image.
I have often made figures for reports and, in particular, had to repeat the steps as new experiments were included, the dataset expanded, or simulations were rerun with slightly altered variables. If I adjusted the images in, for example, Photoshop, I would have to redo this manually every time. Also, we often think an image is finished at some point, only to find a few days or weeks later that we no longer like it. With a pipeline this is easy to adjust, whereas doing it manually is demanding work, as every figure would need to be redrawn.
Start exploring by knowing the context
As I said at the beginning, before diving into the data and playing around with it, you need to know the context, because that tells you what you need to communicate: to whom you are communicating, what they need to know and, finally, how you can use data to strengthen your point.
Who is your audience?
Making your point will be easier once you’ve thought about who you are communicating with. A decision-maker will need a different form of communication than your parents, and different kinds of information.
What is your message?
What does your audience need to know or do? Here you think about how you can make your message relevant to your specific audience and why they should care about what you say. If you cannot think of why your audience should know or do something, there might not be a reason to communicate in the first place.
Also think of how you will communicate your message. A live presentation will give you more control over what your audience sees than an email, which would need a very detailed description to guide your audience to your final message. This will help you determine how much detail you need in your visualisations. Will you be present to answer questions, or should it all be clear without you being there to guide the audience?
And what information will help make my point?
You know your audience; you know what you want to say. Now you look for the data to back up your story. The data will be the supporting evidence for the story you are telling.
What’s the next step in storytelling?
Understanding the context will help you in the long run, as you now understand what you want and need to do. You can create content without constantly changing it to meet everybody’s needs, which saves time. Now that you understand that you need to know the context of your story, it is time to continue with the visualisations: which types are best for which circumstances, and what other best practices can be applied?
Some recommended readings if you’re interested in these topics
There is of course much more to tell; here are some suggestions for further reading if you’re interested: