Four Steps to a Data-Driven Story

So you love data-based stories and would love to create some yourself, but don’t know how? This guide gives you an overview of what to expect when “doing” #ddj (data-driven journalism) and suggests tools for each step.
 

Four Steps to Publication

Our guide is heavily inspired by Paul Bradshaw, online journalist, trainer and contributor to The Guardian’s Data Blog. He names four steps that are crucial for publishing a data-driven story:
1. finding data
2. interrogating data
3. visualizing the findings
4. actually telling the story
We enriched Paul’s list by adding tools that might help you get through each step. Doing #ddj is no sorcery. You can do it!

 

Step 1: Finding Data

There is one buzzword that is quite hard to ignore: big data. But where is all that data hidden that everyone is talking about? In fact, we are surrounded by data. Just think of all the statistics, sports results, prices and much more. Finding the right data is usually not a one-click affair. But, and that is the good news, there is help.
Many platforms collect data sets, such as open data websites, and the statistical offices of many countries and other entities publish data sets on a regular basis. You can find a list of such platforms in one of our previous articles. Furthermore, some (live) data can be extracted directly via an API. One of the most popular ones is the Twitter API, but other networks also allow access to their data, for instance the music streaming service Spotify or the social news website Reddit. Handling an API takes more effort than downloading an existing data set. But again, it’s not rocket science, and you can be sure that you are not the first one trying to access data from, say, Twitter. You will be amazed to see how big the community is and how much help you can find online (Stack Overflow is usually a good place to look).
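
To give you an idea of what "handling an API" means in practice, here is a minimal Python sketch. It assumes Reddit's public JSON listings as the data source; the subreddit name and the sample response are made up for illustration, and the function names are our own. The sample is only shaped like a real listing response so that the sketch runs without an internet connection.

```python
import json
from urllib.parse import urlencode


def build_listing_url(subreddit, limit=5):
    """Build the URL of a subreddit's 'top' listing in JSON form."""
    query = urlencode({"limit": limit})
    return f"https://www.reddit.com/r/{subreddit}/top.json?{query}"


def extract_posts(payload):
    """Turn a listing response into a list of (title, score) rows."""
    children = payload.get("data", {}).get("children", [])
    return [(child["data"]["title"], child["data"]["score"]) for child in children]


# Hand-made sample in the shape of a listing response (not real data).
SAMPLE_RESPONSE = json.loads("""
{"kind": "Listing", "data": {"children": [
  {"kind": "t3", "data": {"title": "Election results by district", "score": 120}},
  {"kind": "t3", "data": {"title": "Rent prices over time", "score": 87}}
]}}
""")

if __name__ == "__main__":
    print(build_listing_url("dataisbeautiful", limit=2))
    for title, score in extract_posts(SAMPLE_RESPONSE):
        print(score, title)
```

To work with live data, you would request the URL (for example with urllib.request from the standard library) and pass the decoded JSON to extract_posts. Keep in mind that most APIs, including Twitter's, ask you to register first and limit the number of requests you may send.
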
Another way of collecting data is scraping it. There are some sophisticated tools out there that help you scrape the data you want. You don’t need to write a Python script (though you can); a simple Google Chrome plug-in, Tabula, Blockspring or import.io might already do the job.
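
If you do want to try the Python route, the following is a minimal sketch of what a scraping script can look like. It uses only Python's standard library and pulls the rows out of an HTML table. The sample page, the team names and the numbers are invented for illustration; a real page would first have to be downloaded and is usually messier.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def scrape_table(html):
    """Return all table rows found in the given HTML as lists of strings."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample page (not real results).
SAMPLE_PAGE = """
<table>
  <tr><th>Team</th><th>Goals</th></tr>
  <tr><td>Team A</td><td>3</td></tr>
  <tr><td>Team B</td><td>1</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in scrape_table(SAMPLE_PAGE):
        print(row)
```

The result is a plain list of rows that you can save as a CSV file or paste into a spreadsheet. Before scraping a real website, check its terms of use.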

examples of scraping and collecting data: import.io, blockspring, yds

Step 2: Interrogating the Data

After you have found your data, the real work begins. The first step should always be to get to know your data (What’s actually in there?). There are several ways of doing so. You can look at the entries themselves and play with them, for example by applying different filters. You can also visualize them in order to interrogate them. The two approaches work best in combination. They will help you understand and review your data, and find answers to questions like:
Does the data make sense?
Is there something missing?
Is the format correct?

These are questions that you will face automatically when interrogating the data.
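
The three questions above can also be asked by a small script. The following Python sketch is one possible way to do it, under the assumption that your data is a CSV file with one numeric column to check. The column names, the cities and the values are invented, and the plausible range (0 to 100 for a percentage) is something you have to decide for your own data.

```python
import csv
import io

# Invented sample data with one missing, one badly formatted
# and one implausible value.
SAMPLE_CSV = """city,year,unemployment_rate
Northtown,2014,5.2
Southport,2014,
Eastfield,2014,n/a
Westbury,2014,480
"""


def interrogate(csv_text, column, label, low, high):
    """Sort problem rows into three piles, identified by their label column.

    missing      -> is there something missing?
    bad_format   -> is the format correct?
    out_of_range -> does the data make sense?
    """
    report = {"missing": [], "bad_format": [], "out_of_range": []}
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row[column] or "").strip()
        if not raw:
            report["missing"].append(row[label])
            continue
        try:
            value = float(raw)
        except ValueError:
            report["bad_format"].append(row[label])
            continue
        if not low <= value <= high:
            report["out_of_range"].append(row[label])
    return report


if __name__ == "__main__":
    result = interrogate(SAMPLE_CSV, "unemployment_rate", "city", 0, 100)
    for problem, cities in result.items():
        print(problem, cities)
```

A report like this does not tell you what the correct values are. It only shows you where to look more closely, or whom to ask.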

Now let us talk about tools. Many data-driven storytellers work with Excel or other spreadsheets. In the world of #ddj, OpenRefine is a well-established tool for cleaning up data. Furthermore, Google Explore (a function within Google Spreadsheet) and Keshif might be worth a look. Real Excel lovers might test Excel plug-ins, programmers might have a look at RStudio, and statisticians at SPSS or other statistical software.
You see, you will never be left alone with the data, but a willingness to test and learn new tools is required.
 

Step 3: The Fun Part – Visualization

examples of visualization tools: CartoDB & TimelineJS

We assume that for most of you this is the most exciting part: visualizing the results. This is the moment when you should clearly see what the message of your data-driven story is and how to represent it. A quick search on the internet reveals a seemingly endless number of tools that will help you with this step. Before you dig into the world of tools, it might help to get a few things straight. Be clear about the skills you have. For instance, are you familiar with mapping? Do you have programming skills? How fast do you need to complete your visualization? And how do you want to publish it?

Once you know what you are looking for, you will find the appropriate tool a lot quicker. It always helps to check the tool’s official website, but reports from others who have used it are also very helpful. For instance, we tested CartoDB (for mapping), TimelineJS (for timelines) and the (interactive) chart tool Tableau.
 

Step 4: Mashing it

The art is done and you want the world to see and receive the message. How do you do that? In case you have a media outlet on your side, it is best to discuss the specific requirements for publishing the data-driven story on the website with your technical staff. Often, very easy solutions, such as embedding an iframe, are possible.
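
In case you wonder what embedding an iframe looks like: it is a single line of HTML that points to the place where your visualization is hosted. The following Python sketch builds such a line. The function name, the example URL and the default size are assumptions made for illustration; your technical staff may prefer different attributes.

```python
from html import escape


def iframe_embed(url, width="100%", height=480, title="Data visualization"):
    """Return an HTML iframe snippet that points to a hosted visualization."""
    return (
        f'<iframe src="{escape(url, quote=True)}" '
        f'width="{escape(str(width), quote=True)}" '
        f'height="{escape(str(height), quote=True)}" '
        f'title="{escape(title, quote=True)}" '
        'style="border:0"></iframe>'
    )


if __name__ == "__main__":
    # example.org is a placeholder, not a real visualization.
    print(iframe_embed("https://example.org/map?a=1&b=2"))
```

Many visualization tools offer a share or embed option that generates a comparable snippet for you, so you rarely have to write it by hand.
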
Nowadays there are many alternative ways of publishing. You can set up your own data blog or use platforms such as Storify and Medium. Or you can publish your visualization on social media channels like Instagram, Twitter or Facebook. There are even services that allow you to create a virtual poster that tells a data-driven story, such as infogr.am or Piktochart.

examples of publishing platforms: Storify & Medium
You see, there are many options to let the world know about your findings. Please, never forget to tell the world that your visualization is out there. Your story is not just data-driven but also driven by the impact it has on social networks. Thus, keep the hashtag #ddj in mind.
 

About the author: Eva Lopez (DW)