Helping investigative reporters to work with data

OpenUp, in partnership with Africa Data Hub, recently hosted a two-day workshop in Johannesburg where Southern Africa's top investigative journalists were challenged to transform laptop-breaking spreadsheets into simple 11x2 machine-readable tables that can easily be used to generate powerful and meaningful data stories.

How do you go from a laptop-breaking spreadsheet with around one and a half million data points to a simple 11x2 table that can be neatly visualised using an online tool like Flourish? It’s not a challenge many news reporters are taught to deal with in either formal or on-the-job training, but it is one that a group of Southern Africa’s best investigative journalists wrestled with during a two-day workshop hosted by OpenUp in Johannesburg this April.
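Going from millions of data points to an 11x2 table is usually an aggregation: many rows are grouped by a category and their values summed, leaving one row per category. The Python sketch below shows one way to do that with the standard library alone. The column names and figures are hypothetical, not the workshop's dataset; with 11 distinct categories, the output would be an 11x2 table plus a header row.

```python
import csv
import io
from collections import defaultdict


def summarise(rows, group_col, value_col):
    """Collapse many rows into one (category, total) pair per category."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_col]] += float(row[value_col])
    # Largest totals first, the order a bar chart is usually read in.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def write_summary(pairs, header):
    """Serialise the summary as CSV text that a charting tool can import."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(pairs)
    return buffer.getvalue()


# Hypothetical row-level data: one row per record, many rows per region.
raw = io.StringIO(
    "region,amount\n"
    "Region A,120.5\n"
    "Region B,80\n"
    "Region A,30\n"
    "Region C,15.25\n"
)
summary = summarise(csv.DictReader(raw), "region", "amount")
print(write_summary(summary, ["region", "total_amount"]))
```

A spreadsheet pivot table or a pandas `groupby` does the same job; the plain-Python version simply makes each step visible.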

And this was only the start of something bigger. With the backing of Africa Data Hub, OpenUp will spend the next few months working with these attendees to investigate how civic technology can support journalists over a longer period, bridging the gap between attending a training session and becoming an experienced practitioner.

Why journalists need data training

There is an increasing recognition of the role data plays in modern investigative journalism and storytelling. Whether it’s fact-checking a press release, making sense of a new report, digging into government data or analysing leaked datasets, mastering the tools for data cleaning, analysis and communication is a vital part of the job. Taking a million data points and turning them into a message for Twitter, or allowing readers to navigate complex data to understand how an issue affects them, makes journalism more accessible, engaging and relevant.

For many years, OpenUp has offered training courses for journalists and other communications professionals who need to make sense of data, both large and small. This particular training brought together members of the IJHub network from newsrooms in South Africa, Malawi, Zambia, Namibia and eSwatini.

It’s about the story, not the data

The two days were highly interactive, and participants wanted to learn how to integrate data into their existing skillsets. When teaching data-driven storytelling, the “storytelling” part of the equation is often lost in the technical challenge of data wrangling. From the trainers’ perspective, then, it was a genuine treat to work with a group of highly skilled and accomplished professionals whose understanding of narrative structure was already keenly honed.

The data storytelling pipeline is the backbone of OpenUp’s training process.


The core of OpenUp’s data-driven storytelling training is the data processing pipeline, shown above. Our approach is to demonstrate freely available tools for all six stages of the pipeline: finding data, getting data into machine-readable formats, verifying data, cleaning data, analysing data, and packaging and presenting stories. We design exercises that demonstrate simple techniques for speeding up data acquisition and processing, and we spend time with individuals as they work through sample data of their own with a view to publishing a final story. We also showcase examples of the very best data-driven storytelling and share best practices for integrating data tools into journalists’ everyday work.

It’s a lot to cram into 48 hours.

An example dataset showing how to clean and prepare a spreadsheet for analysis.
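The cleaning and verifying stages of the pipeline often come down to a handful of small, repeatable fixes. As a hypothetical illustration (not the dataset used in the workshop), the Python sketch below normalises messy name and currency cells, skips rows it cannot parse, and checks the cleaned total against a figure the source document is assumed to publish. The function names, the handling of a leading rand symbol and the decimal-point convention are all assumptions made for the example.

```python
def clean_amount(text):
    """Parse a messy currency cell such as ' R 1 200.50 ' into a float.

    Assumes '.' is the decimal separator and that spaces or commas only
    appear as thousands separators. Returns None for blank or non-numeric cells.
    """
    value = (text or "").strip()
    if value.startswith("R"):
        value = value[1:]  # drop a leading rand symbol
    value = value.replace(" ", "").replace(",", "")
    try:
        return float(value)
    except ValueError:
        return None  # e.g. "", "n/a" or "-"


def clean_name(text):
    """Collapse repeated whitespace and normalise capitalisation."""
    return " ".join((text or "").split()).title()


def clean_rows(rows):
    """Apply the cleaners, skipping rows whose amount cannot be parsed."""
    cleaned = []
    for row in rows:
        amount = clean_amount(row.get("amount"))
        if amount is None:
            continue  # skipped here; in a real story these rows deserve a manual look
        cleaned.append({"region": clean_name(row.get("region")), "amount": amount})
    return cleaned


def verify_total(rows, expected_total, tolerance=0.01):
    """Compare the cleaned sum with a total stated in the source document."""
    actual = sum(row["amount"] for row in rows)
    return abs(actual - expected_total) <= tolerance


# Hypothetical messy rows, as they might arrive from a scraped PDF or spreadsheet.
messy = [
    {"region": " region  a ", "amount": "R 1 200.50"},
    {"region": "REGION B", "amount": "1,000"},
    {"region": "Region C", "amount": "n/a"},
]
clean = clean_rows(messy)
print(clean)
print(verify_total(clean, 2200.5))
```

Comparing against an independently published total is a quick sanity check: if rows were wrongly dropped or mis-parsed during cleaning, the sums will usually disagree.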

Developing the data journalism helpdesk

One of the challenges of teaching a complex subject such as data-driven storytelling is the inevitable gap between what you see in the classroom and what you work with in real life. Although all of our training materials are based on real-world examples (usually stories we’ve worked on), people commonly struggle once they are back at their desks: no two datasets look the same, and putting newly learnt cleaning techniques into practice is hard. It can take years to become truly proficient with a data wrangler’s toolkit.

Over the next few months, then, we’ll be working with IJHub’s membership to see what kind of long-term, scalable support is possible for these situations through a project like Africa Data Hub. By designing for the needs of these highly skilled journalists, we hope to open up the helpdesk to other organisations in the near future. Check back soon for details.

What participants said about the training and helpdesk

“I think it's a great idea! I've done several data journalism training workshops before this and I always walk away having learnt a lot but you only really figure out where your knowledge gaps are when you're working with data for a story. So I think the help desk is a really innovative way to tackle that problem and have help on hand if needed.”

- Participant from the Namibian Investigative Unit
