How to bring down a government using numbers
Reporters tend to prefer words. But as award-winning data journalist Jack Kerr explains, sometimes a simple spreadsheet and a graph are all you need to make headlines.
- 21 Nov 2022
- Written by Jack Kerr
- Reading time: 9 min
Before heads of government were humiliated into stepping down, before Hollywood superstars were exposed and behemothic corporations shamed, before a billion was recouped in fines and unpaid taxes, and before hundreds of journalists set aside industry rivalries and a year of their professional lives to create an unprecedented global collaboration, the first domino had to fall.
That domino was nothing more than a handful of characters in an anonymous email. It landed in the inbox of Bastian Obermayer, a journalist with the Süddeutsche Zeitung in Munich. It didn’t say much: “Hello, this is John Doe. Interested in data?” was all it read. But half a decade later, the effects of those eight words are still being felt.
We are talking, of course, about the Panama Papers: the landmark investigation that shone a light on the offshore investment system. In terms of raw information, it was the biggest leak in the history of journalism, eclipsing anything Julian Assange or anyone before him had revealed. Kilobytes became gigabytes became terabytes as more and more documents started piling into the inboxes of Obermayer and his colleagues. Headlines were written and world leaders toppled. It took more than 370 journalists working at 76 news organisations to put together. But none of it would have been possible without one magical, often misunderstood and frequently feared thing: data.
While using data to break a story is hardly new (some trace its origins back to the early 1950s), data journalism as a field is still niche. Yet countless more Panama-like revelations may await us if more journalists let data into their lives. True, few people get into journalism because they like spreadsheets or writing code. Words, not numbers, are the reporter’s typical tool of choice. But technological advances have made it possible for people without science doctorates to play with data in a way not possible even ten years ago. And when journalists start seeing this technology as just another weapon in their arsenal, amazing things can happen.
“What we are dealing with here is a new phenomenon,” says Gerard Ryle, the Irish-Australian director of the International Consortium of Investigative Journalists and a key figure in the Panama Papers. “Whistleblowers are able to gather information on a scale never thought possible before, so journalists are having to turn to new forms of technology to basically understand what we are seeing and to start looking for patterns.”
The Panama Papers provided a flashy demonstration of what such technology can do, converting thousands of raw documents into searchable databases and easy-to-understand visualisations showing things like who was connected to whom and by how many degrees of separation. But for all their technical wizardry, the Papers also relied on a journalistic crutch as old as the typewriter and green eyeshade: someone with access to hidden information who was prepared to leak it at great potential cost to their career and personal safety. In this way, the story of the Panama Papers is also like the story of Watergate or the Pentagon Papers, two groundbreaking controversies made possible by heroic whistleblowers. Perhaps the most revolutionary thing about data journalism, though, is that you often don’t even need a Deep Throat (or a Daniel Ellsberg, or an Edward Snowden). In the age of big data, many of the world’s secrets are hiding in plain sight in data we see every day. You just need to know where to look.
Think, for example, of one of the most annoyingly ubiquitous data sources of our time: sports betting odds. Many people consider switching channels when they come on TV. Yet BuzzFeed and the BBC saw their investigative potential. By crunching data from 26,000 tennis matches and mining them for irregularities (in other words, looking for instances where players regularly lost matches the stats said they likely should have won), reporters were able to expose what match-fixers had wanted to keep hidden: the throwing of a game, a set, a match. What they discovered became front-page news around the globe. And it was all there in front of us; we just needed the right charts to see it.
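To make the idea of "matches the stats said they likely should have won" concrete, here is a minimal sketch in Python. It is not the method BuzzFeed and the BBC used; the match records, the `flag_underperformers` function and its thresholds are all hypothetical, and the odds conversion ignores the bookmaker's margin. It simply compares how many wins the odds implied for each player with how many wins actually happened.

```python
from collections import defaultdict

# Hypothetical records: (player, decimal odds offered on that player, did they win?)
matches = [
    ("Player A", 1.25, False), ("Player A", 1.25, False),
    ("Player A", 1.25, True),  ("Player A", 1.25, False),
    ("Player B", 1.25, True),  ("Player B", 1.25, True),
    ("Player B", 1.25, False), ("Player B", 1.25, True),
    ("Player C", 4.0, False),  ("Player C", 4.0, True),
    ("Player C", 4.0, False),  ("Player C", 4.0, False),
]


def implied_probability(decimal_odds):
    """Rough win probability implied by decimal odds (bookmaker margin ignored)."""
    return 1.0 / decimal_odds


def flag_underperformers(matches, min_matches=3, min_shortfall=1.0):
    """Return players whose actual wins fall well short of what the odds implied."""
    expected = defaultdict(float)  # sum of implied win probabilities per player
    actual = defaultdict(int)      # number of matches actually won
    played = defaultdict(int)      # number of matches on record

    for player, odds, won in matches:
        expected[player] += implied_probability(odds)
        actual[player] += int(won)
        played[player] += 1

    flagged = []
    for player in played:
        shortfall = expected[player] - actual[player]
        if played[player] >= min_matches and shortfall >= min_shortfall:
            flagged.append((player, round(shortfall, 2)))

    # Biggest gap between expectation and reality first
    return sorted(flagged, key=lambda item: item[1], reverse=True)


flagged = flag_underperformers(matches)
print(flagged)
```

A name on this list is a lead, not a finding: favourites lose for plenty of innocent reasons, so any player flagged this way would still need old-fashioned reporting before a word was published.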
The BuzzFeed investigation showed how easy it can be to discover groundbreaking secrets in numbers. Of course, like any craft, data journalism takes some practice to master. Any journalist who plans on digging up and digging through masses of data would be well advised to learn a programming language like Python. It’s relatively straightforward to learn (if you got through primary school Italian you can get through Python) and incredibly versatile; with a little guidance, it will let you scrape data from websites or PDFs, mine that data for nuggets, slice and dice it in a spreadsheet, lay it out in an interactive data visualisation and feed it into a machine-learning project.
With a few blocks of code, you can, for example, take a stack of PDF business records from ASIC (the Australian Securities and Investments Commission) and convert them into an interactive network map showing you who’s connected to whom, and which PO box they might share. And because you can outsource the grunt work to your computer, a few seconds is all that’s needed to build a database of extracted information that would take days to do manually (if you didn’t die of boredom first).
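As a rough illustration of the "who shares a PO box" step, the sketch below assumes the PDF extraction has already been done and that each record has been reduced to a company, a director and an address. The records, field names and the `shared_address_links` function are invented for the example; real filings would need cleaning first, since the same address is rarely typed the same way twice.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records, as they might look after extraction from the PDFs
records = [
    {"company": "Alpha Pty Ltd", "director": "J. Smith", "address": "PO Box 12, Sydney"},
    {"company": "Beta Pty Ltd", "director": "R. Jones", "address": "PO Box 12, Sydney"},
    {"company": "Gamma Pty Ltd", "director": "J. Smith", "address": "PO Box 99, Perth"},
    {"company": "Delta Pty Ltd", "director": "L. Chen", "address": "PO Box 7, Hobart"},
]


def shared_address_links(records):
    """Link every pair of directors whose companies list the same address."""
    people_at = defaultdict(set)
    for record in records:
        people_at[record["address"]].add(record["director"])

    links = []
    for address, people in people_at.items():
        # Each pair of people at one address becomes an edge in the network
        for first, second in combinations(sorted(people), 2):
            links.append((first, second, address))
    return sorted(links)


links = shared_address_links(records)
for first, second, address in links:
    print(f"{first} <-> {second} via {address}")
```

The list of links is the raw material for a network map: each tuple is one line between two names, which a visualisation tool can then draw.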
From there it’s just a matter of looking for data that might turn up something interesting. Sometimes the data will be easy to find; for instance, the San Francisco Chronicle was able to pull data from Airbnb’s website to draw conclusions about the platform’s negative impact on the city’s housing market. (Hardly a surprise now, but a big deal back in 2015.) Other times, the data can be a bit more difficult to gather. Bellingcat, an investigative journalism group based in the Netherlands, often relies on data sources not readily available to the public, but just as regularly gathers intel from open sources and social media. Since 2014, they’ve used this data to bring clarity to dozens of significant world events, from exposing those responsible for downing Malaysia Airlines Flight 17 to analysing the weapons and armour used in the Syrian Civil War to, more recently, locating the Russian missile programming team responsible for much of the bombing of Ukraine.
“It’s kind of like taking a massive bag of Skittles, throwing them all over a table and then sorting into colours and looking at which one you have the most of,” is how Benjamin Strick, the investigations director at the Centre for Information Resilience and a contributor to Bellingcat, describes the process of making sense of all the data the group collects. The trick for any aspiring data journalist is to make sure you have enough Skittles to say something insightful with. Which is where your old-fashioned shoe-leather reporting comes into play.
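In code, the Skittles exercise is a few lines. This is only a sketch of the sort-and-count idea, not a description of Bellingcat's tooling; the `labels` list is a made-up stand-in for whatever category each collected item has been given.

```python
from collections import Counter

# Hypothetical: one label per collected item (here, literally colours)
labels = ["red", "green", "red", "yellow", "red", "green"]

# Sort the pile into colours and count each one
counts = Counter(labels)

# Which colour do we have the most of?
top_label, top_count = counts.most_common(1)[0]
print(top_label, top_count)
```

With six items the answer means little, which is the point of the Skittles warning: the count only says something once the pile is big enough.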
For all the technological advances we’ve enjoyed over the past few decades, journalism will always be shaped by its access to information. The internet gives would-be reporters access to more information than most people could have conceived of a generation ago. Yet only a fraction of a per cent of that information may prove relevant.
In this way, everything and nothing has changed. How many of the names in Bob Woodward’s Rolodex landed him a solid scoop? The freelance journalist who can mine the internet for leads and scoops will be the one to land the next Watergate while their competitors chase the crumbs served up by government minders and corporate PR agents. Data is just another feather in the reporter’s cap, a way to shine a spotlight on the truth (and shore up your employment prospects). Because as the saying amongst analysts goes, if you don’t have the data, you’re just another hack with an opinion. Or even worse, a reporter with a press release.