Thoughts from the Global Investigative Journalism Conference

Written by
  • lucy
Outdated Content Warning: This content refers to an older version of OpenSpending. See here for information about the next version of OpenSpending and ways to contribute.

This post is by Lucy Chambers, community coordinator at the Open Knowledge Foundation, and Friedrich Lindenberg, Developer on OpenSpending. They recently attended the Global Investigative Journalism Conference 2011 in Kyiv, Ukraine, and in this post, bring home their thoughts on journalist-programmer collaboration…

The conference

The Global Investigative Journalism Conference must be one of the most intense yet rewarding experiences either of us have attended since joining the OKF. With topics ranging from human trafficking to offshore companies, the meeting highlighted the importance of long-term, investigative reporting in great clarity.

With around 500 participants from all over the globe with plenty of experience in evidence gathering, we used this opportunity to ask many of them how platforms like OpenSpending can contribute, not only to the way in which data is presented, but also to how it is gathered and analyzed in the course of an investigation.

Spending Stories - the brainstorm

As many of you will be aware, earlier this year we won a Knight News Challenge award to help journalists contextualise and build narratives around spending data. Research for the project, Spending Stories, was one of the main reasons for our trip to Ukraine…

During the data clinic session as well as over drinks in the bar of “Hotel President” we asked the investigators what they would like to see in a spend analysis platform targeted at data journalists. Cutting to the chase, they immediately raised the key questions:

How will it support my work?

It was clear that the platform should support the existing journalistic workflow through publishing embargos, private datasets and note making. At the same time, the need for statistical and analytical heuristics to dissect the data, find outliers and visualize distributions was highlighted as a means to enable truly data-driven investigations of datasets. The goal in this is to distinguish anomalies from errors and patterns of corruption from policies.

What’s in it for my readers?

With the data loaded and analyzed, the next question is what value can be added to published articles. Just like DocumentCloud enabled the easy embedding of source documents and excerpts, OpenSpending should allow journalists to visualize distributions of funds, embed search widgets and data links, as well as information about how the data was acquired and cleaned.

What do I need to learn to do it?

Many of those we spoke to were concerned about the complexity required to contribute data. The recurring question was: should I even try myself or hire help? It’s clear that for the platform to be accessible to journalists, a large variety of data cleansing tutorials, examples and tools need to be at their disposal.

We’ve listed the full brainstorm on the OpenSpending wiki

You can also see the mind map with concrete points below:

Hacks & Scrapers - How technical need data journalists be?

In a second session, “Data Camp” we went through the question of how to generate structured data from unstructured sources such as web pages and PDF documents. We tried to emphasize the value of easily machine-readable data over less structured information by pointing to some examples on ScraperWiki.

As we went through basic steps needed to scrape a web page, the questions began turning towards the purpose of the exercise:

“So why do we need to learn how to scrape? Can’t we just hire someone to do this for us?”

Our answer went something like…

“Well, yes - you can actually, but…”

…it may be a good idea to have some understanding of which data can be easily retrieved and what difficulties and errors you might encounter in the extraction process. This includes:

  1. Understanding the possibilities and limitations of various data structures on the web to understand how to approach programmers and what to ask for (and importantly, what it is reasonable to pay).
  2. Understanding how to quality-check data extracted from the internet and where errors could be introduced.
  3. Appreciating that programmers are expensive and that having a basic understanding of some of the principles behind screen scraping yourself could save your organisation quite a lot of money for simpler tasks

The notes from the scraping session are available on this pad

So how do I hire a hacker?

The final thing that became blatantly apparent in sessions such as “Journalist or Programmer? Do Reporters Need to become Coders?” was that there is a huge void that needs to be bridged between the hacker and journalist world. If I had a pound for every time someone at the conference asked me how they could find a hacker, would be mighty happy. We pointed people in the direction of Hacks and Hackers meetings but there is clearly a need for a more extensive ‘address’ book of reliable contacts is obvious.

I will attempt to pull together some of the thoughts we had about how to find (and trust!) your hacker in a separate post to address some of these needs. If you have further advice or anecdotes on this subject, please don’t hesitate to get in contact via the OpenSpending mailing list.