Open Knowledge

Data-driven advocacy and research

We are now in a phase where many governments around the world proactively publish documents about what they plan to spend (budgets) and what they actually spend (spending data). Increasingly, this material is available on the internet, so that anybody can access it at any time. Still, too much of it is released in the form of 'documents' rather than 'data'. Ideally we need both, so that information can be analysed, re-used and understood. This chapter is a quick overview of some of the raw inputs required for data-driven advocacy and how it works in practice.

What do we mean by machine-readable data?

When we speak about data, we usually mean machine-readable data (http://en.wikipedia.org/wiki/Machine-readable_data). The formats most commonly used by policy-making institutions for policy papers and long-form reports (PDF files, Word documents, web pages or closed interactive infographics) do not structure information in a way that lends itself to automated analysis and extraction.

Such documents are formatted for humans (or printers) to interpret, and it can be hard (and in many cases nearly impossible) for a machine to reconstruct the underlying data from the presentation.

Other formats, such as Excel and CSV files, contain information in a more structured form. For example, in an Excel file you can select a number of cells and easily calculate their sum. More specialised formats, such as XML documents, JSON APIs or Shapefiles, are even more useful for automated processing, although they may not have easy-to-use viewer applications. You can think of them as the glue that connects different systems on the web, so that different databases can work together seamlessly.
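To make the difference concrete, here is a minimal Python sketch using only the standard library. It sums one column of a small budget table, once from CSV and once from JSON, which is the machine equivalent of selecting cells in a spreadsheet. The column names (`department`, `amount`) and all figures are invented for illustration; a real dataset will have its own headers.

```python
import csv
import io
import json

# Hypothetical budget lines in CSV form; column names and figures are invented.
CSV_TEXT = """department,amount
Health,1200
Education,950
Transport,430
"""

# The same hypothetical records expressed as JSON, as an API might return them.
JSON_TEXT = """[
  {"department": "Health", "amount": 1200},
  {"department": "Education", "amount": 950},
  {"department": "Transport", "amount": 430}
]"""


def total_from_csv(text: str) -> int:
    """Sum the 'amount' column, like selecting cells in a spreadsheet."""
    reader = csv.DictReader(io.StringIO(text))
    return sum(int(row["amount"]) for row in reader)


def total_from_json(text: str) -> int:
    """Sum the same field from a list of JSON records."""
    return sum(int(record["amount"]) for record in json.loads(text))


if __name__ == "__main__":
    print(total_from_csv(CSV_TEXT))    # 2580
    print(total_from_json(JSON_TEXT))  # 2580
```

Doing the same with a PDF would first require extracting the table, which is exactly the step machine-readable formats make unnecessary.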

Why do CSOs need it?

What asking for machine-readable bulk data means for civil society organisations (CSOs) is simple: you won't have to spend a lot of time manually extracting data from reports into spreadsheets before you can filter, sort and analyse it, a process that is both time-consuming and error-prone.
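Once spending data arrives as a CSV file, filtering and sorting take a few lines of code instead of hours of copying. The following is a minimal Python sketch, assuming a hypothetical file with `date`, `department`, `supplier` and `amount` columns; the records are invented, and the function name `largest_payments` is illustrative only.

```python
import csv
import io

# Hypothetical spending records; real column names and values will differ.
SPENDING_CSV = """date,department,supplier,amount
2012-01-05,Health,Acme Ltd,5400
2012-01-09,Education,Books Co,1200
2012-02-14,Health,MedSupply,12800
2012-03-02,Health,Acme Ltd,700
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, converting amounts to numbers."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["amount"] = float(row["amount"])
    return rows


def largest_payments(rows, department, limit=2):
    """Filter to one department and sort by amount, largest first."""
    selected = [r for r in rows if r["department"] == department]
    selected.sort(key=lambda r: r["amount"], reverse=True)
    return selected[:limit]


if __name__ == "__main__":
    for row in largest_payments(load_rows(SPENDING_CSV), "Health"):
        print(row["date"], row["supplier"], row["amount"])
```

Because the script works on the published file directly, it can be re-run unchanged whenever a new release appears, which avoids the transcription errors that manual copying introduces.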

What to ask for when asking for data: a checklist

In the next section, 'Getting Data', we will deal with asking governments for data (or getting it by other means). To set the scene for this, and to work out whether your government already publishes usable data, have a quick look at the following questions:

  • Is the government's data published in a machine-readable format, e.g. CSV, XML or JSON? There is nothing wrong with publishing a PDF to support a data release (in fact, a well laid-out document is often useful for cross-referencing and sanity-checking the data), but it shouldn't be the only thing published. If you are asking for a policy document, ask for the underlying data in a spreadsheet so you can check the numbers.
  • Does the government publish a 'data dictionary' to explain the terms used in the dataset? This should include definitions of column headers, explanations of terms and ranges used within the main body of the data, and explanations of any changes in terminology introduced since the dataset was last released.
  • How is the data that is being published actually used internally by governments? Do some sanity checks on the minimum and maximum values of different columns to make sure they fall into the documented ranges and don't seem out of place. Do you see negative values when you don't think you should? Negative values usually mean money owed.
  • Is the structure of the data the same across years? If not, is there a description of how it changes? It never hurts to contact the publisher and ask about the change and why it occurred. The publisher's name and contact details may be on the report or webpage. If there is no named contact, call the department's enquiries number or email them asking to meet or discuss the data.
  • How aggregated is the data? How many real-world financial transactions does a single line of the dataset represent? For budgets this will mostly be hard to tell, but with transactional expenditure you want to make sure that the data is fairly disaggregated. Ideally, each entry represents a single transaction; even if this isn't the case, you'll still want to ensure that one line does not stand for tens or hundreds of thousands of transactions (e.g. a government programme as a whole).
  • Ask for reference data. If your budget or spending data is augmented with reference data, make sure you have access to it. This might include functional or category codes on budget line items, location codes for describing recipient location, or codes that indicate the status of the record. 
  • Also ask for the guidelines people were given when creating the dataset. These make it easier to understand what the data includes, e.g. whether the numbers are in thousands or millions.
  • Final tip: if the data you want is not provided, narrow your scope. Your chances of success are higher if your request to the government is specific and limited in scope. Government is the de facto keeper of all kinds of data, so parameters that narrow your request are always helpful.
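Several of the checks above (documented ranges, negative values, reference codes, changes in structure between years) are easy to automate once the data is machine-readable. The following is a minimal Python sketch using only the standard library. The data dictionary, the function codes, the column names and all figures are hypothetical placeholders; a real data dictionary from the publisher will look different.

```python
import csv
import io

# Hypothetical data dictionary: documented (min, max) range per column.
# A minimum of 0 means negative values (often money owed) get flagged too.
DATA_DICTIONARY = {"amount": (0, 1_000_000)}

# Hypothetical reference data mapping function codes to labels.
FUNCTION_CODES = {"07": "Health", "09": "Education"}

# Two invented releases of the same dataset; the 2012 one adds a column.
DATA_2011 = """year,function_code,amount
2011,07,250000
2011,09,180000
"""

DATA_2012 = """year,function_code,region,amount
2012,07,North,-15000
2012,09,South,2500000
2012,11,North,40000
"""


def read(text):
    """Parse CSV text into a list of dicts (codes stay strings, e.g. '07')."""
    return list(csv.DictReader(io.StringIO(text)))


def out_of_range(rows, dictionary):
    """Return (row number, column, value) for values outside documented ranges."""
    problems = []
    for i, row in enumerate(rows, start=1):
        for column, (low, high) in dictionary.items():
            value = float(row[column])
            if not low <= value <= high:
                problems.append((i, column, value))
    return problems


def unknown_codes(rows, column, reference):
    """Codes that appear in the data but are missing from the reference table."""
    return sorted({row[column] for row in rows} - set(reference))


def header_changes(old_text, new_text):
    """Columns added or removed between two releases of a dataset."""
    old = set(next(csv.reader(io.StringIO(old_text))))
    new = set(next(csv.reader(io.StringIO(new_text))))
    return {"added": sorted(new - old), "removed": sorted(old - new)}


if __name__ == "__main__":
    rows_2012 = read(DATA_2012)
    print(out_of_range(rows_2012, DATA_DICTIONARY))
    print(unknown_codes(rows_2012, "function_code", FUNCTION_CODES))
    print(header_changes(DATA_2011, DATA_2012))
```

Each flagged item is a prompt for a question to the publisher rather than proof of an error: a negative amount may be a legitimate refund, and a new column may be documented somewhere you have not yet looked.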

An introduction to data-driven advocacy

Is going out and provoking a riot the best way to get a Government to take your message on board? There are alternatives: hit them with the data hammer instead!

Making evidence-based policy proposals consists of three major phases: formulating your assumption; analysis (which often leads to reformulating your assumption); and presenting your data in an engaging way in a policy proposal.

Analysing assumptions

Asking the right question is key to getting the most out of your data. We all make assumptions, and our organisation may have a particular standpoint on a given issue. Our first task is always to formulate our assumptions and then interrogate them ferociously. Although we try to be rational in this process, our judgement is often influenced by our subjective goals, values, and beliefs. Sometimes, you'll need to revisit your assumptions several times over to ensure they are valid and you can back them up with data. Once you know your policy problem is definitely a problem, you can work to package it in a way that's appropriate for your target audience. 

What is the public interest?

Often our job is to act in the public interest by analysing conflicting assumptions and working out which one is more valid. For example, in Greece, Spain and many other European countries, people protest almost every day as governments cut spending to bring down their budget deficits. If a government instead kept its current level of spending but raised taxes to increase its revenue, different citizens' groups would still protest, depending on which taxes were raised. In any case, there will always be more than one interpretation of any Government policy, and interested parties on each side to support or oppose it.

Policy analysis

Once we have a well-defined policy problem, the specific goals or results that different stakeholders are trying to achieve, and the instruments they are using to achieve them, we can systematically search for the specific data needed to create our own policy proposals. This data can be obtained from the Government, from other sources (e.g. academic journals or private companies), or generated by ourselves. Once the data is gathered, we use a specific methodology to analyse it, and based on this analysis we confirm or reject our assumptions. If an assumption is rejected, we use our findings to formulate a new one and start the process from the beginning. If it is confirmed, we use our results to make a policy proposal to the Government.

Policy proposals

For CSOs it is important to recognise who the decision makers are, and hence whom to target with a policy proposal. Policy proposals should be methodologically well structured, evidence-based, open for debate and scientifically evaluated. Governments will seldom adopt our policy proposals as their own, but they may change their course of action or gain new insights, views and understanding of the subject. We may also use policy briefs to approach Government officials, or press releases to get the attention of the public.

Case study 

Fish subsidies

The influence CSOs have on government policy comes from a wide and varied set of activities. These range from producing a widely shared dataset or infographic that subtly influences the mood of policy makers, to more targeted advocacy and lobbying on issues in which the CSO has expertise.

The Fish Subsidies group (http://fishsubsidy.org) is a good example of a CSO engaged in targeted activities. They collected a comprehensive set of data on fishing subsidies paid under the European Union's Common Fisheries Policy, broke it down into payments for every EU member state, and complemented this with data on fishing activities. They have produced a report (http://is.gd/XYPgq5) assessing the environmental and social impacts of the Financial Instrument for Fisheries Guidance between 2000 and 2006. This extensive document fed directly into the EU's political decision-making process.
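Breaking a payment-level dataset down by member state is a simple grouping operation. The sketch below is not how fishsubsidy.org processed its data; it is a minimal Python illustration that assumes each payment record carries a member state and an amount in euros. The records and figures are invented.

```python
from collections import defaultdict

# Hypothetical payment records: (member state, amount in euros). Invented figures.
PAYMENTS = [
    ("Spain", 120000.0),
    ("France", 45000.0),
    ("Spain", 30000.0),
    ("Denmark", 15000.0),
]


def totals_by_country(payments):
    """Sum payment amounts per member state."""
    totals = defaultdict(float)
    for country, amount in payments:
        totals[country] += amount
    return dict(totals)


if __name__ == "__main__":
    # Print member states from largest to smallest total.
    ranked = sorted(totals_by_country(PAYMENTS).items(),
                    key=lambda item: item[1], reverse=True)
    for country, total in ranked:
        print(f"{country}: {total:,.0f}")
```

The same pattern works for any other breakdown the data supports, such as totals per recipient or per year, provided the dataset is disaggregated enough to contain those fields.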