Wearing your examples

Tryggvi Björgvinsson - October 7, 2013 in Spending Stories

Spending t-shirts at Googleplex

OpenSpending and Journalism++ are collaborating on a project we call How much is it really? The goal of the project is to create a webapp where users can type in an amount and see rough equivalents.

The idea is that a reader of the news might be reading about how much it cost to build an airport. It might say it cost millions of dollars. The problem is that most news readers don’t have millions of dollars, nor will they ever have. They can’t even begin to understand how much it is (but they do know that it’s probably a hefty sum). This news reader could then quickly go and type in the cost mentioned in the news article and see that it is half of what another airport cost or double what the government spent on education.

We hope that this webapp will help readers of the news, journalists or just about anybody better understand the numbers or start questioning where money is going. Why does it cost so much to build airports? Why not settle for something less expensive and improve education instead?

The airports and education example is fictional. I don’t know how much it costs to build airports and I don’t have in my head the amounts governments spend on education. But an example like this is the best way to explain the idea behind the webapp.
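In the same fictional spirit, here is a minimal sketch of the idea in Python. It is not the webapp’s actual code: the reference amounts and the function name are invented placeholders, chosen only so that the input comes out as half an airport and double an education budget.

```python
# Reference amounts are invented placeholders, not real spending figures.
REFERENCE_AMOUNTS = {
    "a hypothetical airport": 800_000_000,
    "a hypothetical yearly education budget": 200_000_000,
    "a witty t-shirt": 20,
}


def rough_equivalents(amount):
    """Describe `amount` as a multiple of each reference amount."""
    lines = []
    for label, reference in REFERENCE_AMOUNTS.items():
        ratio = amount / reference
        lines.append(f"{ratio:,.2f} x {label}")
    return lines


# The cost a reader might type in after reading a news article.
for line in rough_equivalents(400_000_000):
    print(line)
```

A real version would of course draw its reference amounts from actual spending data rather than a hardcoded dictionary.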

Now OpenSpending and Journalism++ were invited to a meeting in Palo Alto by the funder of the project, The Knight Foundation (the webapp is a subproject of a bigger project called Spending Stories). At the meeting we wanted to quickly give people an idea of what the webapp was all about.

Instead of giving people a fictional example of airports and education, we decided to use real examples and link them to an amount people understand. And if there is an amount hackers and activists understand, it would be the price of t-shirts with witty things printed on them. So we decided to print t-shirts with real examples on them.

But what to put on those t-shirts? There were three of us going to Palo Alto and each of us from a different country: Denmark (Anders), France (Anne-Lise), and Iceland (me). That’s a great opportunity to give examples from all over the world (OK, not all over the world since we’re all from Europe). We just needed to look at data we had for each of those countries in OpenSpending. Then we had to write those examples down and create some pretty images (or just use the ones we have in OpenSpendingJS since they’re awesome).

Danish version

For Denmark we could look at all of the different municipalities and their spending, made even easier by the “Kend dine kommune” project by the Politiken newspaper. After looking around, we found that the tax revenue paid every year by an average inhabitant amounted to about 365 t-shirts. First example down! The resulting images and text:

Danish t-shirt version

Now here’s one thing about that t-shirt. The image of the Little Mermaid (a monument for Copenhagen) is taken from flaticon and appears to be licensed under Creative Commons Attribution and the designer is said to be Freepik.

The problem with that is that “Freepik is a search engine that helps graphic and web designers to locate high quality photos, vectors, illustrations and PSD files for their creative projects.” It’s not a designer. So I didn’t want to attribute this incorrectly on a printed t-shirt (I probably should have just gone with a silhouette of Denmark but it’s too late now). Anyways. See this as an attribution to Freepik but also me asking who the real designer is since I want to attribute the designer and not the search engine (and reimburse that designer for not attributing on the t-shirt).

French version

France was slightly more difficult than Denmark. There was no good data in OpenSpending (look at this as a call to action if you’re from France!). However, Anne-Lise pointed out a recent scandal surrounding a charity website by Carla Bruni-Sarkozy. The cost of this charity website was about €410,000, which is way more than you’d need to create a WordPress site. What’s worse is that she paid for this site while she was France’s first lady… and she paid for it with public money!

Nicolas Bousquet (a web designer quoted in the news article) said anyone could have built this site for less than €10,000. With those amounts in hand we could mix them up with the estimated number of homeless children in France:

French t-shirt version

Icelandic version

The Icelandic version was simpler since OpenSpending had data on how much was spent by ministries in Iceland in 2012. These were probably the simplest calculations, but the most interesting amount is probably how much the government spends on the welfare system, since that’s where it spends the most money (Nordic welfare system ftw!) even though our national hospital is almost bankrupt. Anyways, looking at those numbers and how much Icelanders spent on average every second on the welfare system, this was the result:

Icelandic t-shirt version

Final t-shirts

So how did the t-shirts come out? Well, me being the perfectionist I am, not as I had hoped, but they were still awesome. We were going to have colour versions with better prints, but when the t-shirt printer called to let me know they wouldn’t be able to print this because of details in the pictures, I had to improvise a little. Since the graphics were all based on a specific price, I had to go with normal prints on white t-shirts of better quality than the coloured t-shirts (got those for the same price). Nonetheless, the resulting t-shirts are still awesome:

Printed t-shirts

Bonus

It was really fun finding these examples and there were a lot of different ideas that came up. But since I mentioned airports in the initial fictional example I thought I’d throw in a t-shirt version that’s connected to airports. It’s an alternative French version (the looks aren’t the same since this was scrapped before fixing fonts and adding logos):

Alternative French version

Sevilla Presus: Data-driven journalism at municipal level

J. Félix Ontañón - October 1, 2013 in Data Journalism, Spending Stories

A technology as disruptive as the Internet is forcing us to rethink the role and methods of many professions. The old models haven’t died yet, but in the new ones we can find a common pattern for success: empowering people through a community for cooperation. Journalism is no exception. Through The Guardian Data Blog, for example, many citizens helped to transcribe and find stories in the MPs’ expenses data: people give their eyes, The Guardian gives the platform.

Sevilla Actualidad and Sevilla Report, both local newspapers in Sevilla (Spain), are using two OKFN tools to empower their fellow citizens in the spirit of The Guardian Data Blog. With the help of OpenKratio, a group of citizens fostering an Open Government Data culture in Spain, we’ve launched #SevillaPresus13, a Crowdcrafting app to crowdsource the transcription of municipal budgets.

#SevillaPresus13

We aim to complete a set of visualizations on OpenSpending for the 2011-2013 municipal budget series (2012 is already done), so that they are available for everyone to embed in their own web posts. The plan is to link the budgetary information with municipal public procurement. This way both local newspapers will have a powerful tool to monitor municipal activity and find interesting stories to tell. This project has been accepted into a data-journalism contest in Madrid (Spain).

Crowdcrafting is an amazing platform for building crowdsourcing apps that transcribe documents and images into machine-readable data. As it provides some out-of-the-box PDF transcription apps, all you need to do is download, customize and deploy one for your own purposes. In the case of PDF files, tools such as Tabula are improving the way non-techie people can unlock the information, but only Crowdcrafting is able to develop an engaging crowd-experience for users.

D3.js Sankey diagrams with the OpenSpending API

J. Félix Ontañón - August 28, 2013 in Spending Stories

This post is cross-posted from the PBS Idea Lab Blog.

OpenSpending has a built-in set of visualizations – bubble charts, treemaps, and tables – which are useful for exploring how data is structured in levels. None of them, however, are really suitable for representing spending flows.

Fortunately, users of the D3.js data visualization library have given us many examples of visualizations suitable for that purpose. The purpose of this tutorial is to show how easily D3.js can be used to visualize spending flows with OpenSpending data.

Introducing D3.js and Sankey diagrams

D3.js is a JavaScript library that creates data-driven documents (hence D3). Data visualizations are constructed with D3 by specifying a meaningful relationship between data and graphical elements. No manual fiddling with lines and boxes is required.

D3.js has a huge and active community of users, and they have built a set of example visualizations. Some of these are incredibly useful for catching the eye with money flows: Sankey diagrams, chord diagrams (or circular networks), and map networks.

[Example images: Energy and consumption (Sankey diagram); Uber Rides by Neighborhood (chord diagram); Flows of refugees (map network)]

All of these examples are fully reusable: all you need to do to use them is to replace their underlying data with your own.

In the following example, we will focus on Sankey diagrams, as they can represent more than two levels of flow. Sankey diagrams:

are typically used to visualize energy or material or cost transfers between processes. [...] They are helpful in locating dominant contributions to an overall flow. (Sankey diagram article on Wikipedia)

The Aggregate API

To get spending data into D3.js, we can use the OpenSpending API, which gives us spending data in a form that can easily be translated into something D3.js understands.

The key API for producing spending data visualizations is the aggregate API, which groups together entries in the dataset, sums up their values, and returns the result as a JSON object.

An aggregate API call looks like this, where <dataset> is the ID of an OpenSpending dataset:

GET /api/2/aggregate?dataset=<dataset>

If no other parameters are included, all entries in the dataset are put in a single group, and the values of every entry are summed together.

Things get more interesting when we add a drilldown parameter. This specifies a dimension of the data which will be used to split the set of entries. Each possible value of the specified dimension becomes a group of entries with its own subtotal.

Let’s drill down on the programa dimension of the ugr-spending dataset, for example, and look at the shape of the output:

GET /api/2/aggregate?dataset=ugr-spending&drilldown=programa

{
  "drilldown": [
    {
      "amount": 283175993.0,
      "num_entries": 54,
      "programa": {
        "taxonomy": "programa",
        "html_url": "http://openspending.org/ugr-spending/programa/422d",
        "id": 1,
        "name": "422d",
        "label": "Ense\u00f1anzas Universitarias"
      }
    },
    {
      "amount": 64294001.0,
      "num_entries": 52,
      "programa": {
        "taxonomy": "programa",
        "html_url": "http://openspending.org/ugr-spending/programa/321b",
        "id": 2,
        "name": "321b",
        "label": "Estructura y Gesti\u00f3n Universitaria"
      }
    },
    {
      "amount": 47967613.0,
      "num_entries": 27,
      "programa": {
        "taxonomy": "programa",
        "html_url": "http://openspending.org/ugr-spending/programa/541a",
        "id": 3,
        "name": "541a",
        "label": "Investigaci\u00f3n Cient\u00edfica"
      }
    }
  ],
  "summary": {
    "num_drilldowns": 3,
    "pagesize": 10000,
    "cached": true,
    "amount": 395437607.0,
    "pages": 1,
    "currency": {
      "amount": "EUR"
    },
    "num_entries": 133,
    "cache_key": "a3b56dc06b8a869ffa49b0ff063562798b073a3a",
    "page": 1
  }
}

The aggregate API returns an object with two fields, drilldown and summary. The latter contains information about the dataset, and the former is a list with one item per distinct value of the drilled-down dimension. Each item carries that value together with the sum of the spending values of all dataset entries that have it, stored as its "amount".
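To make that shape concrete, here is a small Python sketch that walks a trimmed copy of the response above and collects one subtotal per label. The function name subtotals is just an illustrative choice, and only the fields used here are kept.

```python
# Trimmed copy of the response shown above (only the fields used here).
RESPONSE = {
    "drilldown": [
        {"amount": 283175993.0,
         "programa": {"name": "422d", "label": "Enseñanzas Universitarias"}},
        {"amount": 64294001.0,
         "programa": {"name": "321b", "label": "Estructura y Gestión Universitaria"}},
        {"amount": 47967613.0,
         "programa": {"name": "541a", "label": "Investigación Científica"}},
    ],
    "summary": {"amount": 395437607.0, "num_drilldowns": 3},
}


def subtotals(response, dimension):
    """Map the label of each drilled-down value to its summed amount."""
    return {
        entry[dimension]["label"]: entry["amount"]
        for entry in response["drilldown"]
    }


totals = subtotals(RESPONSE, "programa")
for label, amount in totals.items():
    print(f"{label}: {amount:,.0f} EUR")

# The subtotals add up to the overall amount reported in the summary.
assert sum(totals.values()) == RESPONSE["summary"]["amount"]
```

The final check is a handy sanity test: if the subtotals do not add up to the summary amount, some entries were probably lost along the way (for example through paging).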

We can also split the dataset by combinations of dimensions. This API call gives us a subtotal for each combination of programa and to:

GET /api/2/aggregate?dataset=ugr-spending&drilldown=programa|to
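As a rough sketch, such URLs can be assembled in Python with the standard library. The helper name aggregate_url is made up for this example, and the base URL is the openspending.org address used for the calls in this tutorial.

```python
from urllib.parse import urlencode

# Base address of the aggregate API, as used elsewhere in this tutorial.
BASE_URL = "http://openspending.org/api/2/aggregate"


def aggregate_url(dataset, *drilldowns):
    """Build an aggregate API URL, joining several drilldown dimensions with '|'."""
    params = {"dataset": dataset}
    if drilldowns:
        params["drilldown"] = "|".join(drilldowns)
    # safe="|" keeps the separator readable instead of encoding it as %7C.
    return f"{BASE_URL}?{urlencode(params, safe='|')}"


print(aggregate_url("ugr-spending", "programa", "to"))
```

Calling aggregate_url with only a dataset ID gives the plain, ungrouped total for that dataset.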

Using the aggregate API to construct D3.js visualizations means writing code to traverse the JSON objects returned by the API and to translate their contents into the form D3.js expects.

Building a Sankey diagram

Time for the full exercise! We will build a D3.js Sankey diagram from the OpenSpending API, in the following way:

  • Materials: 2013 income and spending budgets for the University of Granada (UGR) in Spain. These datasets are titled [ugr-income](http://openspending.org/ugr-income) and [ugr-spending](http://openspending.org/) on OpenSpending.
  • Methods: An R script that gets data from the OpenSpending API and transforms it into the JSON input file format of the D3.js Sankey diagram.
  • Results: A presentation page embedding the Sankey diagram, OpenSpending treemaps, and raw data.

The first step is to determine what we want to show in the Sankey diagram. Which relations should be displayed? How many levels of flow are appropriate for a suitable reading of the data? What’s the story that you want to tell?

Relying on the UGR income and spending budgets, we can imagine money flowing from the sources of income to the University and then the University spending this money. Given the budgetary structure, we settled on a three-level Sankey diagram:

  • Level 1: Income budget broken down by “articulo” (economic classification), flowing into “Universidad de Granada”.
  • Level 2: “Universidad de Granada” flowing out to the spending budget broken down into “programas de gasto” (functional classification).
  • Level 3: “Programas de gasto” broken down into “capítulos de gasto” (economic classification).

Notice that since the total amounts of the income and spending budgets are equal, both sides of the Sankey diagram have the same size.

The second step is being able to get the data. As we explained above, OpenSpending has an API that allows us to retrieve data aggregated by measures and drilled down by dimensions.

Getting the JSON data for the three levels of our Sankey diagram is as easy as follows:

GET http://openspending.org/api/2/aggregate?dataset=ugr-income&drilldown=articulo
GET http://openspending.org/api/2/aggregate?dataset=ugr-spending&drilldown=programa
GET http://openspending.org/api/2/aggregate?dataset=ugr-spending&drilldown=programa|to

This is a partial return for the second call. Notice that the data needed for the Sankey diagram are “labels”, “amounts”, and links between nodes.

{ 
  "drilldown": [ 
    { 
      "amount": 283175993.0, 
      "num_entries": 54, 
      "programa": { 
        "taxonomy": "programa", 
        "html_url": "http://openspending.org/ugr-spending/programa/422d", 
        "id": 1, 
        "name": "422d", 
        "label": "Enseñanzas Universitarias" 
      } 
    }, 
  /* Two more drilldown entries here. */   
  ], 
  "summary": { 
    "num_drilldowns": 3, 
    "pagesize": 10000, 
    "cached": true, 
    "amount": 395437607.0, 
    "pages": 1, 
    "currency": { 
      "amount": "EUR" 
    }, 
    "num_entries": 133, 
    "cache_key": "a3b56dc06b8a869ffa49b0ff063562798b073a3a", 
    "page": 1 
  } 
}

The third step is to produce the JSON input file for the D3.js Sankey diagram. It has two components: nodes and links. Nodes are represented as an array of labels and are joined by links (i.e. arrows with variable width). The links component is an array whose items have three members: source node index, target node index, and value (in this example, an amount of money). The indexes in the links component refer to the position of each node in the nodes component. Check the final JSON input file for this UGR example for further details.
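As an illustration of that structure, the following Python sketch builds the two components from a list of (source label, target label, value) triples. It is not the R script used for this post. The "name" key for nodes follows the official D3.js Sankey example as far as I recall and should be treated as an assumption, and build_sankey is a made-up helper name. The amounts are the three programa subtotals shown earlier.

```python
import json


def build_sankey(flows):
    """Turn (source_label, target_label, value) triples into nodes and links."""
    index = {}  # label -> position of that node in the nodes array
    nodes, links = [], []

    def node_index(label):
        # Register a label the first time it is seen, then reuse its index.
        if label not in index:
            index[label] = len(nodes)
            nodes.append({"name": label})  # "name" key is an assumption
        return index[label]

    for source, target, value in flows:
        links.append({
            "source": node_index(source),
            "target": node_index(target),
            "value": value,
        })
    return {"nodes": nodes, "links": links}


# Level 2 flows, using the programa subtotals from the API response above.
FLOWS = [
    ("Universidad de Granada", "Enseñanzas Universitarias", 283175993.0),
    ("Universidad de Granada", "Estructura y Gestión Universitaria", 64294001.0),
    ("Universidad de Granada", "Investigación Científica", 47967613.0),
]

sankey = build_sankey(FLOWS)
print(json.dumps(sankey, indent=2, ensure_ascii=False))
```

Because every label is registered only once, a node that appears as the target of one link and the source of another (like the University itself in the full diagram) keeps a single index, which is what ties the levels together.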

So the data for Level 1 has income “articulo” labels as source, a hardcoded “Universidad de Granada” label for target, and amounts as value. Level 2 starts with a “Universidad de Granada” hardcoded label as source, spending “programa” labels as target, and amounts as value. For Level 3, we have spending “programa” labels as source, spending “chapter” labels as target, and amounts as value. The provided R script automates the process of retrieving the data and transforming it into a Sankey diagram JSON input file. The code’s comments clarify how it works.
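The level-by-level mapping can also be sketched in a few lines of Python. This is not the provided R script, only a hedged illustration: level_flows is a hypothetical helper, the Level 1 and Level 3 entries are invented placeholders, and the assumption that a two-dimension drilldown returns both dimension keys in each entry has not been checked against the live API. Only the Level 2 entries are copied from the response shown earlier.

```python
HUB = "Universidad de Granada"  # hardcoded middle node, as described above


def level_flows(response, source_dim=None, target_dim=None):
    """Return (source, target, value) triples for one level of the diagram.

    A dimension left as None is replaced by the hardcoded HUB label.
    """
    flows = []
    for entry in response["drilldown"]:
        source = entry[source_dim]["label"] if source_dim else HUB
        target = entry[target_dim]["label"] if target_dim else HUB
        flows.append((source, target, entry["amount"]))
    return flows


# Level 2 entries copied from the response shown earlier (fields trimmed).
spending = {"drilldown": [
    {"amount": 283175993.0, "programa": {"label": "Enseñanzas Universitarias"}},
    {"amount": 64294001.0, "programa": {"label": "Estructura y Gestión Universitaria"}},
    {"amount": 47967613.0, "programa": {"label": "Investigación Científica"}},
]}

# Invented placeholder entries, only to show the call pattern for Levels 1 and 3.
income = {"drilldown": [
    {"amount": 395437607.0, "articulo": {"label": "Placeholder income item"}},
]}
detail = {"drilldown": [
    {"amount": 283175993.0,
     "programa": {"label": "Enseñanzas Universitarias"},
     "to": {"label": "Placeholder chapter"}},
]}

flows = (
    level_flows(income, source_dim="articulo")                      # Level 1
    + level_flows(spending, target_dim="programa")                  # Level 2
    + level_flows(detail, source_dim="programa", target_dim="to")   # Level 3
)
for source, target, value in flows:
    print(f"{source} -> {target}: {value:,.0f}")
```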

The fourth and final step is to create a web page to show the Sankey diagram. Fortunately, with a well formatted JSON input file, the official D3.js Sankey diagram example is fully reusable. We simply replace the JSON file with our own and enjoy the results. Some CSS and JavaScript variables can be tuned for controlling the colour palette or the width of the diagram—just check out the D3.js documentation.

Conclusion

We’ve shown how easy it is to take advantage of the aggregation methods of OpenSpending’s API to extend OpenSpending’s default set of visualizations. D3.js is a powerful toolkit that gives us a better comprehension of budgetary data. An out-of-the-box D3.js visualization using OpenSpending as a data warehouse would provide a nifty boost to the OpenSpending project. In the meantime, take a look at Michael Bauer’s openspending-sankey, which makes it rather easy to create D3.js Sankey diagrams for virtually every OpenSpending dataset.

How Spending Stories Fact Checks Big Brother, the Wiretappers’ Ball

Lucy Chambers - February 24, 2012 in Data Journalism, Spending Stories

This piece was co-written with Eric King of Privacy International and comes as Privacy International launches a huge new data release about companies selling surveillance technologies. It is cross-posted on the MediaShift PBS IDEA LAB.

Today, the global surveillance industry is estimated at around $5 billion a year. But which companies are selling? Which governments are buying? And why should we care?

We show how the OpenSpending platform can be used to speed up fact checking, showing which of these companies have government contracts, and, most interestingly, with which departments…

The Background

Big Brother is now indisputably big business, yet until recently the international trade in surveillance technologies remained largely under the radar of regulators and civil society. Buyers and suppliers meet, mingle and transact at secretive trade conferences around the world, and the details of their dealings are often shielded from public scrutiny by the ubiquitous defence of ‘national security’. Perhaps unsurprisingly, this environment has bred a widespread disregard for ethics and a culture in which the single-minded pursuit of profit is commonplace.

For years, European and American companies have been quietly selling surveillance equipment and software to dictatorships across the Middle East and North Africa – products that have allowed these regimes to maintain a stranglehold over free expression, smother the flames of political dissent and target individuals for arrest, torture and execution.

They include devices that intercept mobile phone calls and text messages in real time on a mass scale, malware and spyware that gives the purchaser complete control over a target’s computer and trojans that allow the camera and microphone on a laptop or mobile phone to be remotely switched on and operated. These technologies are also being bought by Western law enforcement, including small police departments in which the ability of officers to understand the legal parameters, levels of accuracy and limits of acceptability is highly questionable.

The data that has just been released on the Privacy International website includes the following:

  1. An updated list of companies selling surveillance technology, and
  2. The names of all the government agencies attending an international surveillance trade show known as the Wiretappers’ Ball.

Some names are predictable enough: the FBI, the US Drug Enforcement Administration, the UK Serious Organized Crime Agency and Interpol, for example. The presence of others is deeply disturbing: the national security agencies of Bahrain and Yemen, the embassies of Belarus and the Democratic Republic of Congo and the Kenyan intelligence agency, to name but a few. A few are downright baffling, like the US Department of Commerce, the US Fish & Wildlife Service and the Clark County School District Police Department.

Now, with the aid of OpenSpending, anyone can cross reference which contracts these companies hold with governments around the world. The investigation continues…

Using OpenSpending to speed up fact-checking

Privacy International approached the Spending Stories team to ask for a search widget to be able to search across all of the government spending datasets for contracts held between governments and these companies (until this point, it had only been possible to search one database at a time).

The Spending Browser is now live at http://opendatalabs.org/spendbrowser. And, as the URLs correspond to the queries, individual searches can be passed on for further examination and, importantly, embedded in articles directly. Try it yourself against the list of companies listed in the Surveillance Section of the Privacy International Site (Just enter a company e.g. ‘Endace Accelerated’ into the search bar).

The Spending Browser will become increasingly powerful as more data is loaded into the system.

Want to help make this tool even more powerful? Get involved and help to build up the data bank.

Coverage

You can read more about the background to these stories on the Privacy International Site and recent coverage by the International Media: