Open Knowledge

Types of data

For the purposes of this manual, we have abstracted the meaning of the terms "budget data" and "expenditure data" to fit a broader context. While these terms may have different meanings from country to country, they should be read as defined in this section throughout the rest of the manual. In this section, we look briefly at the two types of data and at the questions that can be addressed using them.

Budget data is defined as data relating to the broad funding priorities set forth by a government, often highly aggregated or grouped by goals at a particular agency or ministry. For instance, a government may pass a budget which contains elements such as "Allocate $20 million in funding for clean energy grants" or "Allocate $5 billion for space exploration on Mars". These data are often produced by a parliament or legislature, on an annual or semi-annual basis.

Spending data is defined as data relating to the specific expenditure of funds from the government. This may take the form of a contract, a loan, a refundable tax credit, a pension fund payment, or a payment from another retirement assistance programme or a government medical insurance programme. In the context of our previous examples, examples of spending data might be a $5,000 grant to Johnson's Wind Farm for providing renewable wind energy, or a contract for $750,000 to Boeing to build Mars rover component parts. Spending data is often transactional in nature, specifying a recipient, amount, and funding agency or ministry. Sometimes, when the payments are to individuals or there are privacy concerns, the data are aggregated by geographic location or fiscal year.
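As a minimal sketch of what transactional spending data and its privacy-preserving aggregation might look like, the Python below models a single record and sums amounts by location and fiscal year. The field names, regions, pension payments and the fiscal year are illustrative assumptions, not taken from any real dataset; only the two named recipients and their amounts come from the examples above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """One spending record; all field names are illustrative assumptions."""
    recipient: str
    amount: float
    agency: str
    region: str
    fiscal_year: int


def aggregate_by_region_and_year(transactions):
    """Sum amounts per (region, fiscal_year), dropping recipient names."""
    totals = defaultdict(float)
    for t in transactions:
        totals[(t.region, t.fiscal_year)] += t.amount
    return dict(totals)


# Hypothetical records: regions, year and pension payments are made up.
transactions = [
    Transaction("Johnson's Wind Farm", 5_000, "Energy", "North", 2013),
    Transaction("Boeing", 750_000, "Space", "West", 2013),
    Transaction("Individual pensioner A", 1_200, "Pensions", "North", 2013),
    Transaction("Individual pensioner B", 1_350, "Pensions", "North", 2013),
]

regional_totals = aggregate_by_region_and_year(transactions)
```

Publishing only `regional_totals` would keep the overall amounts visible while withholding the names of individual recipients.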

The fiscal data of some governments may blur the lines of these definitions, but the aim is to separate the political documents from the raw output of government activity. It will always be an ultimate goal to link these two datasets, and to allow the public to see if the funding priorities set by one part of the government are being carried out by another part, but this is often impractical in larger governments since definitions of programs and goals can be "fuzzy" and vary from year to year.

Budget data

Using the definitions above, budget data is often composed of two main portions: revenue (taxation) data and planned expenditures. Revenue and spending are two sides of the same coin and thus deserve to be considered jointly when budget data is released by a government. Because revenue tends to be aggregated to protect the privacy of individual taxpayers, it makes more sense to view it alongside the budget data. It often appears aggregated by income bracket (for personal taxes) or by industrial classification (for corporate taxes), but does not appear at all in spending data. Budget data therefore ends up being the only source for determining trends and changes in revenue.

Somewhat non-intuitively, revenue data itself can include expenditures as well. When a particular entity or economic behaviour would normally be taxed but an exception is written into the law, this is often referred to as a tax expenditure. Tax expenditures are frequently reported separately from the budget, in different documents or at a different time. This usually stems from the fact that they are released by separate bodies, such as executive agencies or ministries that are responsible for taxation, instead of the legislature.

Budgets as datasets

A growing number of governments make their budget expenditure data available as machine-readable spreadsheets. This is the preferred method for many users, as it is accessible and requires few software skills to get started. Other countries release longer reports that discuss budget priorities as a narrative. Some countries fall somewhere in between: they release reports that contain tables, but publish them as PDF or in other formats from which the data is difficult to extract.

On the revenue side, the picture is considerably bleaker, as many governments are still entrenched in the mindset of releasing revenue estimates as large reports that are mostly narrative with little easily extractable data. Tax expenditure reports often suffer from these same problems.

Still, some areas that relate to government revenue are beginning to be much better documented and databases are beginning to be established. This includes budget support through development aid, for which data is published under the IATI and OECD DAC CRS schemes. Data about revenues from extractive industries is starting to be covered under the EITI, with the US and various other regions introducing new rules for mandatory and granular disclosure of extractives revenue. Data regarding loans and debt is fairly scattered, with the World Bank providing a positive example, while other major lenders (such as the IMF) only report highly aggregated figures. An overview of related data sources can be found at the Public Debt Management Network.

Connecting revenues and spending

It is highly desirable to be able to trace the flow of money from revenues to spending. For the most part, taxes go into a general fund and expenditures come out of that same fund, which makes this comparison moot. In many countries, however, there are taxes on certain behaviours that are used to fund specific items.

For example, a car registration fee might be used to fund the construction of roads and highways. This would be an example of a user fee, where the main users of the government service are funding it directly. Or you might have a tax on cigarettes and alcohol that funds healthcare grants. In this case, the tax is being used to offset the added healthcare expense of individuals taking part in at-risk activities. Allowing citizens to view which activities are taxed in order to pay for other expenditures makes it possible to see when a particular activity is being cross-subsidized or heavily funded by non-beneficiaries. It can also allow them to see when funds are being diverted or misused. This may not always be practical at the country level, as federal governments tend to make much larger use of the general fund than local governments do. Typically, local governments are more comprehensive with regard to releasing budget data by fund. Having granular, fund-level data is what makes this kind of comparison and oversight possible.
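The kind of fund-level comparison described above can be sketched in a few lines of Python. The fund names, record layout and amounts below are hypothetical, chosen only to mirror the road and healthcare examples; real fund-level data will be structured differently.

```python
from collections import defaultdict

# Hypothetical fund-level records: (fund, kind, description, amount).
records = [
    ("road_fund", "revenue", "car registration fees", 900_000),
    ("road_fund", "spending", "highway construction", 1_200_000),
    ("health_fund", "revenue", "cigarette and alcohol tax", 500_000),
    ("health_fund", "spending", "healthcare grants", 450_000),
]


def fund_balances(records):
    """Return revenue minus spending for each fund."""
    totals = defaultdict(lambda: {"revenue": 0, "spending": 0})
    for fund, kind, _description, amount in records:
        totals[fund][kind] += amount
    return {fund: t["revenue"] - t["spending"] for fund, t in totals.items()}


balances = fund_balances(records)
```

In this made-up data, a negative balance for `road_fund` would suggest that road building is partly paid for from other sources, while a positive balance for `health_fund` would prompt the question of where the surplus goes.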

What questions can be answered using budget data?

Budget expenditure data has an array of different applications, but its prime role is to communicate broad trends and priorities in government spending to its users. While it can help to have a prose accompaniment, the data itself allows a more clear-cut interpretation of proposed government spending than political rhetoric does. Additionally, it is much easier to communicate budget priorities by economic sector or category than it is at the spending data level. These data also help citizens and CSOs track government spending year over year, provided that the classification of the budget expenditure data stays relatively consistent.
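Year-over-year tracking of this kind reduces to a simple percentage change per category, as in the sketch below. The category name echoes the clean energy example from earlier in this manual, but the years and the earlier allocation are invented for illustration, and the calculation only makes sense if the category means the same thing in both years.

```python
def year_over_year_change(budget, category, year):
    """Percentage change in a category's allocation versus the prior year."""
    previous = budget[(category, year - 1)]
    current = budget[(category, year)]
    return (current - previous) / previous * 100


# Hypothetical allocations keyed by (category, fiscal year).
budget = {
    ("clean energy", 2012): 16_000_000,
    ("clean energy", 2013): 20_000_000,
}

change = year_over_year_change(budget, "clean energy", 2013)
```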

Spending data

For most purposes, spending data can be interpreted as transactional or near-transactional data. Rather than communicating the broad spending priorities of the government, as budget data should, spending data is intended to convey specific recipients, geographic locations of spending, more detailed categorization, or even spending by account number.

Spending data is often created at the executive level, as opposed to legislative, and should be more frequently reported than budget data. It can include many different types of expenditures, such as contracts, grants, loan payments, direct payments for income assistance and maintenance, pension payments, employee salaries and benefits, intergovernmental transfers, insurance payments, and more.

Some types of spending data - such as contracts and grants - can be connected to related procurement information (such as the tender documents and contracts) to add more context regarding the individual payments and to get a clearer picture of the goods and services covered under these transactions.

Opening the checkbook

In the past five years, there has been a spate of countries and local governments opening up spending data, often referred to as "checkbook level" data. These countries include, but are not limited to, the US (including various state governments), UK, Brazil, India (including some state governments) and many funds of the European Union.

Disclosure thresholds

At least two of these countries have imposed seemingly arbitrary thresholds on the size of transactions that are included. For example, the US and the UK exclude transactions under $25,000 and 25,000 GBP, respectively. Are these thresholds appropriate? That can't be known for sure without more information about how these numbers were arrived at. In principle, whether thresholds or exceptions are justified depends on the underlying systems that drive disclosure of this data. Are these systems linked directly with the accounting systems already used in the government, easing the burden of disclosure? If so, the threshold for excluding transactions should be very low (setting aside for a moment the cases that require redaction for privacy purposes).

If the systems are mostly divorced, as is the case with the US, then this raises the question: why? The more steps and processes there are between the internal government accounting systems and the public accounting systems, the higher the chance of errors and omissions in the data. Having separate systems also undermines the primary goal of public oversight. However, governments often struggle with IT resources and contracting, which creates a tension between releasing any spending data at all and a release that is consistent with the above principles. If a threshold is necessary, then the amount should be consistent in size and scope with the overall expenditure level of that particular government. It is not appropriate, for example, that the threshold for spending reporting in the US State of Maryland is also $25,000, when its annual budget is only a fraction of the federal government's budget.
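One possible reading of "consistent in size and scope" is to scale the threshold in proportion to the size of the budget. The sketch below does exactly that. Proportional scaling is an assumption made here for illustration, not a rule stated in this manual, and the two budget totals are round placeholder numbers rather than actual figures; only the $25,000 reference threshold comes from the text.

```python
def proportional_threshold(reference_threshold, reference_budget, budget):
    """Scale a disclosure threshold by the ratio of two annual budgets."""
    if reference_budget <= 0:
        raise ValueError("reference_budget must be positive")
    return reference_threshold * budget / reference_budget


# Illustrative round numbers, not actual budget figures.
FEDERAL_BUDGET = 3_500_000_000_000
STATE_BUDGET = 35_000_000_000

state_threshold = proportional_threshold(25_000, FEDERAL_BUDGET, STATE_BUDGET)
```

With these placeholder budgets the state is one hundredth the size of the federal government, so a proportional threshold would be one hundredth of $25,000. Other scaling rules are equally defensible; the point is only that the threshold should not be copied unchanged.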

Release early, release often

Spending data should be released in a relatively timely fashion, at least on a monthly or quarterly basis. The timeliness of this data is what allows users to see whether the spending priorities in the budget data are being reflected in the spending data. It also allows the public and government stakeholders to view the current year's spending at a more detailed level while the next year's budget is being decided.

A good example of such a release of spending information is the Indian experience, especially the Employment Guarantee Programme, one of the major national flagship programmes, which provides demand-based employment to the rural working-age population in India. Its Management Information System (MIS) has become the most effective way of getting information on spending on a monthly basis. The data is updated monthly in an accessible spreadsheet format (Excel) at the sub-national government level. This makes the data transparent and available in the public domain, to be accessed equally by all. The village-level household database has internal checks for ensuring consistency and conformity to normative processes. It includes separate pages for approximately 250,000 local governments at the village level, 6,465 Blocks, 619 Districts and 34 States & Union Territories. The portal places complete transaction-level data in the public domain.

However, maintaining a functional MIS in every state and releasing a continuous flow of data have been contentious issues. The major concerns stem from a lack of technical capacity as well as from cost. A cumbersome back-end system for supplying the data requires the installation of specific software with prerequisite configurations, and technical operators with specific skills. These requirements have raised costs and put great demands on technology to ensure a continuous flow of data on the programme, particularly in the most interior parts of the country, and hence affect the timely release of data.

What questions can be answered using spending data?

Spending data can be used in several different areas: oversight and accountability, strategic resource deployment by local governments and charities, and economic research. However, it is first and foremost a right of citizens to view detailed information about how their tax dollars are spent. Tracking who gets the money and how it is used is how citizens can detect preferential treatment of certain recipients that may be illegal, or whether certain political districts might be getting more than their fair share.

It can also help local governments and charities respond to areas of social need without duplicating federal spending that is already occurring in a certain district or going to a particular organization. Lastly, businesses can see where the government is making infrastructure improvements and investments and use that information when selecting future business locations. These are only a few examples of the potential uses of spending data. It's no coincidence that it has ended up in a variety of commercial and non-commercial software products: it has real economic value as well as an intangible value as a societal good and anti-corruption measure.