Common arguments against publishing data

Almost everyone in the community can tell stories about struggling with government officials to obtain transactional spending data in machine-readable format. Often publishers simply do not know that civil society wants data in a particular format, but there are also deliberate obstructions. In this FAQ we list the most typical excuses for refusing to release data in computer-friendly formats.

… in machine-readable format

“PDFs are on my computer - therefore they are machine-readable”

FALSE: The fact that they are on your computer means they are electronic copies, not that they are machine-readable. PDFs are essentially a set of instructions telling a printer how to print a page. They look appealing to the human eye, but to a computer they are little more than a picture.

PDFs go from bad to worse from the perspective of someone trying to do data work:

  • Better PDFs are machine-generated, typically an Excel spreadsheet or a structured Word document converted into a PDF (see example). Often you can copy and paste information from them, but there may be some formatting issues.
  • Worse PDFs are typically scanned documents. Often, to add to the misery, they will be copies of faxes that are smudged, speckled, tea-, water- or mould-stained, or crooked (sometimes all of the above).
  • Image files are not machine-readable for the same reasons.
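
To make the contrast concrete, here is a minimal Python sketch of what a machine-readable format offers the user. The sample rows, the column names and the `load_transactions` helper are hypothetical, invented purely for illustration. The point is only that data published as CSV can be parsed and totalled in a few lines, whereas the same table locked in a scanned PDF would first have to be retyped or run through error-prone text recognition.

```python
import csv
import io
from decimal import Decimal

# Hypothetical sample of transactional spending data in CSV form.
SAMPLE_CSV = """date,supplier,amount
2012-01-05,Acme Ltd,1200.50
2012-01-09,Bolt Services,300.00
2012-01-17,Acme Ltd,799.50
"""


def load_transactions(csv_text):
    """Parse CSV text into a list of dicts with Decimal amounts."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["amount"] = Decimal(row["amount"])
        rows.append(row)
    return rows


transactions = load_transactions(SAMPLE_CSV)
total = sum(row["amount"] for row in transactions)
print(len(transactions), "transactions, total", total)
```

Amounts are parsed as `Decimal` rather than `float` so that currency values are not subject to binary rounding.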

“If we publish in machine-readable, open formats - someone will alter the data and use it to discredit us.”

Again, FALSE. Anyone who wants to use the data badly enough will use it even if they have to extract it from documents by hand, and manual extraction is itself a source of mistakes. Publishing the data in machine-readable format simply allows the user to start working with the data straight away.

Our advice would be the following:

  • Publish both machine-readable and non-machine-readable formats. We insist on the former for analysis, but the latter can also be useful, e.g. for cross-referencing numbers and as an easily readable form in which to share reports.
  • Encourage users of the data to show their working. A good data project will usually:
    • Link back to the original source data
    • Link to any modified data with an explanation of how it was changed, with the calculations and any underlying working clearly visible. A clear audit trail lets others replicate your work and check transparently that everything was done without errors. In journalism this is sometimes known as the “nerd box”.
    • Offer the data source the chance to comment on calculations from the data in order to clear up misunderstandings.
    • This allows anyone to check the accuracy of the working and verify the results.
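
The audit trail described above can be sketched in code. This is only an illustration under stated assumptions: the source URL, the figures and the helper name `apply_step` are hypothetical, and a real project might equally keep its working in a spreadsheet or a notebook. The idea is that the original figures are never overwritten and every change is recorded with a plain-language explanation.

```python
# Hypothetical original figures, kept untouched alongside a link to the source.
SOURCE_URL = "https://example.org/spending-2012.csv"  # placeholder, not a real dataset
original = [
    {"supplier": "Acme Ltd", "amount_thousands": 12},
    {"supplier": "Bolt Services", "amount_thousands": 3},
]

audit_trail = []  # one plain-language entry per documented change


def apply_step(rows, description, func):
    """Apply func to a copy of each row and record what was done."""
    audit_trail.append(description)
    return [func(dict(row)) for row in rows]


def to_units(row):
    # Convert an amount expressed in thousands into single currency units.
    row["amount"] = row.pop("amount_thousands") * 1000
    return row


modified = apply_step(
    original,
    "Converted amounts from thousands to single units (multiplied by 1000)",
    to_units,
)

print("Source:", SOURCE_URL)
for number, description in enumerate(audit_trail, start=1):
    print(f"Step {number}: {description}")
```

Because each step works on copies, `original` still holds the figures as published, so a reader can compare them against `modified` and against the source.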

## ... in sufficient levels of detail

### “We cannot release spending data as it contains personal information”

FALSE. Public authorities holding spending data that includes personal information should not shirk their responsibility to publish it. Instead, authorities should examine the data properly and redact personal data accordingly (workflows can be developed so that this effort is minimal). We see a real risk of local and national governments holding back spending data with this excuse, and have therefore co-written a guide for public authorities on how to deal with personal information in spending data (see the privacy guide). Access to data from the EU farm subsidy programme is a clear example: privacy (in this case, that of farmers) was the argument used to decide a case at the European Court of Justice, which significantly reduced access to data on farm subsidy payments.

### “We cannot release spending data due to third-party confidentiality concerns”

Public authorities should publish information about transactions between themselves, contractors and commercial vendors. It is not uncommon, however, for either public officials or commercial contractors to attempt to block releases on the grounds of the commercial confidentiality of the supplier (the third party). The argument is most commonly made when requests are for actual contracts, but even contracts are often released in full without redactions.

### “We cannot release granular data. You can get aggregated expenditures”

NOT USEFUL. Access to line-by-line transactional spending data is essential for accountability: investigating suppliers and procurement practices requires detailed transaction-level data. A few countries currently release such data, with the UK, US, Brazil and Slovenia among the leaders in this field. Even in those countries there is still work to do.
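
To illustrate what is lost in aggregation, the following Python sketch uses made-up transactions (all department names, supplier names and amounts are hypothetical). The aggregate figure is a single number. Only the line-by-line records show how that spending is distributed across suppliers.

```python
from collections import defaultdict

# Hypothetical transaction-level records: (department, supplier, amount).
transactions = [
    ("Health", "Acme Ltd", 40000),
    ("Health", "Acme Ltd", 35000),
    ("Health", "Bolt Services", 5000),
    ("Health", "Cedar Supplies", 20000),
]

# What an aggregated release would show: one figure for the department.
aggregate = sum(amount for _, _, amount in transactions)

# What a transaction-level release additionally allows: totals per supplier.
by_supplier = defaultdict(int)
for _, supplier, amount in transactions:
    by_supplier[supplier] += amount

top_supplier, top_amount = max(by_supplier.items(), key=lambda item: item[1])
share = top_amount / aggregate

print("Aggregate:", aggregate)
print(f"Largest supplier: {top_supplier} ({share:.0%} of spending)")
```

In this invented example the aggregate alone gives no hint that one supplier receives three quarters of the money, which is exactly the kind of question that transaction-level data lets people ask.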

We have also noticed that several countries have introduced fairly high disclosure thresholds for transactional data. Such practices remain a serious concern and should be challenged, as a large share of public spending can fall below these thresholds. Disclosure thresholds vary widely between countries:

* United States (federal level): USD 25,000
* United Kingdom (national): GBP 25,000
* United Kingdom (councils): GBP 500 (for spending data), GBP 50,000 (for contracts)
* Slovenia: no minimum disclosure threshold
* Greece: no minimum disclosure threshold

Without knowing more about why these levels were chosen, it is hard to judge whether they are reasonable.

**Next**: [How to publish spending data without disclosing personal information](./privacyguide/)

**Up**: [Appendix](../)