India Spend is one of India’s first data journalism initiatives. They started out with a tight remit to investigate spending practices in India from a journalistic standpoint, and have since branched out into other topics, such as the urbanisation of India, many of which have financial themes. Their reports are very well regarded, and other business newspapers pay a monthly fee to syndicate them.
India Spend mentioned a variety of issues in getting, working with, and presenting financial data in India. Here are a few of the most striking.
Problem number 1
“We have to start sourcing physical copies of the data, and the problem often is that paper copies are in local languages, which we don’t speak.”
To our knowledge, there is currently no simple way to machine-translate datasets automatically. The most effective machine translation option we know of is Google Translate, which has a fee-based API, and even this does not cover all of India’s languages.
Problem number 2
“The average government website in India isn’t even PDFs, it’s images.”
See the Tools Ecosystem Section for a few tools to help extract information from image-based documents.
Problem number 3
“The level of literacy for visualisations in India is not high.”
People struggle to interpret anything besides the simplest charts, so the India Spend team try to keep things simple. They have been experimenting with simple visualisation tools such as Tableau and GeoCommons, but there have been some complications. When trying to map locations in India, for example, they often found that the given longitude and latitude of a particular place were recorded incorrectly. This was much less of a problem when they mapped locations in other countries; it mainly affected data for India.
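One cheap way to catch badly recorded coordinates before mapping them is a bounding-box sanity check. The sketch below is only an illustration, not something India Spend described using: the function names are hypothetical, and the bounding box is a rough approximation of India's extent that also covers parts of neighbouring countries. Because India's latitude and longitude ranges do not overlap, a point that only fits the box after swapping the two values was very likely entered the wrong way round.

```python
# Rough bounding box for India, in decimal degrees.
# These limits are an approximation chosen for illustration only.
LAT_MIN, LAT_MAX = 6.0, 38.0
LON_MIN, LON_MAX = 68.0, 98.0


def in_india_bbox(lat, lon):
    """Return True if the point falls inside the approximate box."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


def check_coordinate(lat, lon):
    """Classify a recorded coordinate pair.

    "ok"            -> inside the box as recorded
    "swapped"       -> inside the box only if lat and lon are exchanged
    "out_of_bounds" -> outside the box either way
    """
    if in_india_bbox(lat, lon):
        return "ok"
    if in_india_bbox(lon, lat):
        return "swapped"
    return "out_of_bounds"


if __name__ == "__main__":
    # New Delhi is at roughly 28.61 N, 77.21 E.
    print(check_coordinate(28.61, 77.21))
    print(check_coordinate(77.21, 28.61))
```

A check like this only flags gross errors. A point can pass it and still sit in the wrong district, so suspicious records still need to be compared against a trusted gazetteer or checked by hand.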
A few conclusions
Common trends in the types of data required:
- To advance their work, India Spend really need performance and program data, but this simply is not available in India.
- Output-level data is needed to be able to compare what was promised against what actually happened.
- Information on the original instructions given to people compiling the data within governments is needed in order to understand what assumptions were made and what is and is not included in a given category.
See the full list of organisations we visited on the India trip on the OKFN-India blog.