I have been planning for a while to write about the post-2015 High Level Panel (HLP) report and the role of ICTs in a new development agenda. As I see it, the report has missed a big opportunity here (as the old MDGs did!) by focusing mostly on one aspect of the potential of the new technologies to enhance human development. While in the 2000s the focus was access to ICTs (and thus we got target 18 as part of MDG 8), now we have shifted to data, as in digital data.
While thinking about the arguments I realized that further research and thinking were needed. First of all, I needed to fully understand what we mean by “data” and “data revolution” to then link this to the context of developing countries. This led me to think about the broader concept of public sector information – not just data.
So here are some thoughts on all of this.
Simplifying things a bit for the sake of argument, we can define three distinct types of data based on the way they are produced.
The private sector has always produced massive amounts of data for many reasons, one being the need to read the market properly and dynamically adjust supply (and product creation) to cater to specific consumer demands. Real-time data is of particular interest here. At any rate, data produced in this fashion is private, and accessing it, if it is available to consumers at all, usually requires payment. In this sense, data here is just a commodity like any other.
Governments also produce quite a bit of data. Think for example of civil and electoral registries, health and education data, etc. But public data is financed by taxes and other public resources and thus is, well, public. It is thus a public good (non-rivalrous and non-excludable) that should in principle be available to all stakeholders and citizens. Public data is thus not a commodity, in contrast to private data. That said, for a wide variety of reasons, not all data produced by the public sector needs to be available to the public.
Finally, we have personal data, that is, all the data (and information) that pertains to the identity of any natural person. This data is, so to speak, part of our personal DNA and has a direct relation to privacy. The latter in turn is a fundamental human right, as acknowledged by the Universal Declaration of Human Rights and the International Covenant on Civil and Political Rights.
PUBLIC SECTOR INFORMATION (PSI)
Also for the sake of argument, I will make brief use of the so-called DIKW pyramid (we can surely drop the Wisdom component!) to make a logical distinction between data and information. In this context, we can say that information is data that has been processed and structured in a way that helps people understand it. Taking this at face value, we immediately realize that governments need to have information (not just data) to be able to function effectively, issue policies and regulations and manage the overall state of affairs of a country.
We also need to point out that not all public sector information comes from data collected by the government or is based on actual data for that matter. For example, OECD’s definition of PSI includes products and services, in addition to data (see http://www.oecd.org/internet/ieconomy/40826024.pdf). The EU uses this definition and calls for the reuse of PSI both within and across governments in the Union.
Based on the above, we can then say that public data is a subset of PSI. And open data can at best be as large as public data, if we assume that all data created and collected by public institutions can in fact be opened. This might not be the case if national security considerations are factored in, for example. Bear in mind that private data purchased by the public sector for public use remains private, unless licensing agreements are spelled out to allow for its public, royalty-free sharing and dissemination.
From the above, we can now say that open data refers to public data as a subset of PSI. We have already noted that governments also process data to create information and such information should also be open. So perhaps we can enrich the concept of open data by including public information in it. Or we can instead suggest the term open information which will include both public data and public information. This distinction is important if we are thinking for example of prioritizing which public information sets should be first made available.
If the public sector is already investing public resources to process data and generate information, then it makes perfect sense to make this information available too, in addition to the data that was used to create it. If private contractors are hired to process public data and contracts are not clear about the ownership of the information being generated, it is then feasible that such information can in fact become private and will not be readily available to the public.
Freedom of information acts (FOIAs) and related legislation actually targeted public information, and usually ignored data. Today, over 90 countries have already passed FOIAs. Most if not all FOIAs exclude private information from their purview, so if public information is being privatized there is no clear way to address this issue via FOIAs.
Finally, many countries are updating FOIAs to specifically include both digital information and data, although I cannot imagine the average citizen filing a FOIA request to access data per se. The addition of digital information is critical, as some governments can argue that FOIAs only apply to information on paper and not to information held on a computer in digital format. So once the information is digitized and the paper original “misplaced” we might have some issues trying to access it.
Now, quickly glancing at the way the Internet and other new technologies have impacted this, we can say that there is a clear tendency to mesh the three different data types into one single and bigger set (see animation below). Think for example of so-called big data, combined with the way in which personal data has become much more public, especially with the advent of social networks. Nowadays, personal data is being privatized on an unprecedented scale and is sold like any other commodity. The same can be said about chunks of public information which are somehow bought by the private sector and are thus only available (if available at all) at a cost.
DATA, DEVELOPMENT AND REVOLUTION
The lack of reliable and official statistics for measuring MDG targets is living proof that many developing countries do not yet have the resources, capacity and/or political will to generate data and information relevant to national and international development agendas. MDG Acceleration Framework (MAF) reports conducted in the last 3 years have provided additional data on the MDGs but, after a quick review of many of them, it is clear that data gaps are still large.
In any event, there seems to be a clear need for a “data revolution” as suggested by the post-2015 HLP. But what the HLP has in mind goes well beyond measuring and monitoring development progress. The panel's call for a “new data revolution” also covers accountability and decision-making processes, capturing citizens' demands, reaching the neediest, assessing public service delivery, providing open access and supporting statistical systems (see HLP report pp. 23-24). Note that the HLP does not refer to big data at all and uses “open data” as two separate words – and not as the concept we discussed in the previous section. Finally, the HLP also calls for a “Global Partnership on Development Data” which should include all stakeholders and sectors, as well as all interested parties.
Going back to the issues raised in the previous section, it is possible to argue that what we are really talking about is a new information revolution. The difference from the one that took place in the 1990s is that today many more people, millions if not billions, have access to information and communication channels and can thus not only access information but also provide it interactively and in real time. After all, this is the central difference between the new technologies and, say, radio or TV.
I am not too sure I understand the concept of development data. For poor countries (LDCs, LICs and many LMICs) most PSI is relevant for development and is indeed a requirement for making integrated and evidence-based policy decisions. This is certainly not the case for countries in the upper brackets of development or income. In this light, there is no single definition of development data, as it can vary across different contexts. It is probably better, then, to stick to PSI that is relevant to development, while also fostering participation, transparency and accountability in the process.