Public Sector Information, Open Data and Data Revolution



For a while, I have been planning on writing about the post-2015 High-Level Panel (HLP) report and the role of ICTs in the new development agenda. As I see it, the report has missed a major opportunity here (as the old MDGs did!) by focusing primarily on just one aspect of the new technologies’ potential to enhance human development. While in the 2000s the focus was on access to ICTs (and thus we got target 18 as part of MDG 8), it has now shifted to data, specifically digital data.

While considering the arguments, I realized that further research and reflection were needed. First, I needed to fully understand what we mean by “data” and “data revolution” to link this to the context of developing countries. This led me to think about the broader concept of public sector information—not just data.

So here are some thoughts on all of this.

DATA 
Simplifying things a bit for the sake of argument, we can define three distinct types of data based on how they are produced.

The private sector has always generated massive amounts of data for many reasons, one being the need to accurately read the market and dynamically adjust supply (and product creation) to meet consumers’ specific demands. Real-time data is of particular interest here. At any rate, data produced in this fashion is private, and accessing it, if it is available to consumers at all, usually requires payment. In this sense, data here is just a commodity like any other.

 

[Animation: Data-Evolution-UNDP]

Governments also produce substantial amounts of data. Think, for example, of civil and electoral registries, health and education data, etc. But public data is financed by taxes and other public resources and thus is, well, public. It is therefore a public good (non-rivalrous and non-excludable) that should, in principle, be available to all stakeholders and citizens. Unlike private data, public data is thus not a commodity. That said, not all data produced by the public sector needs to be made available to the public, for various reasons.

Finally, we have personal data, that is, all the data (and information) that pertains to any natural person’s identity. This data is, so to speak, part of our personal DNA and has a direct relation to privacy. The latter is a fundamental human right recognized by the Universal Declaration of Human Rights and the International Covenant on Civil and Political Rights.

PUBLIC SECTOR INFORMATION (PSI)
For the sake of argument, I will briefly use the so-called DIKW pyramid (we can surely drop the wisdom component!) to make a logical distinction between data and information. In this context, we can say that information is data that has been processed and structured in a way that is useful for people to understand. Taking this at face value, we immediately realize that governments need to have information (not just data) to function effectively, issue policies and regulations and manage a country’s overall state of affairs.

We also need to point out that not all public sector information comes from government-collected data or is based on actual data. For example, OECD’s definition of PSI includes products, services, and data (see http://www.oecd.org/internet/ieconomy/40826024.pdf). The EU uses this definition and calls for PSI’s reuse within and across Union governments.

Based on the above, we can then say that public data is a subset of PSI. And open data can, at best, be as large as public data if we assume that all data created and collected by public institutions can, in fact, be opened. This might not be the case if national security considerations are factored in, for example. Remember that private data purchased by the public sector for public use remains private—unless licensing agreements specify that it may be shared and disseminated royalty-free.

OPEN DATA
We can now say that open data refers to public data as a subset of PSI. We have already noted that governments also process data to create information, which should be open as well. So perhaps we can enrich the concept of open data by including public information. Or we can suggest the term open information, encompassing both public data and public information. This distinction is essential if we think, for example, of prioritizing which public information sets should be made available first.

If the public sector already invests public resources in processing data and generating information, then making this information available, along with the data used to create it, makes perfect sense. If private contractors are hired to process public data and contracts are unclear about ownership of the generated information, it is possible that the information could become private and no longer be readily available to the public.

Freedom of Information Acts (FOIAs) and similar legislation have targeted public information and have usually ignored data. Today, over 90 countries have already passed FOIAs. Most FOIAs exclude private information from their purview, so if public information is being privatized, there is no clear way to address this issue via FOIAs.

Finally, many countries are updating FOIAs to include digital information and data, although I cannot imagine the average citizen filing a FOIA request to access data per se. Adding digital information is critical, as some governments argue that FOIAs only apply to information on paper, not in digital format.

Now, quickly glancing at how the Internet and other new technologies have impacted this, we can say there is a clear tendency to merge the three data types into a single, larger set (see animation below). Think, for example, of big data combined with the growing public nature of personal data, especially with the advent of social networks. Personal data is being privatized on an unprecedented scale and sold like any other commodity. The same can be said of chunks of public information that are somehow bought by the private sector and are thus only available (if at all) at a cost.

DATA, DEVELOPMENT AND REVOLUTION

The lack of reliable, official statistics to measure MDG targets shows that many developing countries do not yet have the resources, capacity and/or political will to generate data and information relevant to national and international development agendas. MDG Acceleration Framework (MAF) reports conducted in the past three years have provided additional data on the MDGs. However, a quick review of many of them shows that data gaps remain considerable.

There appears to be a clear need for a “data revolution,” as the post-2015 HLP suggests. But what the HLP has in mind goes beyond measuring and monitoring development progress. The panel also calls for a “new data revolution” for accountability and decision-making processes, capturing citizens’ demands, reaching the neediest, assessing public service delivery, providing open access and supporting statistical systems (see HLP report pp. 23-24). Note that the HLP does not refer to big data and uses “open data” as two separate words, not as the concept we discussed in the previous section. Finally, the HLP also calls for a “Global Partnership on Development Data” that should include all stakeholders, sectors, and interested parties.

Going back to the issues raised in the previous section, it is possible to argue that what we are really talking about is a new information revolution. The difference from the one in the 1990s is that many more people, millions if not billions, now have access to information and communication channels and can thus both access and provide information interactively and in real time. After all, this is the major difference between the new technologies and radio, TV, etc.

I am not too sure I understand the concept of development data. Most PSI is relevant to the development of poor countries (LDCs, LICs and many LMICs) and is indeed a requirement for integrated, evidence-based policy decisions. This is certainly not the case with countries in the upper brackets of development or income. There is no single definition of development data in this light, as it can vary across different contexts. It is probably better to stick to PSI relevant to development while fostering participation, transparency and accountability.

Cheers, Raúl
