Data Revolution, Open Data and Public Sector Information

I have been planning to write a short blog post on the High-Level Panel (HLP) report and the role of ICTs in the post-2015 agenda. As I see it, the report has missed a big opportunity (as the old MDGs did!) by focusing on only one aspect of the potential of new technologies to enhance human development. While in the 2000s the focus was on access to ICTs (hence MDG target 18), we have now shifted to data, specifically digital data.

While considering the arguments, I realized that further research and reflection were needed. First, I needed to fully understand what we mean by “data” and “data revolution” and then link this to the context of developing countries. This led me to think about the broader concept of public sector information, not just data.

So here are some thoughts on all of this.

Data

Simplifying things a bit for the sake of argument, we can define three distinct types of data based on the way they are produced. The private sector has always generated massive amounts of data for many reasons, a key one being the need to read markets properly and dynamically adjust supply to meet specific consumer demands. Be that as it may, data produced in this fashion is private, and access is usually not open: the data must be purchased, if it is available for sale at all. In this sense, data here is a commodity just like any other.

Governments also produce substantial amounts of data. Think, for example, of civil and electoral registries, health and education data, etc. But public data is financed by taxes and other public resources and thus is, well, public. It is a public good (non-rival and non-excludable), not a commodity, and should, in principle, be available to all citizens and stakeholders.

Finally, we have personal data: all the data (and information) that pertains to the identity of a natural person. This data is part of our personal DNA and is directly related to privacy, a fundamental human right recognized by the Universal Declaration of Human Rights and the International Covenant on Civil and Political Rights.

Now, glancing quickly at how the Internet and other new technologies have affected all this, we can see a clear tendency for the three data types to merge into a single, larger set. Think, for example, of so-called big data, combined with the way personal data has become much more public, especially since the advent of social networks. Nowadays, personal data is being privatized on an unprecedented scale and sold like any other commodity. The same can be said of chunks of public information, which are somehow bought by the private sector and are thus available only at a cost, if at all.

Public sector information (PSI)

For the sake of argument, I will also make brief use of the so-called DIKW pyramid (data, information, knowledge, wisdom; we can surely drop the wisdom component!) to make a logical distinction between data and information. In this context, information is data that has been processed and structured in a way that helps people understand it. Taking this at face value, we immediately realize that governments need information (not just data) to function effectively, issue policies and regulations, and manage the overall state of affairs of a country.

We should also point out that not all public sector information comes from data collected by the government, or is even based on actual data. For example, the OECD’s definition of PSI includes products and services in addition to data (see http://www.oecd.org/internet/ieconomy/40826024.pdf). The EU uses this definition and calls for the reuse of PSI both within and across governments in the Union.

Based on the above, we can say that public data is a subset of PSI. And open data can at best be as large as public data, assuming that all data created and collected by public institutions can in fact be opened. This might not be the case if, for example, national security considerations are factored in. Bear in mind that private data purchased by the public sector remains private, unless licensing agreements allow for its public, royalty-free sharing and dissemination.

Open data

From the above, we can say that open data is a subset of PSI. We have already noted that governments also process data to create information and such information should also be open. So perhaps we can enrich the concept of open data by including public information. Or we can suggest the term open information, which includes both public data and public information. This distinction is important if we are thinking, for example, about prioritizing which public information sets should be made available first.

If the public sector is already investing public resources to process data and generate information, then it makes perfect sense to make this information available too, in addition to the data used to create it. If private contractors are used to process the data and the contracts are not clear about the ownership of the resulting information, such information may well become private and not be readily available to the average person.

Freedom of Information Acts (FOIAs) and similar legislation have traditionally targeted public information and usually ignored data. Today, close to 90 countries have either passed FOIAs or are in the process of doing so. Most, if not all, FOIAs exclude private information from their purview, so if public information is being privatized, there is no way to address the issue via FOIAs.

Finally, many countries are updating their FOIAs to specifically include both digital information and data, although I cannot imagine the average citizen filing a FOIA request to access data per se. The addition of digital information is critical, as some governments could argue that FOIAs apply only to information on paper and not on a computer. So once the information is digitized and the paper original is “misplaced,” there is no way to access it openly.

Data, development and revolution

The lack of reliable official statistics for measuring MDG targets is living proof that many developing countries do not yet have the resources, capacity and/or political will to generate data and information relevant to national and international development agendas. MDG Acceleration Framework (MAF) reports produced in the past three years have provided additional data on the MDGs, but a quick review of many of them made it clear that the data provided was not entirely reliable and in some cases was based on estimates.

In any event, there seems to be a clear need for a “data revolution,” as suggested by the post-2015 HLP. But what the HLP has in mind goes well beyond measuring and monitoring development progress. The panel’s call for a “new data revolution” (if it is new, I presume there was a previous data revolution somewhere in the past) also covers accountability, decision-making processes, capturing citizens’ demands, reaching the neediest, assessing public service delivery, providing open access and supporting statistical systems (see the HLP report, http://www.un.org/sg/management/pdf/HLP_P2015_Report.pdf, pp. 23-24). Note that the HLP does not refer to big data at all and uses “open data” as two separate words rather than as the concept we discussed in the previous section. Finally, the HLP also calls for a “Global Partnership on Development Data” that should include all stakeholders and sectors, as well as all interested parties.

Going back to the issues raised in the previous section, one could argue that what we are really talking about is a new information revolution. The difference from the one that took place in the 1990s is that today many more people, millions if not billions, have access to information and communication channels and can thus not only access information but also provide it interactively and in real time. After all, this is the central difference between the new technologies and, say, radio or TV.

I am not too sure I understand the concept of development data. For poor countries (LDCs, LICs and many LMICs), most PSI is relevant for development and is indeed a requirement for making integrated, evidence-based policy decisions. This is certainly not the case for countries in the upper brackets of development or income. In this light, there is no single definition of development data, as it can vary across contexts. It is probably better to stick to PSI, which is relevant to development, while also fostering participation, transparency and accountability.

Cheers, Raúl

 
