Data Revolution, Open Data and Public Sector Information

I have been planning to write a short blog post on the HLP report and the role of ICTs in the post-2015 agenda. As I see it, the report has missed a big opportunity here (as the old MDGs did!) by focusing on only one aspect of the potential of new technologies to enhance human development. While in the 2000s the focus was on access to ICTs (and thus target 18), we have now shifted to data, as in digital data.

While thinking about the arguments, I realized that further research and thinking were needed. First of all, I needed to fully understand what we mean by “data” and “data revolution” and then link this to the context of developing countries. This led me to think about the broader concept of public sector information, not just data.

So here are some thoughts on all of this.

Data

Simplifying things a bit for the sake of argument, we can define three distinct types of data based on the way they are produced. The private sector has always produced massive amounts of data for many reasons, one key goal being to read the markets properly and dynamically adjust supply to meet specific consumer demands. Be that as it may, data produced in this fashion is private and access to it is usually not open, as the data must be purchased (if it is available for sale at all). In this sense, data here is a commodity just like any other.

Governments also produce quite a bit of data. Think, for example, of civil and electoral registries, health and education data, etc. But public data is financed by taxes and other public resources and thus is, well, public. It is a public good (non-rival and non-excludable), not a commodity, and should in principle be available to all citizens and stakeholders.

Finally, we have personal data, that is, all the data (and information) that pertains to the identity of a natural person. This data is sort of part of our personal DNA and is directly related to privacy. The latter is a fundamental human right, as acknowledged by the Universal Declaration of Human Rights and the International Covenant on Civil and Political Rights.

Now, glancing quickly at the way the Internet and other new technologies have affected this, we can say that there is a clear tendency to merge the three data types into one single, bigger set. Think, for example, of so-called big data, combined with the way in which personal data has become much more public, especially with the advent of social networks. Nowadays, personal data is being privatized on an unprecedented scale and sold like any other commodity. The same can be said of chunks of public information that are somehow bought by the private sector and are thus only available (if available at all) at a cost.

Public sector information (PSI)

Also for the sake of argument, I will make brief use of the so-called DIKW pyramid (we can for sure drop the Wisdom component!) to draw a logical distinction between data and information. In this context, we can say that information is data that has been processed and structured in a way that people can understand and use. Taking this at face value, we immediately realize that governments need information (not just data) to function effectively, issue policies and regulations, and manage the overall state of affairs of a country.

We also need to point out that not all public sector information comes from data collected by the government or is based on actual data for that matter. For example, OECD’s definition of PSI includes products and services in addition to data (see http://www.oecd.org/internet/ieconomy/40826024.pdf). The EU uses this definition and calls for the reuse of PSI both within and across governments in the Union.

Based on the above, we can then say that public data is a subset of PSI. And open data can at best be as large as public data, if we assume that all data created and collected by public institutions can in fact be opened. This might not be the case if national security considerations are factored in, for example. Bear in mind that private data purchased by the public sector remains private, unless there are licensing agreements that allow for its public, royalty-free sharing and dissemination.

Open data

From the above, we can now say that open data refers to public data as a subset of PSI. We have already noted that governments also process data to create information, and such information should also be open. So perhaps we can enrich the concept of open data by including public information in it. Or we can instead suggest the term open information, which would include both public data and public information. This distinction is important if we are thinking, for example, of prioritizing which public information sets should be made available first.

If the public sector is already investing public resources to process data and generate information, then it makes perfect sense to make this information available too, in addition to the data that was used to create it. If private contractors are used to process the data and contracts are not clear about the ownership of the information being generated, it is quite possible that such information will become private and will not be readily available to the average person.

Freedom of information acts (FOIAs) and related legislation originally targeted public information and usually ignored data. Today, close to 90 countries have either passed FOIAs or are in the process of doing so. Most if not all FOIAs exclude private information from their purview, so if public information is being privatized there is no way to address this issue via FOIAs.

Finally, many countries are updating FOIAs to specifically include both digital information and data, although I cannot imagine the average citizen filing a FOIA request to access data per se. The addition of digital information is critical, as some governments can argue that FOIAs only apply to information on paper and not on a computer. So once the information is digitized and the paper original “misplaced”, there is no way to openly access it.

Data, development and revolution

The lack of reliable official statistics for measuring MDG targets is living proof that many developing countries do not yet have the resources, capacity and/or political will to generate data and information relevant to national and international development agendas. MAF reports conducted over the last three years have provided additional data on the MDGs but, after a quick review of many of them, it was clear that the data provided was not entirely reliable and in some cases was based on estimates.

In any event, there seems to be a clear need for a “data revolution”, as suggested by the post-2015 HLP. But what the HLP has in mind goes well beyond measuring and monitoring development progress. The panel’s call for a “new data revolution” (if it is new, I presume there was a previous data revolution somewhere in the past) also covers accountability, decision-making processes, capturing citizens’ demands, reaching the neediest, assessing public service delivery, providing open access and supporting statistical systems (see HLP report http://www.un.org/sg/management/pdf/HLP_P2015_Report.pdf, pp. 23-24). Note that the HLP does not refer to big data at all and uses “open data” as two separate words, not as the concept we discussed in the previous section. Finally, the HLP also calls for a “Global Partnership on Development Data”, which should include all stakeholders and sectors, as well as all interested parties.

Going back to the issues raised in the previous section, it is possible to argue that what we are really talking about is a new information revolution. The difference from the one that took place in the 1990s is that today many more people, millions if not billions, have access to information and communication channels and can thus not only access information but also provide it interactively and in real time. After all, this is the central difference between the new technologies and, say, radio or TV.

I am not too sure I understand the concept of development data. For poor countries (LDCs, LICs and many LMICs), most PSI is relevant for development and is indeed a requirement for making integrated, evidence-based policy decisions. This is certainly not the case for countries in the upper brackets of development or income. In this light, there is no single definition of development data, as it can vary across contexts. It is probably better, then, to stick to PSI that is relevant to development, while also fostering participation, transparency and accountability.

Cheers, Raúl