A how-to guide to making your data publicly available

Aiming for a European digital single market, the EU has set a number of initiatives in motion to help businesses and EU citizens connect and profit from digital innovation. One of these actions is the liberation of government data: making it publicly available for everyone to reuse. Opening up government data has many benefits, from enabling new businesses to simply promoting transparency. To help guide this process, the EU has published an OpenData Handbook.

 

What’s in it for whom?

The handbook is aimed primarily at EU organisations, but it also works as a general guide to finding and preparing data inside your organisation so that it can be made publicly available. In this article, we give you a quick overview of the handbook and a summary of its key points.

At a slim 62 pages, the handbook shows organisations how to bring their data into the EU Open Data Portal and make it publicly available. The first chapter briefly describes what opendata is and why it is such an important topic (not just) for the EU. From chapter two on it gets more practical, focusing on a six-step workflow to liberate your data. These six steps are what make the book so interesting for everyone involved in opendata.

 

Six steps to open up your data

OpenData-Cycle

Step 1: You need to identify possible data for publication

The key question here is what kind of data you should be looking for: what data is actually worth publishing? The authors describe it as any piece of content in any medium that is stored electronically and made accessible. By content, they mean “any objective, factual, non-personal and non-aggregated information that has been collected or produced during daily activities [of an institution]”. Examples include statistics and time series, geographic or geospatial data, and datasets about projects, programmes or actions, and there are many more. The book then gives further suggestions on what data could be interesting and where to find it.

 

Step 2: You need to analyse the identified data

Once you’ve found possible data sources, the next step is to analyse how open your data is, in two respects. Legal openness refers to the use of an open licence that allows the data to be reused. Technical openness refers to publishing the data in a machine-readable and non-proprietary format. You can assess both aspects by answering the following questions:

  • Who owns the rights?
    If you’re not the holder of the copyright of the data, you might have to contact a third-party vendor.
  • Does it contain personal data?
    Personal data has to be fully protected, according to EU regulations. This limits the reuse of certain data.
  • What is the quality of the data?
    Quality covers aspects such as accuracy, completeness, consistency, and correct semantics and syntax of the data.
  • In what format is the data available?
    There are several aspects to take into account here. One usually differentiates between closed and open formats. A good guideline cited in the book is Tim Berners-Lee’s five-star scheme, which is upheld by the W3C. The levels build on each other: (1) the use of an open licence, plus (2) a machine-readable format, plus (3) an open format, plus (4) uniform resource identifiers, plus (5) the use of linked data. The book explains each level further and how to reach it. As a basic orientation, aiming for a three-star ranking is a good start.

Five-Star-Scheme
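
To make the cumulative nature of the five-star scheme concrete, here is a minimal Python sketch. It is not taken from the handbook: the `Dataset` class, its field names and the `star_rating` function are hypothetical and only illustrate that each star requires all the previous ones.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a dataset; field names are illustrative."""
    open_licence: bool      # (1) published under an open licence
    machine_readable: bool  # (2) available in a machine-readable format
    open_format: bool       # (3) available in an open, non-proprietary format
    uses_uris: bool         # (4) items are identified by URIs
    linked_data: bool       # (5) linked to other data


def star_rating(ds: Dataset) -> int:
    """Count stars cumulatively: a level only counts if all earlier ones are met."""
    levels = [ds.open_licence, ds.machine_readable, ds.open_format,
              ds.uses_uris, ds.linked_data]
    stars = 0
    for achieved in levels:
        if not achieved:
            break
        stars += 1
    return stars


# Example: a CSV file under an open licence, without URIs or links.
csv_on_website = Dataset(True, True, True, False, False)
print(star_rating(csv_on_website))  # 3
```

A dataset without an open licence scores zero in this sketch no matter how good its format is, which mirrors the idea that legal openness comes first.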

Step 3: You need to create the Metadata

To make your data really open and usable for people, you need to add metadata. Think of it as a description of what can be found in your data. It makes the data much easier to discover, as it is the keystone for search engines and for identifying datasets.

The basic metadata should include the author of the data, the title and the year of publication. From there, you can of course be a lot more specific. Standardised, high-quality metadata again makes searching for your data much easier, as search engines know where to look and how to interpret the data. Metadata also contains the legal notice, which states how the data may be reused.
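
As an illustration, a basic metadata record could be sketched in Python as follows. The field names, the `build_metadata` helper and all example values are assumptions made for this sketch; the handbook and real portals define their own metadata schemas.

```python
import json

# Assumed minimum: title, author, year of publication and a legal notice.
REQUIRED_FIELDS = ("title", "author", "year", "licence")


def build_metadata(title, author, year, licence, **extra):
    """Build a metadata record and refuse to continue if a basic field is empty."""
    record = {"title": title, "author": author, "year": year, "licence": licence}
    record.update(extra)  # optional, more specific fields
    missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
    if missing:
        raise ValueError("missing metadata fields: " + ", ".join(missing))
    return record


# All values below are placeholders, not real datasets.
metadata = build_metadata(
    title="Example project funding",
    author="Example Organisation",
    year=2015,
    licence="CC-BY-4.0",
    keywords=["funding", "projects"],
)
print(json.dumps(metadata, indent=2))
```

Storing the record as JSON keeps the metadata itself machine-readable, so a portal or search engine can pick it up without manual work.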

 

Step 4: You need to publish both data and metadata

This is it: you can now make your data publicly available. A key question here is where to publish. For EU institutions there is the aforementioned EU Open Data Portal, serving as a single point of access for EU open data. If you’re not connected to the EU, the first place to publish your data is your own website. The book recommends setting up a dedicated section for this, again to make it easier for people to find your data. There are also dedicated portals on the web that crawl for opendata and serve as a single point of access as well. YourDataStories is working on something similar, allowing users to browse large amounts of opendata from different data providers in one single spot.
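
If you publish on your own website, one simple approach is to place the data and its metadata side by side in an open format. The sketch below assumes CSV for the data and a JSON file for the metadata; the `publish` function, the file naming and the example rows are hypothetical and not prescribed by the handbook.

```python
import csv
import json
import tempfile
from pathlib import Path


def publish(rows, metadata, out_dir, name):
    """Write rows to <name>.csv and metadata to <name>.metadata.json in out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    data_path = out / f"{name}.csv"
    meta_path = out / f"{name}.metadata.json"

    # CSV is an open, machine-readable format.
    with data_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

    # The metadata file travels with the data it describes.
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return data_path, meta_path


# Placeholder data; a temporary directory stands in for the web server folder.
rows = [{"project": "A", "budget": 1000}, {"project": "B", "budget": 2500}]
meta = {"title": "Example projects", "author": "Example Organisation",
        "year": 2015, "licence": "CC-BY-4.0"}
with tempfile.TemporaryDirectory() as folder:
    data_path, meta_path = publish(rows, meta, folder, "projects")
    print(data_path.name, meta_path.name)  # projects.csv projects.metadata.json
```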

 

Turning opendata into a story

Following publication, the book lists two more steps: the curation of the data (5) and entering new datasets (6). These steps complete the life-cycle of opendata and include updating and evaluating your data. This gives users access to the most current versions at all times and lets you, as a data provider, see what people are interested in.

The last pages of the book cover general terms and processes from the opendata world and are quite useful if you are new to the field. We recommend having a look at it if you are considering publishing some of your data, not least because we believe that making the world more transparent through opendata can help both sides in finding and telling new stories.

 

About the author: Tilman Wagner (DW)