We are the DCAT team. You can also refer to us as the S-team. Why? Well, from left to right we have Sébastien, Sylvain, Stan, Samir and Simon. Our assignment consists of three projects: creating a DCAT feed for IPT, building a Drupal module and building a DCAT validator. Before we go into details, let us first explain what DCAT is.
Essentially, DCAT (Data Catalog Vocabulary) is a set of terms to describe datasets. This is a rather abstract definition, so let us give an example. Let’s say a researcher needs information about beetles. In the following image you can see three datasets: one about beetles in Belgium, one about beetles in the Netherlands and one about cat pictures. Normally, when the researcher wants to find this information, he needs to search the web for ‘beetles’ and everything related to it. Then he has to look at the data to see whether it is relevant to his research. If the researcher stumbles upon the dataset about cats, he won’t know that it is irrelevant until he looks at it.
Wouldn’t it be better if a computer could do this? Well, DCAT makes that possible. If all datasets on the internet are described in this standardised way, a computer can easily tell what each dataset is about. Now the researcher just has to ask a computer to find every dataset that contains or is related to beetles.
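To make this a bit more concrete, here is a minimal sketch in Python of what such a lookup could look like. The catalogue, its titles and its keywords are made up for illustration, and the function name find_datasets is our own invention. Only the terms dct:title and dcat:keyword come from the DCAT vocabulary.

```python
# Hypothetical catalogue: each dataset is described with a few DCAT terms.
# The titles and keywords are invented for this example.
CATALOG = [
    {"dct:title": "Beetles in Belgium", "dcat:keyword": ["beetles", "Belgium"]},
    {"dct:title": "Beetles in the Netherlands", "dcat:keyword": ["beetles", "Netherlands"]},
    {"dct:title": "Cat pictures", "dcat:keyword": ["cats", "pictures"]},
]


def find_datasets(catalog, keyword):
    """Return the titles of all datasets tagged with the given keyword."""
    wanted = keyword.lower()
    return [
        dataset["dct:title"]
        for dataset in catalog
        # Compare case-insensitively so "Beetles" and "beetles" both match.
        if wanted in (k.lower() for k in dataset.get("dcat:keyword", []))
    ]


print(find_datasets(CATALOG, "beetles"))
# prints ['Beetles in Belgium', 'Beetles in the Netherlands']
```

The cat pictures never show up in the result, and nobody had to open that dataset to find out.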
In a perfect world, everyone would be using DCAT, so that all data could be accessed in a uniform way. Getting there is one hell of a job, but our three projects bring us closer to that goal. Let’s take a look at our projects and explain them as best we can.
The first project: DCAT Java
INBO (Research Institute for Nature and Forest) wants us to integrate DCAT into the IPT (Integrated Publishing Toolkit). The IPT is a toolkit for publishing data. We just need to add some features to the existing code. Our goal is to generate a DCAT feed for the entire application.
Members: Sylvain Delbauve & Simon Van Cauter
DCAT Drupal Module
The main purpose of this project is to create a Drupal module that extracts metadata (title, keywords …) from the content types of a website. A content type is a kind of content, such as a blog post or even a file.
This module will give everyone the opportunity to have a standardised DCAT description for their data or datasets, and it will automate the process. The result is an RDF file. With this module, companies will be able to publish datasets to their data catalogue in an easy way.
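To give an idea of what such an RDF file could contain, here is a small sketch that turns a title and some keywords into a DCAT description in Turtle, one of the RDF text formats. It is written in Python purely for illustration and is not code from the Drupal module. The function names and the example.org URL are hypothetical, and we assume the strings fit on a single line.

```python
def turtle_literal(text):
    """Quote a single-line string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_turtle(uri, title, keywords):
    """Describe one piece of content as a dcat:Dataset in Turtle."""
    pairs = [("dct:title", turtle_literal(title))]
    if keywords:
        # Several values for the same property are separated by commas.
        pairs.append(("dcat:keyword", ", ".join(turtle_literal(k) for k in keywords)))
    body = " ;\n".join(f"    {prop} {value}" for prop, value in pairs)
    return (
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
        "@prefix dct: <http://purl.org/dc/terms/> .\n\n"
        f"<{uri}> a dcat:Dataset ;\n{body} .\n"
    )


# Hypothetical content item, made up for this example.
print(to_turtle("http://example.org/node/1", "Beetles in Belgium", ["beetles", "Belgium"]))
```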
Member: Samir Hanini
Partner: The Flemish Government
DCAT Validator
The aim of our last project is to build a validator for DCAT feeds. It shows errors when mandatory properties are missing and warnings when recommended properties are missing. A feed can be validated by pasting the text, uploading a file or entering a URL. This project can be combined with the other projects to check whether their DCAT feed has the correct syntax.
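The difference between errors and warnings can be sketched in a few lines of Python. This is only an illustration of the idea, not the validator itself: the property lists below are an assumption chosen for the example, and the less strict properties are called recommended here.

```python
# Illustrative property lists, assumed for this sketch.
# The real validator's rules may differ.
MANDATORY = ["dct:title", "dct:description"]
RECOMMENDED = ["dcat:keyword", "dcat:distribution"]


def validate_dataset(dataset):
    """Return (errors, warnings) for a dataset described as a dict."""
    # A missing or empty mandatory property is an error.
    errors = [f"missing mandatory property {p}" for p in MANDATORY if not dataset.get(p)]
    # A missing or empty recommended property is only a warning.
    warnings = [f"missing recommended property {p}" for p in RECOMMENDED if not dataset.get(p)]
    return errors, warnings


errors, warnings = validate_dataset(
    {"dct:title": "Beetles in Belgium", "dcat:keyword": ["beetles"]}
)
print(errors)    # prints ['missing mandatory property dct:description']
print(warnings)  # prints ['missing recommended property dcat:distribution']
```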
Members: Stan Callewaert & Sébastien Henau
Partner: The Flemish Government
That’s about it for now. Hopefully you now understand a bit more about what we’re working on. This was just a simple explanation of our projects, but we’ll keep you posted on the details!