Converting tabular data to RDF
In this tutorial, we walk through the common steps for converting tabular data, such as CSV files or Excel spreadsheets, to RDF using LinkedPipes ETL (LP-ETL). These steps automate the typical tasks you may encounter when transforming tabular data to RDF. Each step, and the corresponding part of the tutorial, focuses on a single task, which keeps a clear separation of concerns in the pipeline. We demonstrate the steps by transforming an example dataset. We provide you with the LP-ETL pipeline for this transformation, so that you can follow along and see the transformation in action. Each LP-ETL component we use is linked to its documentation, which you may consult for more details about the component's configuration.
The example dataset we use is the Local Administrative Units (LAU), a code list of administrative divisions of the EU member states. LAU correspond to lower-level administrative units that subdivide the regions of the member states into counties, districts, or municipalities. The dataset is available as Excel spreadsheets published by Eurostat, the statistical agency of the European Union. Since the definition of local administrative units may change over time, these spreadsheets are updated each year. In this tutorial, we use the snapshot for the year 2016. This spreadsheet contains LAU for the 28 EU member states, each on a separate sheet.
The tutorial covers the following steps of converting tabular data to linked data:
- Convert Excel spreadsheet to CSV
- Convert CSV to RDF
- Clean data
- Make versions explicit
- Describe semantics
- Generate IRIs
- Link data
- Optimize
- Load data
- Add metadata
This tutorial is intended for readers who are reasonably well versed in semantic web technologies. No prior knowledge of LP-ETL is required to follow it.
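To give a rough feel for what some of the listed steps amount to, here is a minimal sketch in plain Python, outside of LP-ETL. It is not the pipeline this tutorial builds. It reads a few CSV rows, skips rows without a code (a crude form of cleaning), puts the snapshot year into each IRI (making the version explicit), and emits N-Triples. The column names, the sample rows, the `http://example.com/` namespace, and the choice of `skos:notation` and `skos:prefLabel` are all assumptions made for illustration.

```python
import csv
import io
from urllib.parse import quote

# Made-up sample rows; the real LAU spreadsheets use different headers.
SAMPLE_CSV = "code,name\nCZ0100,Praha\nAT101,Eisenstadt\n"

# Placeholder namespace for generated IRIs (an assumption for this sketch).
BASE = "http://example.com/resource/lau/"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def escape_literal(value):
    """Escape characters that may not appear raw in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(csv_text, year):
    """Turn CSV rows with 'code' and 'name' columns into N-Triples lines."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        code = row["code"].strip()
        name = row["name"].strip()
        if not code:
            # Clean data: a row without a code cannot be identified.
            continue
        # Generate IRIs: the year keeps yearly snapshots apart.
        subject = f"<{BASE}{year}/{quote(code, safe='')}>"
        triples.append(f'{subject} <{SKOS}notation> "{escape_literal(code)}" .')
        triples.append(f'{subject} <{SKOS}prefLabel> "{escape_literal(name)}" .')
    return triples


if __name__ == "__main__":
    for line in rows_to_ntriples(SAMPLE_CSV, 2016):
        print(line)
```

In LP-ETL, each of these concerns is handled by a separate component in the pipeline rather than by one function, which is what keeps the individual steps easy to inspect and replace.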