Converting tabular data to RDF: Describe semantics
At this point we have reasonably clean RDF data.
However, the data is described in terms of custom properties, such as :POP or :NUTS3_10.
These properties reveal little about their meaning, which makes the data difficult to understand.
To make the data more usable, we therefore map these custom properties to terms from standard RDF vocabularies.
The dataset consists of two parts.
The first part is a code list of the local administrative units.
Code lists, as well as other controlled vocabularies, can be described using the terms of the Simple Knowledge Organization System (SKOS).
The second part consists of statistical measures about the administrative units.
Specifically, there are population counts and areas in square meters.
Statistical data of this kind can be described with the Data Cube Vocabulary (QB).
Both SKOS and QB are widely used standards published by the World Wide Web Consortium (W3C).
In the following, we will refer to the terms of these standards via compact IRIs that use short prefixes in place of namespace IRIs.
You can look up the namespace IRIs of common prefixes, such as skos: and qb:, using Prefix.cc.
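The following minimal sketch shows how compact IRIs expand. The SPARQL query declares the two prefixes with the namespace IRIs of SKOS and the Data Cube Vocabulary. It then echoes the full IRIs that skos:Concept and qb:Observation stand for. qb:Observation is chosen here only as an illustrative QB term. On a SPARQL 1.1 endpoint the query should return a single row, because the empty WHERE clause matches exactly once.

```sparql
# Bind the short prefixes to the namespace IRIs of SKOS and QB
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX qb:   <http://purl.org/linked-data/cube#>

# A compact IRI is the namespace IRI of its prefix followed by the local name,
# so this projects <http://www.w3.org/2004/02/skos/core#Concept> and
# <http://purl.org/linked-data/cube#Observation>.
SELECT (skos:Concept AS ?conceptIri) (qb:Observation AS ?observationIri)
WHERE {}
```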
We cast the local administrative units as instances of skos:Concept belonging to the LAU concept scheme.
Using an inline table in the VALUES clause, we map the source properties to target properties from SKOS.
:NAME_1 is mapped to skos:prefLabel, :LAU2_NAT_CODE to skos:notation, and the transliterated names in :NAME_2_LAT become values of skos:altLabel.
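PREFIX : <http://example.com/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
# Remove the triples that use the custom source properties...
DELETE {
  ?lau2 ?source ?o .
}
# ...and re-add their values under the SKOS target properties,
# typing each unit as a concept in the LAU 2016 concept scheme.
INSERT {
  ?lau2 a skos:Concept ;
    skos:inScheme <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/2016> ;
    ?target ?o .
}
WHERE {
  # Inline mapping table: custom source property -> SKOS target property
  VALUES (?source ?target) {
    (:NAME_1 skos:prefLabel)
    (:NAME_2_LAT skos:altLabel)
    (:LAU2_NAT_CODE skos:notation)
  }
  ?lau2 ?source ?o .
}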
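To check the outcome of the update, a query along the following lines can list a sample of the resulting concepts. This sketch reuses the prefixes and the concept scheme IRI from the update above. Concepts lacking a preferred label or a notation are skipped, because those patterns are required. The transliterated name is matched optionally.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# Spot-check: list a few concepts from the LAU scheme with their mapped values
SELECT ?lau2 ?prefLabel ?altLabel ?notation
WHERE {
  ?lau2 a skos:Concept ;
    skos:inScheme <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/2016> ;
    skos:prefLabel ?prefLabel ;
    skos:notation ?notation .
  # Transliterated names are not present for every unit
  OPTIONAL { ?lau2 skos:altLabel ?altLabel }
}
LIMIT 10
```

Conversely, an ASK query matching any of the source properties :NAME_1, :NAME_2_LAT, or :LAU2_NAT_CODE should return false once the update has replaced them all.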
We will return to mapping the statistical part after generating IRIs for the RDF resources in our data in the next step.