Converting tabular data to RDF: Split statistics and the LAU code list

  • open_withSample RDF input
    @prefix :     <http://example.com/> .
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
    @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
    
    <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/2016/fi/005> a skos:Concept ;
      skos:prefLabel "Alajärvi"@fi ;
      skos:notation "005" ;
      skos:broaderTransitive <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/2016/fi/146> ;
      skos:inScheme <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/2016> ;
      :AREA "1056740000"^^xsd:decimal ;
      :CHANGE false ;
      :POP 10006 ;
      :sheet_name "FI" .

At this stage we split the data into the two parts mentioned above. We fork the data processing pipeline into two branches. One branch creates the statistical part of the dataset and the other branch creates the LAU code list. We start with the statistical part, which forms a three-dimensional data cube. The dimensions of our data cube consist of LAU2 and year, plus the measure dimension that distinguishes between the two measures in the data: area and population. Measure dimension is one of the ways in which QB can handle multi-measure data cubes. Additionally, we add an optional property indicating the degree of urbanisation and an attribute that marks the missing observations in the data cube. This is the data structure definition (DSD) of the data cube:

@prefix :               <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/statistics/> .
@prefix dbo:            <http://dbpedia.org/ontology/> .
@prefix dimension:      <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/statistics/dimension/> .
@prefix geo:            <http://ec.europa.eu/eurostat/ramon/ontologies/geographic.rdf#> .
@prefix obeu:           <http://data.openbudgets.eu/ontology/> .
@prefix property:       <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/statistics/property/> .
@prefix qb:             <http://purl.org/linked-data/cube#> .
@prefix rdf:            <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:           <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sdmx-attribute: <http://purl.org/linked-data/sdmx/2009/attribute#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix unit:           <http://qudt.org/vocab/unit#> .

:dsd a qb:DataStructureDefinition ;
  rdfs:label "Data structure definition of the LAU statistics."@en ;
  qb:component [ qb:dimension dimension:refArea ],
               [ qb:dimension sdmx-dimension:refPeriod ],
               [ qb:dimension qb:measureType ],
               [ qb:componentProperty property:degreeOfUrbanisation ;
                 qb:componentRequired false ],
               [ qb:attribute sdmx-attribute:obsStatus ;
                 qb:componentRequired false ],
               [ qb:measure dbo:area ],
               [ qb:measure dbo:populationTotal ] .

dimension:refArea a rdf:Property, qb:CodedProperty, qb:DimensionProperty ;
  rdfs:label "Reference area"@en ;
  rdfs:subPropertyOf sdmx-dimension:refArea ;
  rdfs:range geo:LAURegion ;
  qb:codeList <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/2016> .

property:degreeOfUrbanisation a rdf:Property, qb:CodedProperty, obeu:OptionalProperty ;
  rdfs:label "Degree of urbanisation"@en ;
  qb:codeList <http://eurostat.linked-statistics.org/dic/deg_urb> .

dbo:area sdmx-attribute:unitMeasure unit:SquareMeter .

We map the data to QB by a SPARQL CONSTRUCT query. LP-ETL offers the SPARQL CONSTRUCT component to run such queries. Similarly to SPARQL Update operations, you can validate SPARQL queries in the SPARQLer Query Validator before you enter them to LP-ETL. This is the query that maps the data to its DSD:

PREFIX :               <http://example.com/>
PREFIX dbo:            <http://dbpedia.org/ontology/>
PREFIX dimension:      <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/statistics/dimension/>
PREFIX property:       <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/statistics/property/>
PREFIX qb:             <http://purl.org/linked-data/cube#>
PREFIX sdmx-code:      <http://purl.org/linked-data/sdmx/2009/code#>
PREFIX sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#>
PREFIX skos:           <http://www.w3.org/2004/02/skos/core#>
PREFIX year:           <http://reference.data.gov.uk/id/gregorian-year/>

CONSTRUCT {
  ?observation a qb:Observation ;
    ?target ?measure ;
    qb:measureType ?target ;
    sdmx-dimension:refPeriod ?year ;
    dimension:refArea ?lau2 ;
    property:degreeOfUrbanisation ?degreeOfUrbanisation ;
    qb:dataSet <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau2/statistics> .
}
WHERE {
  VALUES (?source                       ?target             ?year) {
         (:AREA                         dbo:area            year:2016)
         (:POP                          dbo:populationTotal year:2016)
         # Due to the plus sign in the IRI we cannot use a compact IRI.
         (<http://example.com/POP+2015> dbo:populationTotal year:2015)
  }

  ?lau2 ?source ?measure ;
    skos:notation ?lau2Notation ;
    :sheet_name ?country .

  BIND ("https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/observation/" AS ?ns)
  BIND (substr(str(?year), strlen(str(year:)) + 1) AS ?yearNotation)
  BIND (substr(str(?target), strlen(str(dbo:)) + 1) AS ?measurePropertyNotation)
  BIND (iri(concat(?ns,
                   lcase(?country),
                   "/",
                   encode_for_uri(lcase(?lau2Notation)),
                   "/",
                   ?yearNotation,
                   "/",
                   ?measurePropertyNotation)) AS ?observation)

  OPTIONAL {
    ?lau2 :DEGURBA ?degreeOfUrbanisation .
  }
}

We update the output of this query to add the above-mentioned data structure definition and the description of the QB dataset.

PREFIX :               <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/statistics/>
PREFIX dbo:            <http://dbpedia.org/ontology/>
PREFIX dimension:      <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/statistics/dimension/>
PREFIX geo:            <http://ec.europa.eu/eurostat/ramon/ontologies/geographic.rdf#>
PREFIX obeu:           <http://data.openbudgets.eu/ontology/>
PREFIX property:       <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/statistics/property/>
PREFIX qb:             <http://purl.org/linked-data/cube#>
PREFIX rdf:            <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:           <http://www.w3.org/2000/01/rdf-schema#>
PREFIX sdmx-attribute: <http://purl.org/linked-data/sdmx/2009/attribute#>
PREFIX sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#>
PREFIX skos:           <http://www.w3.org/2004/02/skos/core#>
PREFIX unit:           <http://qudt.org/vocab/unit#>

INSERT DATA {
  :dsd a qb:DataStructureDefinition ;
    rdfs:label "Data structure definition of the LAU statistics."@en ;
    qb:component [ qb:dimension dimension:refArea ],
                 [ qb:dimension sdmx-dimension:refPeriod ],
                 [ qb:dimension qb:measureType ],
                 [ qb:componentProperty property:degreeOfUrbanisation ;
                   qb:componentRequired false ],
                 [ qb:attribute sdmx-attribute:obsStatus ;
                   qb:componentRequired false ],
                 [ qb:measure dbo:area ],
                 [ qb:measure dbo:populationTotal ] .

  dimension:refArea a rdf:Property, qb:CodedProperty, qb:DimensionProperty ;
    rdfs:label "Reference area"@en ;
    rdfs:subPropertyOf sdmx-dimension:refArea ;
    rdfs:range geo:LAURegion ;
    qb:codeList <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/2016> .

  property:degreeOfUrbanisation a rdf:Property, qb:CodedProperty, obeu:OptionalProperty ;
    rdfs:label "Degree of urbanisation"@en ;
    qb:codeList <http://eurostat.linked-statistics.org/dic/deg_urb> .

  dbo:area sdmx-attribute:unitMeasure unit:SquareMeter .

  <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau2/statistics> a qb:DataSet ;
    rdfs:label "LAU2 statistics"@en ;
    qb:structure :dsd .
}

The DSD can be alternatively provided in an RDF serialization to the Text holder component, then parsed as RDF via Files to RDF, merged via Graph merger, and combined with the QB data via Union.

The sheet for Irish LAU also features the column DEGURBA that lists the degrees of urbanisation of the administrative units. The numeric codes denoting the degrees can be converted to links to the code list of degrees of urbanisation (in RDF/XML), which is available as part of the data from Eurostat converted to linked data. We link the coded values of degree of urbanisation to the values from this code list.

PREFIX deg-urb:  <http://eurostat.linked-statistics.org/dic/deg_urb#>
PREFIX property: <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/statistics/property/>

DELETE {
  ?lau2 property:degreeOfUrbanisation ?code .
}
INSERT {
  ?lau2 property:degreeOfUrbanisation ?degUrb .
}
WHERE {
  VALUES (?code ?degUrb) {
         ("1"   deg-urb:DEG1)
         ("2"   deg-urb:DEG2)
         ("3"   deg-urb:DEG3)
  }
  ?lau2 property:degreeOfUrbanisation ?code .
}

Finally, we add placeholders for the missing measures in the dataset. QB requires all measures to be present, however, our dataset lacks some measures. In order to comply with the requirements on well-formed data cubes, we generate zero placeholders for the absent measures, for which we mark their observation status (sdmx-attribute:obsStatus) as missing (sdmx-code:obsStatus-M).

PREFIX dbo:            <http://dbpedia.org/ontology/>
PREFIX dimension:      <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/statistics/dimension/>
PREFIX qb:             <http://purl.org/linked-data/cube#>
PREFIX sdmx-attribute: <http://purl.org/linked-data/sdmx/2009/attribute#>
PREFIX sdmx-code:      <http://purl.org/linked-data/sdmx/2009/code#>
PREFIX sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#>
PREFIX year:           <http://reference.data.gov.uk/id/gregorian-year/>

INSERT {
  ?observation a qb:Observation ;
    dimension:refArea ?area ;
    sdmx-dimension:refPeriod ?year ;
    qb:measureType ?otherMeasureProperty ;
    ?otherMeasureProperty 0 ;
    sdmx-attribute:obsStatus sdmx-code:obsStatus-M ;
    qb:dataSet <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau2/statistics> .
}
WHERE {
  VALUES (?measureProperty    ?otherMeasureProperty) {
         (dbo:area            dbo:populationTotal)
         (dbo:populationTotal dbo:area)
  }
  [] dimension:refArea ?area ;
    sdmx-dimension:refPeriod ?year ;
    qb:measureType ?measureProperty ;
    ?measureProperty [] .
  FILTER NOT EXISTS {
    [] dimension:refArea ?area ;
      sdmx-dimension:refPeriod ?year ;
      qb:measureType ?otherMeasureProperty ;
      ?otherMeasureProperty [] .
  }
  
  BIND ("https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/observation/" AS ?ns)
  BIND (replace(str(?area), "^.+/([^/]+)$", "$1") AS ?areaNotation)
  BIND (substr(str(?year), strlen(str(year:)) + 1) AS ?yearNotation)
  BIND (substr(str(?otherMeasureProperty), strlen(str(dbo:)) + 1) AS ?measurePropertyNotation)
  BIND (iri(concat(?ns, ?areaNotation, "/", ?yearNotation, "/", ?measurePropertyNotation)) AS ?observation)
}
  • open_withSample RDF output
    @prefix dbo:            <http://dbpedia.org/ontology/> .
    @prefix dimension:      <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/statistics/dimension/> .
    @prefix qb:             <http://purl.org/linked-data/cube#> .
    @prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
    @prefix year:           <http://reference.data.gov.uk/id/gregorian-year/> .
    @prefix xsd:            <http://www.w3.org/2001/XMLSchema#> .
    
    <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/observation/fi/005/2016/populationTotal> a qb:Observation ;
      qb:measureType dbo:populationTotal ;
      sdmx-dimension:refPeriod year:2016 ;
      dimension:refArea <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/2016/fi/005> ;
      dbo:populationTotal 10006 ;
      qb:dataSet <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau2/statistics> .
    
    <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/observation/fi/005/2016/area> a qb:Observation ;
      qb:measureType dbo:area ;
      sdmx-dimension:refPeriod year:2016 ;
      dimension:refArea <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/2016/fi/005> ;
      dbo:area "1056740000"^^xsd:decimal ;
      qb:dataSet <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau2/statistics> .

At the point where we fork our pipeline with the SPARQL CONSTRUCT query producing the statistical part of the dataset, we continue with processing of the LAU code list. First, we remove the original data that was already mapped. This update operation also deletes the source data that we turned into the statistical part.

PREFIX : <http://example.com/>

DELETE {
  ?s ?p ?o .
}
WHERE {
  VALUES ?p {
    :AREA
    :CHANGE
    :POP
    :sheet_name
    <http://example.com/POP+2015>
  }
  ?s ?p ?o .
}

Next, we insert the concept scheme of the LAU code list.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

INSERT DATA {
  <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/2016> a skos:ConceptScheme ;
    rdfs:label "Local administrative units 2016"@en .
}
  • open_withSample RDF output
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    
    <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/2016> a skos:ConceptScheme ;
      rdfs:label "Local administrative units 2016"@en .
    
    <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/2016/fi/005> a skos:Concept ;
      skos:inScheme <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/2016> ;
      skos:prefLabel "Alajärvi"@fi ;
      skos:notation "005" ;
      skos:broaderTransitive <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/2016/fi/146> .
    
    <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/2016/fi/050> a skos:Concept ;
      skos:inScheme <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/2016> ;
      skos:prefLabel "Eura"@fi ;
      skos:notation "050" ;
      skos:broaderTransitive <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/2016/fi/041> .
    
    <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/2016/fi/592> a skos:Concept ;
      skos:inScheme <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/2016> ;
      skos:prefLabel "Petäjävesi"@fi ;
      skos:notation "592" ;
      skos:broaderTransitive <https://linked.opendata.cz/resource/ec.europa.eu/eurostat/lau/2016/fi/131> .