Converting tabular data to RDF: Make versions explicit

  • Sample RDF input
    @prefix :        <http://example.com/> .
    @prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
    
    [ :AREA "1467502"^^xsd:decimal ;
      :CHANGE false ;
      :LAU1_NAT_CODE "00EB" ;
      :LAU1_NAT_CODE_NEW "E06000001" ;
      :LAU2_NAT_CODE "00EBNM" ;
      :LAU2_NAT_CODE_NEW "E05008942" ;
      :NAME_1 "Burn Valley"@en ;
      :NUTS3_10 "UKC11" ;
      :NUTS3_13 "UKC11" ;
      :POP 8774 ;
      :sheet_name "UK" ] .

Our dataset contains rudimentary versioning data: previous codes are provided for several administrative areas. The current codes make these obsolete, but the old codes remain useful for mapping older data that refers to them. We replace the old codes with the new ones and link each area to a description of its old codes via the dcterms:replaces property.

PREFIX :        <http://example.com/>
PREFIX dcterms: <http://purl.org/dc/terms/>

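DELETE {
  # Remove the old codes and the *_NEW properties, whose values
  # move to the main code properties in the INSERT clause.
  ?lau :LAU2_NAT_CODE ?lau2old ;
    :LAU1_NAT_CODE ?lau1old ;
    :LAU2_NAT_CODE_NEW ?lau2new ;
    :LAU1_NAT_CODE_NEW ?lau1new .
}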
INSERT {
  ?lau :LAU2_NAT_CODE ?lau2new ;
    :LAU1_NAT_CODE ?lau1new ;
    dcterms:replaces [
      :LAU2_NAT_CODE ?lau2old ;
      :LAU1_NAT_CODE ?lau1old
    ] .
}
WHERE {
  ?lau :LAU2_NAT_CODE_NEW ?lau2new ;
    :LAU1_NAT_CODE_NEW ?lau1new ;
    :LAU2_NAT_CODE ?lau2old ;
    :LAU1_NAT_CODE ?lau1old .
}
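
As a usage sketch, the dcterms:replaces link can be followed to map an obsolete code to its current counterpart. The query below assumes the update above has already been applied; the literal "00EBNM" is the old LAU2 code from the sample input, chosen here only for illustration.

```sparql
PREFIX :        <http://example.com/>
PREFIX dcterms: <http://purl.org/dc/terms/>

# Find the current LAU2 code of the area whose replaced
# (obsolete) LAU2 code is "00EBNM".
SELECT ?lau2new
WHERE {
  ?lau :LAU2_NAT_CODE ?lau2new ;
    dcterms:replaces [ :LAU2_NAT_CODE "00EBNM" ] .
}
```

For the sample area, this should bind ?lau2new to "E05008942".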

The dataset also describes whether the definitions of administrative areas have changed since the previous year's version of the code list. This is indicated by the boolean :CHANGE property. To make the semantics of this property more explicit, we replace each true value with a date of modification, expressed via the dcterms:modified property.

PREFIX :        <http://example.com/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>

DELETE {
  ?lau2 :CHANGE true .
}
INSERT {
  ?lau2 dcterms:modified "2016-01-01"^^xsd:date .
}
WHERE {
  ?lau2 :CHANGE true .
}
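
One way to check the result, sketched under the assumption that the update above has run, is to list the areas that now carry a modification date. Only properties that appear in the sample data are used.

```sparql
PREFIX :        <http://example.com/>
PREFIX dcterms: <http://purl.org/dc/terms/>

# List the names of areas marked as modified, with the date.
SELECT ?name ?modified
WHERE {
  ?lau2 dcterms:modified ?modified ;
    :NAME_1 ?name .
}
ORDER BY ?name
```

Areas with :CHANGE false, such as the sample area, are left untouched by the update and do not appear in this result.
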
  • Sample RDF output
    @prefix :        <http://example.com/> .
    @prefix dcterms: <http://purl.org/dc/terms/> .
    @prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
    
    [ :AREA "1467502"^^xsd:decimal ;
      :CHANGE false ;
      :LAU1_NAT_CODE "E06000001" ;
      :LAU2_NAT_CODE "E05008942" ;
      :NAME_1 "Burn Valley"@en ;
      :NUTS3_10 "UKC11" ;
      :NUTS3_13 "UKC11" ;
      :POP 8774 ;
      :sheet_name "UK" ;
      dcterms:replaces [
        :LAU1_NAT_CODE "00EB" ;
        :LAU2_NAT_CODE "00EBNM"
      ] ] .

The pipeline that includes the data-cleaning step is available here.