How to convert RDF to CSV
You want to use your RDF data in non-RDF tools.
Problem
Even though RDF was designed to be the lingua franca for data on the Web, this role practically belongs to CSV. While few tools support input in RDF, most every tool supports input in CSV. CSV is thus often the common denominator when you need to exchange data between applications. For example, CSV can feed visualizations and its propositional form is suitable for most data mining software. In order to be able to leverage these tools you need to convert your RDF data into CSV.
Solution
RDF can be transformed to CSV via SPARQL SELECT queries. The SELECT query form produces a tabular view of the queried RDF data. Its results can be serialized to CSV, as specified in the W3C recommendation on SPARQL 1.1 Query Results CSV and TSV Formats.
LinkedPipes ETL (LP-ETL) provides the SPARQL select component to extract tabular data out of RDF. The resulting table is stored as a CSV file. For instance, consider you want to reverse the transformation described in the tutorial on converting CSV to RDF. You would start with the RDF result of that transformation (the file://
protocol IRIs corrected):
The following SELECT query reverts the above RDF data back to its original CSV form:
BASE <file:///media-types.csv#>
SELECT ?MediaType ?Type ?Subtype ?Template ?Extensions
WHERE {
?s <Media+Type> ?MediaType ;
<Subtype> ?Subtype ;
<Template> ?Template ;
<Type> ?Type .
OPTIONAL {
?s <Extensions> ?Extensions .
}
}
ORDER BY ?MediaType
This query results in a table that, apart from a slight change in the header, matches the original CSV data. The columns have the names of the variables projected by the query without the leading ?
(or $
).
MediaType | Type | Subtype | Template | Extensions |
---|---|---|---|---|
text/css | text | css | http://www.iana.org/assignments/media-types/text/css | css |
text/csv | text | csv | http://www.iana.org/assignments/media-types/text/csv | csv |
text/csv-schema | text | csv-schema | http://www.iana.org/assignments/media-types/text/csv-schema |
You can find the pipeline that converts the example data from RDF to CSV here.
Discussion
As mentioned, CSV is a common denominator supported by most tools. It is also perhaps the lowest common denominator, since it offers low data fidelity. Since each value is serialized to CSV as a string, distinctions between more specific data types, such as numbers or IRIs, are lost. For example, the SPARQL results format in CSV uses the empty string to represent both empty literals and unbound variables. Some distinctions can be recovered by matching strings to patterns of more specific data formats, such as HTTP URLs starting with http://
. Nevertheless, CSV presents a lossy encoding.
A key concept to understand when transforming RDF to CSV is cardinality. In the context of RDF, cardinality is a feature of properties that describes the number of objects a property has for a single subject. We recognize four basic categories of cardinalities. For each category there is a way to handle it in SPARQL when converting RDF to a single table.
Cardinality | Notation | Number of objects per property and subject | Handling in SPARQL |
---|---|---|---|
One-to-one | 1..1 |
One | No special handling required. |
Optional one-to-one | 0..1 |
Zero or one | OPTIONAL |
One-to-many | 1..* |
One or more | Aggregates |
Optional one-to-many | 0..* |
Zero or more |
OPTIONAL with aggregates
|
As you can see in the SPARQL query above, all included properties are 1..1
properties, except the <Extensions>
property that is a 0..1
property, which is therefore wrapped in an OPTIONAL
clause. If there are multi-valued properties, you need to group the query results by the variables for the 1..1
and 0..1
properties and aggregate the multiple values into a single value, such as by selecting one randomly via sample()
.
Since cardinalities of RDF properties can be computed automatically, queries transforming RDF to CSV can be generated in an automated fashion as well. You may also generate the transformation query first and then fine-tune it manually. You can use the Mustache component for this task, as described in the tutorial on generating SPARQL queries from Mustache templates. Column names can be created from local names of RDF properties. Local name is typically the part of an IRI following the last forward slash or hash. Note that, unlike IRIs, local names of RDF properties may be non-unique. In that case, you can append an md5() hash of the IRI to the property's local name to make it unique.
See also
Unlike most RDF data, data described by the Data Cube Vocabulary (DCV) exhibits a remarkably regular structure. DCV constrains the expressivity of RDF, such that multi-valued properties are missing. We leverage this regularity in the DCV to CSV transformation that is implemented as an LP-ETL pipeline fragment using the above-described principles. As such, it presents a practical use case for converting RDF to CSV.