SPARQL Endpoint chunked
Extractor that extracts RDF triples from a SPARQL endpoint using a series of CONSTRUCT queries based on an input list of entity IRIs. It is suitable for larger data, as it queries the endpoint for descriptions of a limited number of entities at a time and creates a separate RDF data chunk from each query.
- Endpoint URL: URL of the SPARQL endpoint to be queried.
- MimeType: Some SPARQL endpoints (e.g. DBpedia) return invalid RDF data with the default MIME type, so another one has to be specified here. It is sent in the Accept header of the SPARQL query HTTP request.
- Chunk size: Number of entities queried at once. This is implemented with the VALUES clause in the SPARQL query, so the number corresponds to the number of items in the VALUES clause.
- Default graph: IRIs of the graphs to be passed as default graphs for the SPARQL query.
- SPARQL CONSTRUCT query with ${VALUES} placeholder: Query for extracting triples from the endpoint. At the end of its WHERE clause, it should contain the ${VALUES} placeholder. This is where the VALUES clause listing the current entities is inserted.
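The MimeType and Default graph options correspond to standard parts of a SPARQL 1.1 Protocol request: the Accept header and repeated default-graph-uri parameters. The sketch below only builds (and does not send) such a request with the Python standard library, to show where each option ends up. It assumes an HTTP GET request; whether the component itself uses GET or POST is not stated here, and the function name build_construct_request is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request


def build_construct_request(endpoint_url, query, mime_type, default_graphs):
    """Build (but do not send) a SPARQL 1.1 Protocol GET request.

    Hypothetical helper: GET is an assumption, not necessarily what the
    component does. The query goes into the `query` parameter and each
    default graph IRI into a repeated `default-graph-uri` parameter.
    """
    params = [("query", query)]
    params += [("default-graph-uri", iri) for iri in default_graphs]
    url = endpoint_url + "?" + urlencode(params)
    # The MimeType option is what the endpoint sees in the Accept header.
    return Request(url, headers={"Accept": mime_type}, method="GET")


req = build_construct_request(
    "https://dbpedia.org/sparql",
    "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o } LIMIT 1",
    "text/turtle",
    ["http://dbpedia.org"],
)
# Inspect what would be sent, without contacting the endpoint.
print(req.get_header("Accept"))
print(parse_qs(urlsplit(req.full_url).query))
```

Passing urllib.request.urlopen(req) would actually execute the query; it is left out so the sketch has no network dependency.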
Characteristics
- ID: e-sparqlendpointchunked
- Type: extractor
- Inputs: RDF single graph - configuration; Files - Input
- Outputs: RDF chunked
- Sample pipeline: available
The SPARQL Endpoint chunked component queries a remote SPARQL endpoint using a SPARQL CONSTRUCT query.
It can be configured at runtime using RDF configuration, which can be generated by another component.
This chunked version of the component is suitable for larger data that needs to be queried in parts.
A typical use case is retrieving descriptions of a large number of entities of the same type, which would be too large to obtain with a single query.
As input, the component expects a single CSV file containing one column with a header.
The column header is the name of the variable that the SPARQL query uses for the IRI of the entity being described.
The rows then contain the IRIs of these entities.
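A minimal sketch of reading such an input file with Python's csv module, assuming plain comma-separated content, bare IRIs (not wrapped in angle brackets) and a header that holds the variable name without the leading "?". The function name read_entity_list is hypothetical and not part of the component.

```python
import csv
import io


def read_entity_list(csv_text):
    """Return (variable_name, iris) from a one-column CSV with a header row.

    Hypothetical helper; assumes bare IRIs and a header without '?'.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    if len(header) != 1:
        raise ValueError("expected exactly one column")
    variable = header[0].strip()
    # Skip empty lines; every remaining row holds one entity IRI.
    iris = [row[0].strip() for row in reader if row and row[0].strip()]
    return variable, iris


sample = """entity
http://dbpedia.org/resource/Prague
http://dbpedia.org/resource/Brno
"""
print(read_entity_list(sample))
# → ('entity', ['http://dbpedia.org/resource/Prague', 'http://dbpedia.org/resource/Brno'])
```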
The list is split into pieces of at most Chunk size entities. Each piece is inserted into the query as a VALUES clause in place of the ${VALUES} placeholder, and the result of each such query forms one RDF data chunk on the output.
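The splitting and substitution steps can be sketched as follows. This is an illustration of the technique, not the component's code: the helper names values_clause and chunked_queries are hypothetical, the IRIs are assumed to be well-formed (they are wrapped in angle brackets without any escaping), and plain string replacement is used for the ${VALUES} placeholder.

```python
def values_clause(variable, iris):
    """Build a SPARQL VALUES clause binding ?variable to the given IRIs."""
    rows = " ".join(f"<{iri}>" for iri in iris)
    return f"VALUES ?{variable} {{ {rows} }}"


def chunked_queries(template, variable, iris, chunk_size):
    """Yield one CONSTRUCT query per chunk of at most chunk_size IRIs."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(iris), chunk_size):
        chunk = iris[start:start + chunk_size]
        # The placeholder sits at the end of the WHERE clause, where a
        # VALUES block is a valid part of the group graph pattern.
        yield template.replace("${VALUES}", values_clause(variable, chunk))


template = """CONSTRUCT { ?entity ?p ?o }
WHERE {
  ?entity ?p ?o .
  ${VALUES}
}"""
entities = ["http://ex.org/a", "http://ex.org/b", "http://ex.org/c"]
for query in chunked_queries(template, "entity", entities, 2):
    print(query)
    print("---")
```

With a chunk size of 2, the three entities produce two queries (two IRIs, then one), and each query's result would become a separate output chunk.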
The input list of entities can be created either manually or by using the SPARQL endpoint select or SPARQL endpoint select scrollable cursor components.
Runtime configuration
Below is a sample runtime configuration for the component, which includes the specification of default graph IRIs. You can also import a sample pipeline. Note that RDF blank nodes are not allowed in runtime configurations.