Extractor. Allows the user to extract RDF triples from SPARQL endpoints using a series of CONSTRUCT queries. It is especially suitable for querying multiple SPARQL endpoints.
- Number of threads to use
- Total number of threads used for querying.
- Query time limit in seconds (-1 for no limit)
- Some SPARQL endpoints may hang on a query for a long time. Sometimes it is therefore desirable to limit the time waiting for an answer so that the whole pipeline execution is not stuck.
- Encode invalid IRIs
- Some SPARQL endpoints, such as DBpedia, contain invalid IRIs, which are then sent in the results of SPARQL queries. Some libraries, like RDF4J, can crash on those IRIs. If this is the case, choose this option to encode such invalid IRIs.
- Fix missing language tags on rdf:langString literals
- Some SPARQL endpoints, such as DBpedia, contain rdf:langString literals without language tags, which is invalid RDF 1.1. Some libraries, like RDF4J, can then crash on those literals. If this is the case, choose this option to fix the problem by replacing rdf:langString with the xsd:string datatype on such literals.
- Limit number of tasks running in parallel in a group (0 for no limit)
- To avoid overloading a single SPARQL endpoint, set a maximum number of tasks running in parallel for a group. Then place all tasks targeting that endpoint in a single group. Tasks in different groups still run in parallel, up to the number of threads specified.
- Query commit size (0 to commit all triples at once)
- This is used to control the number of triples committed to a repository at once. Limiting the number may be necessary for large query results.
- RDF single graph - Configuration
- RDF single graph - Tasks
- RDF single graph - Report
- RDF single graph - Output
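The interplay between the total thread count and the per-group limit can be pictured as a shared worker pool with one semaphore per group. The following Python sketch only illustrates that idea; it is not the component's implementation, and `Task`, `run_tasks`, and the `query` callable are hypothetical names introduced for this example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import NamedTuple


class Task(NamedTuple):
    """Hypothetical task record: one CONSTRUCT query for one endpoint."""
    endpoint: str
    query: str
    group: str


def run_tasks(tasks, query, threads=4, group_limit=0):
    """Call query(task) for every task and return the results in task order.

    threads     -- total number of worker threads
    group_limit -- maximum tasks of one group running at once (0 = no limit)
    """
    tasks = list(tasks)
    # One semaphore per group; left empty when no limit is requested.
    semaphores = {}
    if group_limit > 0:
        for group in {task.group for task in tasks}:
            semaphores[group] = threading.Semaphore(group_limit)

    def worker(task):
        if group_limit <= 0:
            return query(task)
        # Wait until the task's group has a free slot, then run the query.
        with semaphores[task.group]:
            return query(task)

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(worker, tasks))
```

In this simplified version a task waiting for its group's slot still occupies a worker thread, so a real scheduler would likely hand that thread to a task from another group instead.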
The SPARQL Endpoint list to single graph component queries a list of remote SPARQL endpoints using SPARQL CONSTRUCT queries. Typical scenarios include discovery tasks, such as determining which classes are used in which endpoints. On its input, the component expects a list of tasks specifying endpoints and queries. The Output contains the collected results in a single RDF graph. The Report output contains any error messages encountered when querying the SPARQL endpoints.
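The split between the Output and the Report can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the component's code: triples are modeled as plain tuples, and `collect` and the `construct` callable (standing in for whatever actually sends the CONSTRUCT query) are hypothetical names.

```python
def collect(tasks, construct):
    """Merge successful results into one graph and record failures separately.

    tasks     -- iterable of (endpoint, query) pairs
    construct -- callable returning an iterable of (s, p, o) triples,
                 or raising an exception when the query fails
    """
    graph = set()   # the single output graph, as a set of triples
    report = []     # (endpoint, error message) pairs for failed tasks
    for endpoint, query in tasks:
        try:
            graph.update(construct(endpoint, query))
        except Exception as error:
            # A failing endpoint is reported but does not stop the others.
            report.append((endpoint, str(error)))
    return graph, report
```

Because a failing endpoint only adds an entry to the report, one unreachable endpoint does not prevent results from the remaining endpoints from reaching the output.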
Below is a sample task specification for the component. The task queries for the list of classes used in the endpoint and the number of instances of each class.