What's new

  • 2022-09-04: LinkedPipes ETL to be used in the Slovak National Open Data Catalog

    LinkedPipes ETL will be used as a core technology in the updated version of the Slovak National Open Data Catalog, inspired by how it is already in use in the Czech National Open Data Catalog.

  • 2020-10-30: LinkedPipes ETL to be used in EU CEF Telecom project STIRData

    LinkedPipes ETL will be used as a core technology in the newly started EU CEF Telecom project STIRData to promote open data interoperability through Linked Data and standardization!

  • 2019-11-15: New tutorial about loading to Wikibase available!

    LinkedPipes ETL now allows you to load data into Wikibase instances such as Wikidata. We have prepared a tutorial to help you with the process.

  • 2019-07-12: LinkedPipes ETL featured @ Wikimania 2019!

    LinkedPipes ETL will be featured as a solution for repeatable loading of data into Wikibases and Wikidata at Wikimania 2019! See you in August in Stockholm, Sweden!

  • 2018-07-04: LinkedPipes ETL featured @ ISWC 2018 Demo Session!

    LinkedPipes ETL will be featured as part of the LinkedPipes DCAT-AP Viewer demo at ISWC 2018! See you in October in Monterey, California, USA!

More Tips & Tricks

Featured component

Load to Wikibase

Loads RDF data to Wikibase.

This loader allows the user to load RDF data in the Wikibase RDF Dump Format into a Wikibase instance. A full tutorial on loading data to Wikibase is available.

Wikibase API Endpoint URL (api.php)
This is the URL of the api.php endpoint of the target Wikibase instance, for example https://www.wikidata.org/w/api.php.
Wikibase Query Service SPARQL Endpoint URL
This is the URL of the SPARQL endpoint of the Wikibase Query Service containing data from the target Wikibase instance, for example https://query.wikidata.org/sparql.
Any existing property from the target Wikibase instance
For example, P1921 for Wikidata. This is necessary so that the underlying library can determine the data types of the properties used in the Wikibase instance.
Wikibase ontology IRI prefix
This is the common part of the IRI prefixes used by the target Wikibase instance. For example http://www.wikidata.org/. This can be determined by looking at an RDF representation of the Q1 entity in Wikidata, specifically the wd: and similar prefixes.
User name
User name for the Wikibase instance.
Password
Password for the Wikibase instance.
Average time per edit in ms
This is used by the wrapped Wikidata Toolkit to pace the API calls.
Use strict matching (string-based) for value matching
Wikibase distinguishes among various string representations of the same number, e.g. 1 and 01, whereas in RDF, all those representations are considered equivalent and interchangeable. When enabled, the textual representations are considered different, which may lead to unnecessary duplicates.
Skip on error
When a Wikibase API call fails and this is enabled, the component continues its execution. Errors are logged in the Report output.
Retry count
Number of retries in case of an IOException; retries do not apply to other errors.
Retry pause
Time to wait between individual retries after a failure.
Create item message
MediaWiki edit summary message when creating items.
Update item message
MediaWiki edit summary message when updating items.
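
The effect of the strict (string-based) matching option can be illustrated with a small sketch. This is an illustration of the general principle only, not the component's actual implementation; the function name is made up:

```python
from decimal import Decimal, InvalidOperation

def values_match(a: str, b: str, strict: bool) -> bool:
    """Compare two literal representations the way the option describes.

    With strict matching, "1" and "01" differ; without it, both parse to
    the same number and are treated as equivalent.
    """
    if strict:
        return a == b  # compare the textual representations verbatim
    try:
        return Decimal(a) == Decimal(b)  # compare the parsed numeric values
    except InvalidOperation:
        return a == b  # non-numeric literals fall back to string equality

print(values_match("1", "01", strict=True))   # False: the texts differ
print(values_match("1", "01", strict=False))  # True: the same numeric value
```

With strict matching enabled, existing statements whose literals differ only in textual form are not recognized as matches, which is how the duplicates mentioned above can arise.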

More components

3-minute screencast



Modular design

Deploy only those components that you actually need. For example, on your pipeline development machine, you need the whole stack, but on your data processing server, you only need the backend part. Data processing capabilities are extensible through components. We provide a library of the most basic ones, and when you need something special, you can simply copy and modify an existing one.


All functionality covered by REST APIs

Our frontend uses the same APIs that are available to everyone. This means you can build your own frontend, integrate only parts of our application, and control everything easily.
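
As a sketch of what talking to the API from your own code could look like, the snippet below builds a request for the pipeline list as JSON-LD. The base URL and the `/resources/pipelines` path are assumptions based on a typical local deployment; consult your instance's API documentation for the actual endpoints:

```python
import json
import urllib.request

# Assumed base URL of a local LinkedPipes ETL instance; adjust to your
# deployment. The /resources/pipelines path is an assumption here.
BASE_URL = "http://localhost:8080"

def list_pipelines_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a request asking for the pipeline list as
    JSON-LD, since pipeline definitions are RDF."""
    return urllib.request.Request(
        base_url + "/resources/pipelines",
        headers={"Accept": "application/ld+json"},
    )

def list_pipelines(base_url: str = BASE_URL):
    """Send the request and parse the JSON-LD response."""
    with urllib.request.urlopen(list_pipelines_request(base_url)) as response:
        return json.loads(response.read().decode("utf-8"))

request = list_pipelines_request()
print(request.full_url)  # http://localhost:8080/resources/pipelines
```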


Almost everything is RDF

Except for our configuration file, everything is in RDF. This includes the ETL pipelines, component configurations, and messages indicating the progress of the pipeline. You can generate the pipelines and configurations using SPARQL from your own app. Also, batch modification of configurations is a simple text-file operation; no more clicking through every pipeline when migrating.
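
Because pipeline definitions are plain RDF text files, a migration such as swapping a SPARQL endpoint URL across all pipelines can be scripted. A minimal sketch, assuming the pipelines are stored as Turtle/TriG/JSON-LD files in a directory (the file layout and URLs are made up for illustration):

```python
from pathlib import Path

def migrate_pipelines(directory: str, old: str, new: str) -> int:
    """Replace every occurrence of `old` with `new` in all RDF pipeline
    files under `directory`; return the number of files changed."""
    changed = 0
    for path in Path(directory).rglob("*"):
        if path.suffix not in {".ttl", ".trig", ".jsonld"}:
            continue
        text = path.read_text(encoding="utf-8")
        if old in text:
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed += 1
    return changed
```

For example, `migrate_pipelines("pipelines/", "http://old.example/sparql", "http://new.example/sparql")` updates every pipeline that references the old endpoint in one pass.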