Documentation - LinkedPipes ETL

Overview

Once you have LinkedPipes ETL installed and running, it is time to explore. Of course, there is not much to see in an empty instance, but let us go through the basics. The image below is what we call a Pipeline. It is a defined data transformation process which consists of interconnected components. You can add a component to a pipeline by clicking in an empty space and selecting the desired component from the list, or by dragging and edge to an empty space. Each component has a name, green input ports and yellow output ports. The ports represent Data Units, which can contain either RDF data or regular files, according to their type. Ports are connected by edges, which indicate the flow of data. Only ports representing Data Units of the same type can be connected. In addition, component configuration can be shared among pipelines using component templates. ETL Pipeline

Pipelines

Pipelines, defined data transformation processes consisting of interconnected components, are one of the key concepts in LinkedPipes ETL. So far, you can name them, display them, edit them and run them. Once you run a Pipeline, you create its Execution.

Executions

An Execution is one run of a Pipeline. It can either succeed or fail. On the Executions page, you can see the currently running executions with their progress as well as finished executions. You can see the start and end time of an Execution as and the amount of disk space it takes up. You can free that disk space by using the trash icon. You can also re-run the execution or edit its Pipeline, which is useful for debugging.

Pipeline development workflow

We have imporved the way a pipeline can be designed, which will save you time and frustration with our component list.

Component compatibility

Drag edge

The first component is placed the usual way, just click anywhere on the pipeline and a component list will show up. From the second component on, it gets interesting. Place the next component by dragging and edge from an existing one.

The list of components you see is limited only to those actually compatible with the output of the current component. For HTTP Get, you get only those components that can consume files, for components that produce RDF, only those able to consume RDF are offered.

Component tags

In addition, we tagged the components so that you can find them not only by their name, which sometimes you would have to guess, but also by what they do and by the formats they process. Let's say you want to unzip a file. For that, we have the Decompress archive component. However, it may be hard to find under that name. Now, you can find it by typing zip.

Filter by zip

Recommended by your usage

As you create more and more pipelines, you will notice that the ordering of the offered components adjusts to your typical pipeline shape. This is because we now recommend the components ordered by their probability of appearance, based on the other pipelines in the LP-ETL instance. For example, when you process lots of tabular data, your typical pipeline will begin with an HTTP get followed by the Tabular component. If that is the case, then the Tabular component will be typically listed first, when you drag an edge from the HTTP get component.

Debugging support

When you execute a pipeline, you can watch it execute in the graphical overview. When a pipeline fails, you can simply fix what was wrong and resume from the point of failure. And when you are developing your query and need to rerun it over an over again until it is perfect, you can do that too.

Resume failed pipeline execution

Execution list

When your execution fails, you can see that in the execution list. When you click on the execution, the graphical execution overview opens, where you can see what went wrong. By clicking on a data unit (circle on a component) you can browse its contents. By clicking on a component, you can see details about its execution. Now you need to switch to edit mode, fix the component that failed and click on "execute" to resume the execution of the pipeline.

Graphical execution overview

Component state type

Failed

Failed component

The component with red border is the failed one. It needs to be fixed in order for the pipeline to continue.

Old

Old component

Components with teal border have ran in previous executions, not the current one. These components will not run when the pipeline is resumed.

Fresh

Fresh component

Components with green border have ran in the current execution. These components will not run when the pipeline is resumed.

Invalidated

Invalidated component

Components with grey border are invalidated by a change in the pipeline. This can be caused by a change in configuration or connection of a component preceding the invalidated component in the execution. A component can also be invalidated manually by the control. These components will run again when the pipeline is resumed.

Component debug controls

When a component is clicked in the pipeline, its controls show up. The control invalidates the old and fresh components and all components following it in the execution order. The control runs the pipeline up to that component and then stops. The control disables the component. A disabled component behaves as if it is not in the pipeline when a pipeline is executed. However, it keeps all the configuration and can be enabled again at any time. This is useful for debugging queries and configurations of a single component.

Execution detail

Once an Execution is finished, you may want to inspect it in more detail. One way is browsing the intermediary data passed in the pipeline in the graphical overview. Another way is by clicking on Execution detail in the menu. There, you can inspect messages from the execution in the form of a list.

In the Execution detail view, you can see a list of data units attached to executed components. The individual data units are named to reflect their expected content. You can open a data unit and see the files inside via FTP. Soon, we will add the option to directly query RDF data units using SPARQL.

Debug view

Personalization

At the moment, we have one option for personalization which can be set in the Personalization page. You can choose whether you will start at the Pipeline list screen or the Executions list screen when you come to your LP-ETL instance. This setting is stored as a cookie in your web browser.

Personalization

Overview
Pipeline development workflow
Debugging support
Execution detail
Sharing options
Personalization

Overview

Pipelines

Executions

Pipeline development workflow

Component compatibility

Component tags

Recommended by your usage

Debugging support

Resume failed pipeline execution

Graphical execution overview

Component state type

Component debug controls

Execution detail

Sharing options

Pipeline download

Pipeline import

Pipeline fragments

Personalization