Converting tabular data to RDF: Convert Excel to CSV

The first step of the data transformation is converting the Excel spreadsheet into CSV. We start by downloading the spreadsheet using the HTTP GET component of LP-ETL. To add a component to the pipeline's canvas, right-click the canvas and select the option to insert a component. A menu appears that lists the available components. You can quickly filter the components by typing a few characters of their names. We pipe the downloaded file into the Excel to CSV component. You can pipe a component's output by dragging an edge from the component's output port to the target component's input port. Port are displayed as circles on the sides of the components. By convention, the ports on the left are inputs and the ports on the right are outputs. You can also drop the dragged edge onto an empty canvas, in which case LP-ETL suggests the components that have an input compatible with the edge.

There are several options that we can configure in the Excel to CSV component. In this case, we convert the Excel spreadsheet into multiple CSV files; one for each sheet. However, the spreadsheet contains not only sheets with LAU codes, but also sheets with explanatory notes. Fortunately, each sheet with LAU codes is named with a two-letter country code. This allows us to select the sheets of interest by using the regular expression [A-Z]{2} in the Sheet filter option. We can also use the sheet names to distinguish the names of the produced CSV files. We can refer to the sheet name via the {SHEET} variable in the Output file template option. We enter lau_{SHEET}.csv as the value of this option, so that files like lau_DE.csv or lau_CZ.csv are produced.

In order to be able to differentiate the countries in the later steps of the transformation, we switch on the Virtual columns have header and the Include sheet name as additional column options, which append the sheet names as an additional sheet_name column in the produced CSV files. Similarly, we turn on the Skip empty rows option, since the spreadsheet contains unnecessary empty rows. The spreadsheet also features formulas left in by omission, so we enable the Evaluate formulas option to turn them into data. Without having this option switched on, the component would raise an error alerting us about formulas present in the spreadsheet, which is likely the way you would find about them during the iterative pipeline development.

If you change an option and do not want to re-run the whole pipeline, you can re-run a transformation step by clicking the Debug to this component button on the reconfigured component. When the pipeline execution completes, you can click through to its execution view and click the output of the Excel to CSV component to view the generated CSV files.

List of executions

When you click the output port of the Excel to CSV component a listing of the CSV files produced by the component is shown.

Viewing one of the files reveals the table structure. For example, in the `lau_FI.csv` file for Finnish LAU, you can find the following rows:

NUTS_3	LAU1_NAT_CODE	LAU2_NAT_CODE	CHANGE	NAME_1	NAME_2_LAT	POP	AREA	sheet_name
FI194	146	005	no	Alajärvi	Alajärvi	10006	1056740000	FI
FI196	041	050	no	Eura	Eura	12128	630200000	FI
FI193	131	592	no	Petäjävesi	Petäjävesi	4008	495420000	FI

The pipeline to replicate this transformation step can be found here.

← Back to the introduction

→ Next step