Tabular

Transformer. Transforms CSV files to RDF data according to the Generating RDF from Tabular Data on the Web W3C Recommendation. The first set of parameters is there simply to deal with CSV files that do not follow the expected defaults: a comma as the delimiter and the double quote as the quote character, as specified by RFC 4180, and UTF-8 as the encoding.

Delimiter: Character separating columns in a row in non-standard CSV files. For a tab character (the delimiter in TSV files), enter \t.
Quote: Character used to quote a column value that contains the column separator.
Encoding: Character encoding of the CSV file.
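
To illustrate what these three parameters control, here is a minimal sketch using Python's standard csv module rather than the component itself. The sample data, the read_rows helper and its argument names are assumptions made for this example only.

```python
import csv
import io

# Hypothetical input: semicolon-delimited, single-quoted, Windows-1250 encoded.
raw_bytes = "id;name\n1;'Dvořák; Antonín'\n".encode("windows-1250")


def read_rows(data: bytes, delimiter: str = ",", quote: str = '"',
              encoding: str = "utf-8") -> list:
    """Decode the bytes, then parse them with the given delimiter and quote."""
    # newline="" leaves line endings untouched so the csv module can handle them.
    text = io.StringIO(data.decode(encoding), newline="")
    return list(csv.reader(text, delimiter=delimiter, quotechar=quote))


# The quoted value keeps its embedded semicolon instead of being split in two.
rows = read_rows(raw_bytes, delimiter=";", quote="'", encoding="windows-1250")
print(rows)

# A TSV file only needs a different delimiter; the other defaults still apply.
print(read_rows(b"a\tb\n", delimiter="\t"))
```

With the default settings (comma, double quote, UTF-8), the first file would either fail to decode or be read as a single column, which is the situation these parameters are meant to handle.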

The second group of parameters deals with the transformation itself.

Table has header row: When checked, the first row of the table is treated as a header containing the column names
Trim whitespaces from cells: Removes leading and trailing whitespace from cells
Table IRI prefix: Switches between the file:// and file:/// prefix for the output table IRI
Default resource IRI template: An RFC 6570 IRI template. E.g., http://ex.org/{COLUMN_WITH_ID}. {$ROW_NUMBER$} can be used here for row number.
Rows skipped (after header): Number of rows to be skipped during transformation
Rows limit: Number of rows to be processed during transformation
Rows skipped (before header): Number of lines to be skipped at the start of the file before the transformation starts. Useful for blank, formatted, or merged lines that are typically added for printing purposes in Excel files
Table and row entities: When checked, entities for rows and the table are generated. Otherwise, only raw row data is generated
Generate full mapping: When checked, uses the default mapping specified by the Recommendation. Otherwise, lets the user specify the mapping manually
Use IRI base: When checked, lets the user specify the IRI base for the IRIs of rows and of the properties used to attach column values to the row entity. Otherwise, the IRIs are generated according to the Recommendation and include the file name with its full path. That path is random in LinkedPipes ETL, which makes the resulting data hard to process
IRI base: The IRI base for row and property IRIs
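
As a rough illustration of how several of these options interact, the following sketch skips rows, reads the header, trims cells, expands the resource IRI template, and builds property IRIs from an IRI base. It is not the component's implementation: the function names, the row numbering (counted from the first processed data row), and the property IRI scheme (IRI base plus percent-encoded column name) are assumptions made for this example.

```python
import csv
import io
import re
from urllib.parse import quote


def expand(template: str, row: dict, row_number: int) -> str:
    """Replace each {NAME} in the template with the percent-encoded cell value."""
    def substitute(match):
        name = match.group(1)
        if name == "$ROW_NUMBER$":
            return str(row_number)
        # safe="" encodes everything except unreserved characters.
        return quote(row[name], safe="")
    return re.sub(r"\{([^{}]+)\}", substitute, template)


def rows_to_triples(text, template, iri_base, skip_before_header=0,
                    skip_after_header=0, limit=None, trim=True):
    """Return (subject, predicate, value) tuples; all values stay strings."""
    lines = list(csv.reader(io.StringIO(text, newline="")))
    lines = lines[skip_before_header:]          # rows skipped (before header)
    header, data = lines[0], lines[1:]          # table has header row
    data = data[skip_after_header:]             # rows skipped (after header)
    if limit is not None:
        data = data[:limit]                     # rows limit
    triples = []
    for number, cells in enumerate(data, start=1):
        if trim:
            cells = [cell.strip() for cell in cells]
        row = dict(zip(header, cells))
        subject = expand(template, row, number)
        for name, value in row.items():
            # Assumed scheme: property IRI = IRI base + encoded column name.
            triples.append((subject, iri_base + quote(name, safe=""), value))
    return triples


# Hypothetical file with one line of print formatting before the header.
text = "Report printed 2024\nid,name\n 1 , Alice \n2,Bob\n3,Carol\n"
triples = rows_to_triples(text, "http://ex.org/{id}", "http://ex.org/prop/",
                          skip_before_header=1, limit=2)
for triple in triples:
    print(triple)
```

Because the subject and property IRIs here depend only on the template and the IRI base, they stay stable across runs, which is the motivation given above for using an IRI base instead of IRIs derived from the file path.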

All values from the input CSV files are represented as strings, which may later be transformed to numbers, e.g. using SPARQL Update. The advanced row mapping will be documented later. Basically, it allows the user to directly specify the output literal data type or language tag, or to specify that the output value is an IRI. In addition, it allows the user to specify a custom predicate IRI based on the column name.

Characteristics

ID: t-tabular
Type: transformer
Inputs: Files
Outputs: RDF single graph

The Tabular component takes the input CSV files and transforms them to an RDF representation according to the Generating RDF from Tabular Data on the Web W3C Recommendation. Where the Recommendation does not cover an aspect, we add our own functionality. This is one of the most complex components in LP-ETL. One of the reasons is that tabular data is the most common kind of data on the web, and therefore we had to cover a wide range of situations. For larger data, which can be processed e.g. row by row in the pipeline, consider using the Tabular chunked version of the component.

We recommend against using the custom mapping section of the component. Instead, we recommend using the default transformation and adjusting the resulting RDF representation using SPARQL Construct and SPARQL Update.