Tabular
Transformer. Used to transform CSV files to RDF data according to the Generating RDF from Tabular Data on the Web W3C Recommendation. The first set of parameters exists only to handle CSV files that do not follow RFC 4180 (a comma as the delimiter and the double quote as the quote character) or that are not encoded in UTF-8.
- Delimiter
- Character separating columns in a row in non-standard CSV files. For a tab character (used as the delimiter in TSV files), enter \t.
- Quote
- Character used to quote a column value that contains the column separator
- Encoding
- Encoding of the CSV file
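The effect of the Delimiter, Quote and Encoding parameters can be illustrated with Python's standard csv module. This is a sketch, not how the component is implemented, and the function name normalize_to_rfc4180 is hypothetical. It decodes a non-standard file and re-serializes it as RFC 4180 text (comma, double quote, CRLF line endings), i.e. the form that needs no dialect settings at all. Note that in Python "\t" already is the tab character, whereas in the component dialog you type the two characters \ and t.

```python
import csv
import io


def normalize_to_rfc4180(raw: bytes, delimiter: str = ",",
                         quotechar: str = '"', encoding: str = "utf-8") -> str:
    """Re-serialize a non-standard CSV file as RFC 4180 text.

    delimiter, quotechar and encoding play the role of the component's
    Delimiter, Quote and Encoding parameters (hypothetical helper).
    """
    text = raw.decode(encoding)
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=delimiter, quotechar=quotechar)
    out = io.StringIO()
    # The default "excel" dialect writes commas, double quotes and CRLF
    # line endings, as RFC 4180 requires.
    csv.writer(out).writerows(reader)
    return out.getvalue()
```

For example, a semicolon-separated windows-1250 file that quotes with apostrophes, containing the rows id;name and 1;'Novák, Č.', is turned into id,name and 1,"Novák, Č.": the second value is now double-quoted because it contains a comma.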
The second group of parameters deals with the transformation itself.
- Table has header row
- When checked, the first row of the table is treated as a header containing column names rather than as data
- Trim whitespaces from cells
- Removes leading and trailing whitespace from cells
- Table IRI prefix
- Switches between the file:// and file:/// prefixes for the output table IRI
- Default resource IRI template
- An RFC 6570 IRI template, e.g. http://ex.org/{COLUMN_WITH_ID}. The placeholder {$ROW_NUMBER$} can be used for the row number.
- Rows skipped (after header)
- Number of rows to be skipped during transformation
- Rows limit
- Number of rows to be processed during transformation
- Rows skipped (before header)
- Number of lines to skip at the beginning of the file before the transformation starts. Useful for blank, formatted or merged lines that are typically added for printing purposes in Excel files
- Table and row entities
- When checked, entities for rows and the table are generated. Otherwise, only raw row data is generated
- Generate full mapping
- When checked, uses the default mapping specified by the Recommendation. Otherwise, lets the user specify the mapping manually
- Use IRI base
- When checked, lets the user specify the IRI base for row IRIs and for the properties that attach column values to the row entity. Otherwise, the IRIs are generated according to the Recommendation and therefore include the file name with its full path. In LinkedPipes ETL this path is random, which makes the resulting data hard to process
- IRI base
- The IRI base for row and property IRIs
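To make the interplay of the row-selection parameters and the IRI template concrete, here is a minimal Python sketch. It assumes the rows were already parsed (for example with the csv module). The order of operations (skip lines before the header, read the header, skip rows after it, apply the limit, trim cells) is our reading of the parameter descriptions above, not the component's source. The template expansion covers only simple {name} expressions from RFC 6570 (level 1) plus the {$ROW_NUMBER$} placeholder. Numbering the first processed row as 1 and naming header-less columns _col.1, _col.2, ... (the CSVW default) are assumptions, and select_rows and expand_template are hypothetical names.

```python
import re
from urllib.parse import quote

ROW_NUMBER = "$ROW_NUMBER$"


def select_rows(rows, skip_before_header=0, has_header=True,
                skip_after_header=0, limit=None, trim=False):
    """Apply the row-selection settings to already parsed rows.

    Returns a list of dicts mapping column names to cell values.
    """
    rows = rows[skip_before_header:]          # Rows skipped (before header)
    if not rows:
        return []
    if has_header:                            # Table has header row
        header, rows = rows[0], rows[1:]
    else:
        # Assumption: CSVW-style default column names _col.1, _col.2, ...
        header = [f"_col.{i}" for i in range(1, len(rows[0]) + 1)]
    rows = rows[skip_after_header:]           # Rows skipped (after header)
    if limit is not None:                     # Rows limit
        rows = rows[:limit]
    if trim:                                  # Trim whitespaces from cells
        rows = [[cell.strip() for cell in row] for row in rows]
    return [dict(zip(header, row)) for row in rows]


def expand_template(template, row, row_number):
    """RFC 6570 level-1 expansion of {name}; {$ROW_NUMBER$} gives the row number."""
    def substitute(match):
        name = match.group(1)
        if name == ROW_NUMBER:
            return str(row_number)
        # Simple string expansion percent-encodes everything except
        # unreserved characters (letters, digits, "-", ".", "_", "~").
        return quote(row[name], safe="")
    return re.sub(r"\{([^{}]+)\}", substitute, template)
```

With a first line such as a report title, skip_before_header=1 moves past it to the header, and a template like http://ex.org/person/{name} turns a cell value Mary Ann into http://ex.org/person/Mary%20Ann.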
All values from the input CSV files are represented as strings, which may later be transformed, e.g. to numbers, using SPARQL Update. The advanced row mapping will be documented later. In short, it allows the user to directly specify the output literal data type or language tag, or to specify that the output value is an IRI. In addition, it allows the user to specify a custom predicate IRI based on the column name.
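As an illustration of the SPARQL Update step mentioned above, the following Python helper builds a SPARQL 1.1 Update that re-types the string values of one property, e.g. a column property such as http://example.org/amount (a hypothetical IRI, as produced with Use IRI base). The helper name cast_update is hypothetical. The query itself relies only on standard SPARQL 1.1 features: the XSD cast functions and the rule that a BIND expression that raises an error leaves its variable unbound.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# XSD constructor (cast) functions that SPARQL 1.1 requires implementations to support.
CASTABLE = {"integer", "decimal", "double", "float", "boolean", "dateTime"}


def cast_update(predicate_iri: str, datatype: str = "integer") -> str:
    """Return a SPARQL 1.1 Update that re-types string values of one property."""
    if datatype not in CASTABLE:
        raise ValueError(f"unsupported datatype: {datatype}")
    p = f"<{predicate_iri}>"
    return f"""PREFIX xsd: <{XSD}>
DELETE {{ ?s {p} ?old }}
INSERT {{ ?s {p} ?new }}
WHERE {{
  ?s {p} ?old .
  FILTER(isLiteral(?old))
  # A failed cast leaves ?new unbound; such values are kept unchanged.
  BIND(xsd:{datatype}(STR(?old)) AS ?new)
  FILTER(BOUND(?new))
}}"""
```

The FILTER(BOUND(?new)) line matters: without it, a value that cannot be cast would still match the DELETE template and be removed, while nothing would be inserted in its place.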
Characteristics
- ID
- t-tabular
- Type
- transformer
- Inputs
- Files
- Outputs
- RDF single graph
The Tabular component takes the input CSV files and transforms them to an RDF representation according to the Generating RDF from Tabular Data on the Web W3C Recommendation. Where the Recommendation does not cover an aspect, we add our own functionality. This is one of the most complex components in LP-ETL, partly because tabular data is the most common kind of data on the web, so the component has to handle a wide range of situations. For larger data that can be processed in parts, e.g. row by row, consider using the Tabular chunked version of the component.
We recommend not using the custom mapping section of the component. Instead, use the default transformation and adjust the resulting RDF representation using SPARQL Construct and SPARQL Update.
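Following this recommendation, a SPARQL CONSTRUCT query can map the generic column properties produced by the default transformation to properties of a proper vocabulary. The Python sketch below generates such a query from a column-to-property dictionary. It assumes Use IRI base is checked, so that a column property is the IRI base followed by the column name, and that column names need no escaping; the function name construct_query and all IRIs are hypothetical.

```python
def construct_query(iri_base: str, mapping: dict[str, str]) -> str:
    """Build a SPARQL CONSTRUCT renaming column properties to target IRIs.

    mapping: column name -> target property IRI (hypothetical values).
    Each column is matched in its own UNION branch.
    """
    template, branches = [], []
    for i, (column, target) in enumerate(mapping.items()):
        var = f"?v{i}"
        template.append(f"  ?row <{target}> {var} .")
        branches.append(f"  {{ ?row <{iri_base}{column}> {var} }}")
    return ("CONSTRUCT {\n" + "\n".join(template) + "\n}\nWHERE {\n"
            + "\n  UNION\n".join(branches) + "\n}")
```

Because each column sits in its own UNION branch, a row with a missing value in one column still produces triples for the other columns: SPARQL omits template triples whose variables are unbound in a given solution.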