XLSX (Excel) Spreadsheets
You can read and write XLSX (Microsoft® Excel) spreadsheet files in your ETL dataflow with:
- Source worker: XlsxSource<TOutput>
- Target worker and factory: XlsxTarget<TInput>, XlsxTargetFactory
This allows you to create Excel ETL applications, fully integrated with all the other actionETL dataflow and control flow capabilities.
It uses the https://github.com/JanKallman/EPPlus library via the EPPlus NuGet package version 4.x (which is free) to process the XLSX (Excel) files. No driver installation is required, which avoids otherwise common issues with 32-bit vs. 64-bit drivers.
Apart from this article, also see the ReadSortWriteXlsx Sample.
Example
In this simple Excel ETL application we read dataflow rows from one XLSX (Excel) file and write them to a new XLSX file. You can easily add transformations to the dataflow rows before writing them out.
The row class specifies possible columns to read and write, and column names to optionally map to:
using System;
public class Budget
{
public DateTime Month { get; set; }
public decimal OfflineRevenue { get; set; }
public decimal OnlineRevenue { get; set; }
}
The worker system reads all row columns from the top left corner of the first sheet in an XLSX (Excel) file, and writes to a different XLSX file, starting at cell "B2" on sheet "Brief", overwriting any existing file:
using actionETL;
using actionETL.EPPlus;
public static partial class XlsxReadWrite
{
public static SystemOutcomeStatus Run()
{
return new WorkerSystem()
.Root(root =>
{
new XlsxSource<Budget>(root, $"Read XLSX"
, @"Src/XlsxReadWrite/Budget.xlsx"
, xsc => xsc.FromAllByName())
// Add any dataflow transformations here
.Output.Link.XlsxTarget("Write XLSX"
, xtc => xtc.Address("Brief!B2").FromAll()
, @"Src/XlsxReadWrite/BriefBudget.xlsx"
, XlsxTargetWriteOption.Overwrite);
})
.Start();
}
}
Note
- Column name matching is ordinal case insensitive, but a case sensitive match takes precedence over a case insensitive match.
- Mapping commands that can map multiple columns ignore already mapped columns.
Use the
Name(string, string)
command to map the same column multiple times.
XlsxSource Configuration
The constructor parameter xlsxSourceCommands and the XlsxSourceCommands property defines mappings between the XLSX (Excel) spreadsheet file source columns and downstream row columns. Note that multiple commands can be chained together, e.g.:
xsc => xsc
.Address("'My Sheet'!B5")
.ToNames("Category", "Subcategory")
.Name("Id", "ProductId").Ignore()
IXlsxSourceCommand methods detail all the mapping commands.
It supports reading from a rectangular area on one sheet in the XLSX (Excel) file. The default address is the top left cell (i.e. "A1") on the first sheet.
If the HeaderRow property is false or the column header is missing, use ordinal position or source column addresses ("B", "AD" etc.) to specify which columns to import.
Note that some of the mapping commands explicitly control in which order to read and populate columns, while others use OrderAttribute added to the row members. You can mix the two approaches.
By default, any column errors are rejected to the ErrorOutput.
Furthermore, XlsxSource
properties allow e.g.
providing a file password.
XlsxTarget Configuration
The factory method parameter xlsxTargetCommands) and the XlsxTargetCommands property defines mappings between upstream row columns and the XLSX (Excel) spreadsheet file target columns, where to write, and what formatting to use. Note that multiple commands can be chained together, e.g.:
xtc => xtc
.Address("'My Sheet'!B5")
.FromNames("Category", "Subcategory")
.Name("Date").Format("yyyy-mm-dd")
IXlsxTargetCommand methods detail all the mapping commands.
It supports writing to a rectangular area on one sheet in the XLSX (Excel) file. The default address is the top left cell (i.e. "A1") on the first sheet. If there is no existing sheet, a new sheet called "Sheet1" will be created.
Note that some of the mapping commands explicitly control in which order to write columns, while others use OrderAttribute added to the row members. You can mix the two approaches.
Furthermore, XlsxTarget
properties allow e.g.
providing a file password,
and specifying an XLSX (Excel) file to use as a template file.
Supported Data Types
The following types are supported as dataflow column types when reading and writing XLSX (Excel) files:
.NET Type Display Name | Nullable Type |
---|---|
System.Boolean | System.Boolean? |
System.Byte | System.Byte? |
System.DateTime | System.DateTime? |
System.Decimal | System.Decimal? |
System.Double | System.Double? |
System.Int16 | System.Int16? |
System.Int32 | System.Int32? |
System.Int64 | System.Int64? |
System.SByte | System.SByte? |
System.Single | System.Single? |
System.String | |
System.UInt16 | System.UInt16? |
System.UInt32 | System.UInt32? |
System.UInt64 | System.UInt64? |