Getting Started
The steps below show how to quickly get up and running with actionETL, and start writing ETL (Extract, Transform, Load) applications.
Alternatively, use the abridged instructions in the Common Tasks article.
Requirements
The library is available as the actionETL and actionETL.templates.dotnet packages from nuget.org (see An Introduction to NuGet), making it very easy to get started and to keep your ETL applications up to date. You can deploy both on-premises and in the cloud.
The library targets the following frameworks:
actionETL targets | Your project can target |
---|---|
.NET Standard 2.0 (Linux or Windows) | .NET Core 2.1+, .NET 5+, .NET Standard 2.0+, .NET Framework 4.7.2+ (non-default) |
.NET Framework 4.6.1 (Windows only) | .NET Framework 4.6.1+ |
In this article we'll use .NET Core 3.1 for our examples. For full details, see System Requirements.
Install or Update Dotnet Templates
To allow creating actionETL projects using dotnet new templates, each developer should:
- Ensure the .NET Core SDK (v2.1 or later) is installed on their development computer; see Install the .NET Core SDK
- Install or update the actionETL templates from nuget.org:
dotnet new --install actionETL.templates.dotnet
For full details, see the Dotnet Templates article.
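Before installing the templates, it can help to confirm that the installed SDK meets the v2.1 minimum stated above. The following is a minimal sketch in POSIX shell. The sdk_version_ok function is a hypothetical helper (not part of the .NET CLI), and it only compares the major and minor numbers of a version string such as the one printed by dotnet --version.

```shell
#!/bin/sh
# Hypothetical helper (not part of the .NET CLI): succeeds if the given
# SDK version string is 2.1 or later. Only major.minor are compared.
sdk_version_ok() {
    version=$1
    major=${version%%.*}    # text before the first dot
    rest=${version#*.}      # text after the first dot
    minor=${rest%%.*}       # text before the next dot

    # Reject parts that are empty or not purely numeric
    for part in "$major" "$minor"; do
        case "$part" in
            ''|*[!0-9]*) return 2 ;;
        esac
    done

    [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 1 ]; }
}

# Example use:
#   sdk_version_ok "$(dotnet --version)" && dotnet new --install actionETL.templates.dotnet
```

The function returns 0 when the version is sufficient, 1 when it is too old, and 2 when the string cannot be parsed, so a calling script can tell "too old" apart from "SDK not found or unexpected output".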
actionETL License
actionETL requires a license to run. Please get a FREE Community license or a free trial license, or purchase a commercial license.
Also see the Licensing article.
First actionETL Application
The steps below use the actionETL project template to create and run a simple console program that checks whether a file exists. The template example implementation:
- Checks if a file exists
- Takes the filename from a configuration file
- Handles errors and exceptions in a sensible way
- Logs the result to the console and a log file
- Exits with a success (zero) or failure (non-zero) exit code
It's quite small, and demonstrates the benefits of the comprehensive logging, error handling, and configuration facilities that actionETL provides out of the box.
1. Create Project
Create a new .NET Core 3.1 actionETL console project from scratch, taking its name from the output directory "my-etl":
dotnet new actionetl.console --output my-etl
Important
The template adds an empty "actionetl.license.json" license file. Add a FREE Community license, a free 30-day trial license, or a commercial license into this file.
The template also adds the following default "Program.cs":
using actionETL;
using System.Threading.Tasks;

namespace actionetl.console.csharp
{
    static class Program
    {
        static async Task Main()
        {
            // Create worker system: test if file exists
            var workerSystem = new WorkerSystem()
                .Root(ws =>
                {
                    // Example worker: check filename loaded from "actionetl.aconfig.json"
                    _ = new FileExistsWorker(ws, "File exists", ws.Config["TriggerFile"]);
                });

            // Run the worker system
            var systemOutcomeStatus = await workerSystem.StartAsync().ConfigureAwait(false);

            // Exit with success or failure code
            systemOutcomeStatus.Exit();
        }
    }
}
The template also adds the (optional) configuration file "actionetl.aconfig.json". If running on Linux, change the "TriggerFile" setting to a path that is valid on that system. Note the forward slashes, which avoid having to escape backslashes in JSON:
{
  "configurations": [
    { "TriggerFile": "C:/Temp/Trigger.trg" }
  ]
}
The template also adds the (optional) "nlog.config" logging configuration file.
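On Linux, the "TriggerFile" setting needs a path that exists on that system. As a minimal sketch, the shell function below rewrites "actionetl.aconfig.json" with a caller-supplied path. The function name write_trigger_config and the example path /tmp/Trigger.trg are assumptions for illustration. The function overwrites the whole file, so it only suits a configuration file that holds nothing but the "TriggerFile" entry, as in the template default shown above.

```shell
#!/bin/sh
# Hypothetical helper: write "actionetl.aconfig.json" into a project
# directory with the given trigger file path. Overwrites any existing file.
write_trigger_config() {
    project_dir=$1     # directory that holds the configuration file
    trigger_file=$2    # path the application should check for
    cat > "$project_dir/actionetl.aconfig.json" <<EOF
{
  "configurations": [
    { "TriggerFile": "$trigger_file" }
  ]
}
EOF
}

# Example use (illustrative path):
#   write_trigger_config my-etl /tmp/Trigger.trg
```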
Note
See Dotnet Templates and Add actionETL Manually for more templates and how to add actionETL to existing projects, including other project types (Windows Forms etc.). A console application is, however, often appropriate for a typical batch-oriented ETL application.
2. Build and Run Project
Building and running the project creates the executable ".../my-etl/bin/Debug/my-etl.exe", together with all other configuration and assembly files required to run the application:
cd my-etl
dotnet run
The application should start and log its progress to the console. Since the specified file doesn't exist, the error is logged and highlighted, and a failure exit code is returned, which e.g. a batch script could check:
DateTime | Level | Locator | Category | Message
2020-08-19 16:17:58.9159 | INFO | /Root | System.Status.Created | Type=actionETL.WorkerSystem
2020-08-19 16:17:59.0239 | INFO | /Root | System.Environment | HostName='Boom' UserName='Kristian' CWD='C:\Projects\MinimalFileExists\bin\AnyCPU\Release\net461' ProcessorCount=4 AvailablePhysicalMemBytes=4479717376 TotalPhysicalMemBytes=17105559552 GarbageCollection=Batch,Server Framework='.NET Framework 4.8' CompilePlatform=AnyCPU OsBits=64 ProcessBits=64 OsVersion='Microsoft Windows NT 6.2.9200.0' CurrentCulture='en-GB' UTC='2020-08-19T15:17:58 UTC (+00)' LibraryVersion=0.36.0.5050 MinorVersionReleaseDate=2020-08-19 CreationGuid=b936494c-b70d-49ed-a294-fcd02366c3f6
2020-08-19 16:17:59.2827 | INFO | /Root | System.Status.Running |
2020-08-19 16:17:59.2941 | ERROR | /Root | File.Operation.Failed | Missing C:\Temp\Trigger.trg
2020-08-19 16:17:59.3641 | INFO | /Root | System.Statistics.Ending | Started=2020-08-19T15:17:59Z Completed=2020-08-19T15:17:59Z Duration=0:00:00:00.0154456 WorkerTypesUsed=0 PeakPagedMemBytes=58544128 PeakVirtualMemBytes=22817902592 PeakWorkingSetBytes=43118592 UserProcessorTime=00:00:00.625 TotalProcessorTime=00:00:00.781
2020-08-19 16:17:59.3641 | ERROR | /Root | System.Status.Completed.Error
If you create an empty file with the specified filename, and rerun the application, the application will succeed.
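The exit code mentioned above can be checked from a calling script. The following POSIX shell sketch runs any command and reports on its exit code, where zero means success and non-zero means failure. The run_and_report function name and its messages are hypothetical, not part of actionETL.

```shell
#!/bin/sh
# Hypothetical wrapper: run the given command, report whether it succeeded
# based on its exit code, and pass that exit code on to the caller.
run_and_report() {
    status=0
    "$@" || status=$?    # capture a non-zero exit code without aborting
    if [ "$status" -eq 0 ]; then
        echo "ETL run succeeded"
    else
        echo "ETL run failed with exit code $status"
    fi
    return "$status"
}

# Example use, from the project directory:
#   run_and_report dotnet run
```

Because the wrapper returns the original exit code, it can itself be used in a larger script or scheduler that decides whether to continue with later steps.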
Note
In actionETL, many predictable exceptions (database timeouts, file not found as in this example, etc.) are caught and logged as a succinct, specific message. Unpredictable exceptions, and errors such as passing incorrect parameters to workers, are however logged with their stack traces.
Also see Deploying Applications and Development Guidelines.
Log File
actionETL performs extensive logging (by default using NLog under the hood), which can be configured, customized, and redirected to many different outputs, including other logging systems.
By default, as in this example, log messages are sent to the console, and (more verbosely) to a file with the ".log" extension: my-etl.log.
See Logging for more details.
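When reviewing a run, it is often enough to look at only the error lines. The sketch below filters log lines by level. It assumes the pipe-delimited column layout of the console output shown earlier (DateTime | Level | Locator | Category | Message). The more verbose log file may use a different layout, in which case the column number would need adjusting. The filter_log_level function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read log lines from standard input and print only
# those whose second pipe-delimited column (Level) equals the given level.
filter_log_level() {
    level=$1
    awk -F'|' -v lvl="$level" '
        {
            col = $2
            gsub(/^[ \t]+|[ \t]+$/, "", col)   # trim spaces around the Level column
        }
        col == lvl                              # print the line if the level matches
    '
}

# Example use (adjust the path to wherever the log file is written):
#   filter_log_level ERROR < my-etl.log
```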
Key Points
In actionETL, all work is performed by workers in a hierarchical worker system. In the example, FileExistsWorker is a child of the (root) parent WorkerSystem.
Most ETL applications will use a single worker system (often with many child workers). Running other (non-actionETL) work in parallel, as well as running multiple worker systems in parallel, both within a single application and across multiple applications, is also supported.
There are many specialized workers for querying databases, manipulating files etc., including Dataflow (or pipeline) workers.
Worker starting order can be controlled with grouping and start constraints. Dataflow workers have input, output, and/or error output ports. In the documentation, this is depicted as follows:
In the earlier example, the worker system is started and awaited asynchronously with StartAsync() and await. The returned OutcomeStatus describes the success or failure of the worker system. In this example we then call Exit() to exit the program with either a zero (success) or non-zero (failure) exit code.
Note
In console programs (but few other project types), the worker system can instead be started synchronously (i.e. without async and await) by using Start().
A so-called lambda is used to specify what the WorkerSystem (and several workers) should do. This is usually the preferred approach (instead of defining a regular method) if the code snippet is only used in a single place and is not too extensive, since it makes it easy to follow the overall logic.
Lambdas can take parameters that the code snippet can use; in the example below, the WorkerSystem itself is passed in to my code snippet via a parameter I gave the name ws, which I then use to create the FileExistsWorker:
ws => { new FileExistsWorker(ws, "File exists", ws.Config["TriggerFile"]); }
It is good practice to use the Configuration facilities to store filenames, connection strings etc.