Workers are the main building block in actionETL, and the different types of workers are configured and combined into a Worker System to do the required processing of files, database access, rows, etc.
Custom workers can be created by:
- Composing existing workers, i.e. encapsulating one or more workers to create a new worker
- Deriving a new worker from an existing one and adding the required custom functionality
Furthermore, a number of the out of box workers provide the bulk of the implementation, but allow you to provide just your specific functionality via a callback (e.g. a code snippet via an Action or a Func<T>). This approach makes the workers very flexible and easy to adapt to different requirements, e.g.:
- In the dataflow *JoinMerge* workers, equi joins (i.e. just using one or more equal signs: A.Id = B.Id) can be specified by simply listing the column names. You can however instead specify the join condition as a callback, allowing non-equi joins (A.Id = B.Id && A.StartDate > B.StartDate && A.Status <> 'D').
- Columns to sort on in SortTransform can be specified just by listing their names; for more complicated sort criteria a callback can be provided inline that calculates an ordering value from the column values, removing the need to precalculate it in a separate worker, or to add it to the data rows.
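The pattern behind both bullets above is a parameter that accepts either a list of column names or a callback. The following is a minimal sketch of that pattern in plain Python, not actionETL code: `join_rows`, `sort_rows`, and the sample rows are hypothetical names made up for illustration, dates are simplified to integers, and the nested-loop join is only a stand-in that says nothing about how the JoinMerge workers are implemented.

```python
def join_rows(left, right, on):
    """Join two row lists. `on` is a list of column names or a predicate."""
    if callable(on):
        matches = on                       # caller-supplied condition, may be non-equi
    else:
        columns = list(on)                 # equi join on the listed columns

        def matches(a, b):
            return all(a[c] == b[c] for c in columns)

    return [(a, b) for a in left for b in right if matches(a, b)]


def sort_rows(rows, by):
    """Sort rows. `by` is a list of column names or an ordering-value callback."""
    if callable(by):
        key = by                           # ordering value computed on the fly
    else:
        columns = list(by)

        def key(row):
            return tuple(row[c] for c in columns)

    return sorted(rows, key=key)


# Hypothetical sample data; StartDate is a plain day number for brevity.
a_rows = [
    {"Id": 1, "StartDate": 5, "Status": "A"},
    {"Id": 2, "StartDate": 1, "Status": "A"},
    {"Id": 1, "StartDate": 9, "Status": "D"},
]
b_rows = [
    {"Id": 1, "StartDate": 3},
    {"Id": 2, "StartDate": 4},
]

# Equi join: just list the column names.
equi = join_rows(a_rows, b_rows, ["Id"])

# Non-equi join: the same condition as in the text, written as a callback.
non_equi = join_rows(
    a_rows,
    b_rows,
    lambda a, b: a["Id"] == b["Id"]
    and a["StartDate"] > b["StartDate"]
    and a["Status"] != "D",
)

# Sorting by column names versus by a computed ordering value.
by_name = sort_rows(a_rows, ["Id", "StartDate"])
by_callback = sort_rows(a_rows, lambda row: -row["StartDate"])
```

In this sketch the ordering value passed to `sort_rows` never becomes a column on the rows, which mirrors the point above about not having to precalculate it in a separate step.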
Workers are created and executed in a parent-child hierarchy. A WorkerSystem instance is always at the root of the hierarchy, with an arbitrary number of descendant workers and levels. In this example:
- The six rectangles represent one worker system instance, and five distinct worker instances
- Execution order and duration are:
  - WorkerSystem, until none of its children are running or can be started
  - ActionWorker (until its three children have completed)
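The rule that a parent keeps running until its children have completed can be sketched with a small tree walk. This is an illustrative model in plain Python, not actionETL code: the `Worker` class and the child names are hypothetical, the hierarchy shown is only a fragment (a root, one ActionWorker, and its three children), and the children run one at a time here, which is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    children: list = field(default_factory=list)

    def run(self, log):
        log.append(f"start {self.name}")
        for child in self.children:        # simplification: one child at a time
            child.run(log)
        log.append(f"done {self.name}")    # a parent completes only after its children
        return log


# Hypothetical fragment of a hierarchy with the worker system at the root.
root = Worker(
    "WorkerSystem",
    [Worker("ActionWorker", [Worker("Child1"), Worker("Child2"), Worker("Child3")])],
)

log = root.run([])
for entry in log:
    print(entry)
```

The printed log starts with the worker system and ends with it, with every descendant starting and completing in between.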
Workers can have high speed data ports for sending many millions of rows of data per second between workers, creating a dataflow. A worker is called a:
- Dataflow worker if it has one or more data ports (a.k.a. dataflow components in some ETL tools)
- Non-dataflow worker if it does not have data ports (a.k.a. control flow tasks in some ETL tools)
In actionETL there is no fundamental difference between the two types, and they can be mixed and matched as needed, e.g.:
- A single worker can be created to perform both dataflow and non-dataflow tasks
- Dataflow workers and non-dataflow workers can contain both dataflow and non-dataflow child workers
- Worker start constraints can be set between dataflow and non-dataflow workers
These capabilities provide great flexibility in how applications are designed, and also remove the need to create multiple "dataflows" just to intersperse "control flow tasks" between them. This reduces both complexity and the need for temporarily staging data to disk.
Please see Dataflow for more details.
Workers can be further categorized via their names, and it is best practice to also name custom workers using these conventions:
|*Action*||Library user provides the bulk of the logic via a callback (often an Action or Func<T> lambda), e.g. ActionWorker. "Action" should come immediately before "Worker", "Source", "Transform", or "Target" in the worker name.|
|Adb*||Workers accessing an SQL database using the
|*Base||Workers that must be extended by being inherited from, e.g.
|*Row*||Dataflow worker that processes a single row at a time, e.g. RowActionTarget<TInput>|
|*Rows*||Dataflow worker that processes multiple rows at a time, e.g. RowsTargetBase<TInput>|
|*Source or *SourceBase||Dataflow sources have output ports but no input ports, e.g. EnumerableSource<TOutput>|
|*Target or *TargetBase||Dataflow targets have input ports but no regular data output ports (they might have error output ports), e.g. AdbInsertTarget<TInputError>|
|*Transform or *TransformBase||Dataflow transforms have both input and regular data output ports, e.g. SortTransform<TInputOutput>|
|*Worker or *WorkerBase||Non-dataflow workers, e.g. CopyFileWorker|
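When naming a custom worker, it can help to check which of the conventions above a candidate name matches. The sketch below is a hypothetical helper in plain Python, not part of actionETL: the `classify` function and the regular expressions are one reading of the table, and the "Action" rule follows the recommendation that "Action" comes immediately before "Worker", "Source", "Transform", or "Target", which is stricter than the bare `*Action*` pattern.

```python
import re

# (label, regex) pairs transcribed from the naming table; an interpretation only.
CONVENTIONS = [
    ("Action", r"Action(?=Worker|Source|Transform|Target)"),
    ("Adb", r"^Adb"),
    ("Base", r"Base$"),
    ("Rows", r"Rows"),
    ("Row", r"Row(?!s)"),                  # "Row" not followed by "s", so "Rows" is excluded
    ("Source", r"Source(Base)?$"),
    ("Target", r"Target(Base)?$"),
    ("Transform", r"Transform(Base)?$"),
    ("Worker", r"Worker(Base)?$"),
]


def classify(type_name):
    """Return the labels of all conventions that a worker type name matches."""
    base_name = type_name.split("<", 1)[0]  # drop generic parameters such as <TInput>
    return [label for label, pattern in CONVENTIONS if re.search(pattern, base_name)]


# Worker names taken from the examples in the table.
for name in [
    "ActionWorker",
    "RowActionTarget<TInput>",
    "RowsTargetBase<TInput>",
    "AdbInsertTarget<TInputError>",
    "SortTransform<TInputOutput>",
    "CopyFileWorker",
]:
    print(name, classify(name))
```

A name can match several conventions at once, e.g. RowActionTarget<TInput> is a row-at-a-time, callback-driven dataflow target.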
List of Workers
The following articles list all out of box workers: