Namespace actionETL
This is the main actionETL namespace.
Classes
ActionSource<TOutput>
A dataflow worker which executes a callback once, which in turn calls methods on one Output port to pass data to the downstream worker.
Also see the ActionSource example.
Note: This class only has overloads for asynchronous callbacks, since using synchronous ones would require the callback to block a thread when there is no downstream demand, which is not appropriate.
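As a hedged illustration of the callback shape, the sketch below creates a source that sends a few rows downstream. The member names SendRowAsync and OutcomeStatus.Succeeded, and the (parent, name, callback) constructor order, are assumptions based on this summary, not verified signatures — see the linked ActionSource example for the actual API.

```csharp
// Hypothetical sketch only: SendRowAsync and the OutcomeStatus return value
// are assumed names; consult the ActionSource example for the real signatures.
var source = new ActionSource<int>(parent, "NumberSource", async asw =>
{
    for (int i = 0; i < 3; i++)
        await asw.Output.SendRowAsync(i).ConfigureAwait(false); // assumed member
    return OutcomeStatus.Succeeded;
});
```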
ActionSource<TOutput, TError>
A dataflow worker which executes a callback once that calls methods on one Output port and one ErrorOutput port to pass data to downstream workers.
Also see the ActionSource example.
Note: This class only has overloads for asynchronous callbacks, since using synchronous ones would require the callback to block a thread when there is no downstream demand, which is not appropriate.
ActionTarget<TInput>
A dataflow worker which executes an asynchronous callback once, which in turn calls methods on the Input port to consume rows from the upstream worker. The target does not have an ErrorOutput port.
Note: Use the factory methods in ActionTargetFactory to create instances of this class.
Also see the ActionTarget example.
Note: The input port uses the Default policy. Consider whether this is appropriate or should be changed; see BufferingMode for further details.
Note: This class only has overloads for asynchronous callbacks, since using synchronous ones would require the callback to block a thread when there are no upstream rows available, which is not appropriate.
ActionTarget<TInput, TError>
A dataflow worker which executes an asynchronous callback once, which in turn calls methods on the Input port to consume rows from the upstream worker. The target also has an ErrorOutput port.
Note: Use the factory methods in ActionTargetFactory to create instances of this class.
Also see the ActionTarget example.
Note: The input port uses the Default policy. Consider whether this is appropriate or should be changed; see BufferingMode for further details.
Note: This class only has overloads for asynchronous callbacks, since using synchronous ones would require the callback to block a thread when there are no upstream rows available, which is not appropriate.
ActionTargetFactory
Factory methods that create an ActionTarget<TInput> or ActionTarget<TInput, TError> dataflow worker, which runs an asynchronous callback once, which in turn calls methods on the Input port to consume rows from the upstream worker.
The Input port is linked (if available) to the upstream output or error output port specified by the factory.
Get the factory from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).
Also see the ActionTarget example.
Note: This class only has overloads for asynchronous callbacks, since using synchronous ones would require the callback to block a thread when there are no upstream rows available, which is not appropriate.
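The two linking styles described above can be sketched as follows. Worker names and callback bodies are illustrative, and the declaring type of GetDownstreamFactory<TInput>() is omitted here just as it is in the summary; none of the member combinations below are verified signatures.

```csharp
// Hypothetical sketch, not verified against the actual API surface.
// Style 1: upstream port known ahead of time — use its Link property:
source.Output.Link.ActionTarget("Consume", async at => { /* take rows via at.Input */ });

// Style 2: create the worker first, link explicitly afterwards:
var target = GetDownstreamFactory<MyRow>()            // declaring type omitted, as above
    .ActionTarget("Consume", async at => { /* ... */ });
target.Input.LinkFrom(source.Output);                 // or: source.Output.LinkTo(target.Input)
```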
ActionTransform<TInputOutputError>
A dataflow worker that executes an asynchronous callback once, which in turn consumes data from the upstream worker and sends data to the downstream worker. Input, output, and error output rows have the same type (instead use ActionTransform<TInputError, TOutput> if input and output rows have different types).
Note: Use the factory methods in ActionTransformFactory to create instances of this class.
Input data can be processed row by row via TakeRow() etc., or multiple rows can be processed via TryTakeBuffer(TInput[]) etc.
The input port uses the Default policy. Consider whether this is appropriate or should be overridden; see BufferingMode for further details.
Also see the ActionTransform example.
ActionTransform<TInputError, TOutput>
A dataflow worker that executes an asynchronous callback once, which in turn consumes data from the upstream worker and sends data to the downstream worker. Input and error output rows have the same type, but output rows have a different type (instead use ActionTransform<TInputOutputError> if input and output rows are of the same type).
Note: Use the factory methods in ActionTransformFactory to create instances of this class.
Input data can be processed row by row via TakeRow() etc., or multiple rows can be processed via TryTakeBuffer(TInput[]) etc.
The input port uses the Default policy. Consider whether this is appropriate or should be overridden; see BufferingMode for further details.
Also see the ActionTransform example.
ActionTransformFactory
Factory methods that create an ActionTransform<TInputOutputError> or ActionTransform<TInputError, TOutput> dataflow worker that executes an asynchronous callback once, which in turn consumes data from the upstream worker and sends data to the downstream worker.
Input data can be processed row by row via TakeRow() etc., or multiple rows can be processed via TryTakeBuffer(TInput[]) etc.
The input port uses the Default policy. Consider whether this is appropriate or should be overridden; see BufferingMode for further details.
Also see the ActionTransform example.
The Input port is linked (if available) to the upstream output or error output port specified by the factory.
Get the factory from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).
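A hedged sketch of the row-by-row pattern this summary describes: TakeRow() is taken from the text above, but the factory method name ActionTransform1, the WaitForRowsAsync availability check, and SendRowAsync are all assumed names — see the ActionTransform example for the real pattern.

```csharp
// Hypothetical sketch: WaitForRowsAsync and SendRowAsync are assumed names;
// TakeRow() is documented above. See the ActionTransform example for details.
source.Output.Link.ActionTransform1("ToUpper", async at =>
{
    while (await at.Input.WaitForRowsAsync())   // assumed availability check
    {
        string row = at.Input.TakeRow();        // row-by-row consumption
        await at.Output.SendRowAsync(row.ToUpperInvariant());
    }
    return OutcomeStatus.Succeeded;
});
```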
ActionTwoInputTransform<TLeftInput, TRightInput, TOutput>
A dataflow worker that executes an asynchronous callback once, which in turn consumes data from two upstream workers and sends data to the downstream worker. Both input ports and the output port can have rows of different types.
Note: Use the factory methods in ActionTwoInputTransformFactory to create instances of this class.
Input data can be processed row by row via TakeRow() etc., or multiple rows can be processed via TryTakeBuffer(TInput[]) etc.
Note that the two inputs by default provide Full buffering of incoming data, which potentially can consume a large amount of memory. See BufferingMode for further details.
If one or two ErrorOutput ports are needed, either inherit from this class or from TwoInputTransformBase<TDerived, TLeftInput, TRightInput, TOutput>, and add them explicitly.
Also see the ActionTransform example, which works in a similar way to this class.
ActionTwoInputTransformFactory
Factory methods that create an ActionTwoInputTransform<TLeftInput, TRightInput, TOutput> dataflow worker that executes an asynchronous callback once, which in turn consumes data from the upstream workers and sends data to the downstream worker. Both input ports and the output port can have rows of different types.
Input data can be processed row by row via TakeRow() etc., or multiple rows can be processed via TryTakeBuffer(TInput[]) etc.
Note that the two inputs by default provide Full buffering of incoming data, which potentially can consume a large amount of memory. See BufferingMode for further details.
Also see the ActionTransform example, which works in a similar way to these methods.
The LeftInput port is linked (if available) to the upstream output or error output port specified by the factory.
Get the factory from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).
ActionWorker
A worker which accepts a callback to specify what it should do, which can include creating child workers.
It can also be configured as a worker that always fails, which is sometimes useful:
new ActionWorker(parent, "Always Fail", aw => OutcomeStatus.Error);
This class is sealed. ActionWorkerBase<TDerived> is available to derive from.
ActionWorkerBase<TDerived>
An abstract worker which accepts a callback to specify what it should do, which can include creating child workers.
This class must be inherited. The most common use case is to create a custom worker
where the user supplies a callback that the custom worker will execute, and the
custom worker also overrides RunAsync() to add additional logic.
Note that when overriding RunAsync(), the base class implementation base.RunAsync() must be called.
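A minimal sketch of this override pattern. Only the requirement to call base.RunAsync() is taken from the summary above; the constructor shape and the Task<OutcomeStatus> return type of RunAsync() are assumptions.

```csharp
// Sketch: signatures are assumed; the base.RunAsync() call is the documented requirement.
public sealed class TimedWorker : ActionWorkerBase<TimedWorker>
{
    public TimedWorker(WorkerParent parent, string name,
        Func<TimedWorker, Task<OutcomeStatus>> actionAsync)
        : base(parent, name, actionAsync) { }

    protected override async Task<OutcomeStatus> RunAsync()
    {
        var sw = System.Diagnostics.Stopwatch.StartNew();
        var status = await base.RunAsync().ConfigureAwait(false); // must be called
        Console.WriteLine($"'{Name}' completed in {sw.Elapsed}");
        return status;
    }
}
```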
AggregateTransform<TInputAccumulateOutput>
A dataflow worker that uses callbacks to aggregate incoming rows, and optionally an equality comparer to group incoming rows. It outputs at most one row for all input rows or per unique grouping. The input rows, the accumulation, and the output row all have the same type.
Note: Use the factory methods in AggregateTransformFactory to create instances of this class.
Also see the examples in Dataflow Aggregations, which cover IAggregationCommand predefined column aggregations (Sum, Average etc.), RowAggregationFunction predefined row aggregations (First, Last etc.), IGroupByCommand and IGroupByCopyCommand GROUP BY commands, as well as creating custom aggregations and groupings.
This worker is fully blocking, i.e. it will only output row(s) after it has received all incoming rows. It will buffer (and therefore consume memory for) only a single accumulation (without grouping), or multiple accumulations corresponding to the number of unique groupings.
AggregateTransform<TInput, TAccumulateOutput>
A dataflow worker that uses callbacks to aggregate incoming rows, and optionally an equality comparer to group incoming rows. It outputs at most one row for all input rows or per unique grouping. The accumulation, and the output rows (if any) both have the same type, but input rows can have a different type. The output row(s), if any, will be taken from the final value of the Accumulation.
Note: Use the factory methods in AggregateTransformFactory to create instances of this class.
Also see the examples in Dataflow Aggregations, which cover IAggregationCommand predefined column aggregations (Sum, Average etc.), RowAggregationFunction predefined row aggregations (First, Last etc.), IGroupByCommand and IGroupByCopyCommand GROUP BY commands, as well as creating custom aggregations and groupings.
This worker is fully blocking, i.e. it will only output row(s) after it has received all incoming rows. It will buffer (and therefore consume memory for) only a single accumulation (without grouping), or multiple accumulations corresponding to the number of unique groupings.
AggregateTransform<TInput, TAccumulate, TOutput>
A dataflow worker that uses callbacks to aggregate incoming rows, and optionally an equality comparer to group incoming rows. It outputs at most one row for all input rows or per unique grouping. The input rows, the accumulation, and the output row (if any) can all have different types.
Note: Use the factory methods in AggregateTransformFactory to create instances of this class.
Also see the examples in Dataflow Aggregations, which cover IAggregationCommand predefined column aggregations (Sum, Average etc.), RowAggregationFunction predefined row aggregations (First, Last etc.), IGroupByCommand and IGroupByCopyCommand GROUP BY commands, as well as creating custom aggregations and groupings.
This worker is fully blocking, i.e. it will only output row(s) after it has received all incoming rows. It will buffer (and therefore consume memory for) only a single accumulation (without grouping), or multiple accumulations corresponding to the number of unique groupings.
AggregateTransformFactory
Factory methods that create aggregation and grouping dataflow workers, which aggregate and (optionally) group incoming rows, and output at most one row for all input rows or per unique grouping. Also see the examples in Dataflow Aggregations.
Aggregations can be specified in the following ways:
- Predefined column aggregations Average, Count, CountDistinct, CountRows, First, Last, Max, Min and Sum. Also see IAggregationCommand for details.
- Predefined row aggregations such as First, Last, Single etc. Also see RowAggregationFunction for details.
- Custom seed, accumulation and output callbacks to implement custom aggregations not provided by the predefined aggregation functions.
Note: Ensure that accumulation column data types can hold all the intermediate as well as the final accumulated values, without overflows occurring.
The incoming rows can optionally also be grouped (i.e. perform a GROUP BY), by either specifying which columns to group by, or by providing either a grouping key function or a row equality comparer.
Also note that:
- AggregateTransform1() overloads create AggregateTransform<TInputAccumulateOutput> workers, where input, accumulation and output types are the same.
- AggregateTransform2() overloads create AggregateTransform<TInput, TAccumulateOutput> workers, where the input type can be different from the accumulation and output types.
- AggregateTransform3() overloads create AggregateTransform<TInput, TAccumulate, TOutput> workers, where the input, accumulation and output types can all be different.
These workers are fully blocking, i.e. they will only output the row(s) after they have received all incoming rows. They will buffer (and therefore consume memory for) only a single accumulation (without grouping), or multiple accumulations corresponding to the number of unique groupings.
Note that by default on .NET Framework, the maximum array size is 2GB, which with a 64-bit application limits the number of unique groupings to a maximum of 47.9 million, and the CountDistinct(String) number of distinct values per column to a maximum of just over 89 million. You can remove these limits by enabling support for larger arrays, as described in <gcAllowVeryLargeObjects> Element. .NET Core and .NET 5+, on the other hand, support >2GB arrays by default.
The Input port is linked (if available) to the upstream output or error output port specified by the factory.
Get the factory from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).
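The custom seed/accumulate style named above can be sketched as follows. The overload shape (seed callback plus accumulate callback) is an assumption — see the Dataflow Aggregations examples for the actual signatures.

```csharp
// Hypothetical sketch of a custom aggregation via seed/accumulate callbacks;
// the overload shape is assumed — see the Dataflow Aggregations examples.
source.Output.Link.AggregateTransform1("SumAmounts",
    () => 0m,                                // seed: the initial accumulation
    (accumulate, row) => accumulate + row);  // accumulate each input row
// With no grouping configured, at most one row (the final sum) is output.
```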
CollectionTarget<TInput>
A dataflow worker with one Input port that consumes incoming rows and adds them to the Rows List<T> collection.
Note: Use the factory methods in CollectionTargetFactory to create instances of this class.
Also note that the collection will use enough memory to hold all incoming rows.
CollectionTarget<TInput, TCollection>
A dataflow worker with one Input port that consumes incoming rows and adds them to the Rows ICollection<T>, which by default is a List<T>.
Note: Use the factory methods in CollectionTargetFactory to create instances of this class.
Also note that the collection will use enough memory to hold all incoming rows.
CollectionTargetFactory
Factory methods that create a CollectionTarget<TInput> or CollectionTarget<TInput, TCollection> dataflow worker, which consumes incoming rows and adds them to an ICollection<T>.
The Input port is linked (if available) to the upstream output or error output port specified by the factory.
Get the factory from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).
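A hedged sketch of the Link factory style with this target; the CollectionTarget factory method name follows this summary, but the exact call shape is an assumption.

```csharp
// Sketch, assuming the Link factory style described above.
var target = source.Output.Link.CollectionTarget("Collect");
// After the dataflow completes, target.Rows (a List<T> by default per the
// CollectionTarget<TInput> summary) holds every incoming row.
```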
ConvertExpressionInfo
A class for creating Expression tree implicit conversions between two types.
This is useful when creating high performance delegates at runtime using expression trees.
E.g. IAggregationCommand and AdbDataReaderSource<TOutput> use this to implicitly convert between incoming and outgoing data types.
You can also use it directly in your own implementations.
Use Create(Type, Type) to create an instance that describes whether an implicit conversion can be made (HasImplicitConversion), and also can rewrite an existing expression to apply the conversion if supported (Apply(Expression)).
Conversions are based on C# implicit conversions, where common ones are supported, while others are not supported or not applicable, see Create(Type, Type) for details.
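The underlying building block is the standard expression tree conversion node. This standalone sketch uses System.Linq.Expressions directly (not ConvertExpressionInfo itself) to show an implicit int-to-long conversion being applied to an existing expression, which is the kind of rewrite Apply(Expression) performs:

```csharp
using System;
using System.Linq.Expressions;

class ConvertDemo
{
    static void Main()
    {
        var x = Expression.Parameter(typeof(int), "x");
        // int -> long is a C# implicit numeric conversion:
        var body = Expression.Convert(x, typeof(long));
        var toLong = Expression.Lambda<Func<int, long>>(body, x).Compile();
        Console.WriteLine(toLong(42)); // prints 42, now typed as long
    }
}
```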
CopyFileWorker
A worker that copies a file, including between different file system volumes.
CreateFileWorker
A worker that creates an empty file.
CrossJoinTransform<TLeftInput, TRightInput, TOutput>
A dataflow worker with two input ports and one Output port, that performs a Cross Join on the two inputs.
Note: Use the factory methods in CrossJoinTransformFactory to create instances of this class.
This worker is fully blocking on the RightInput port, and will buffer all its rows. Therefore, to conserve memory, link RightInput to the upstream output where the rows will consume the least amount of memory (i.e. the smallest numberOfRows * memoryPerRow).
CrossJoinTransformFactory
Factory methods that create a CrossJoinTransform<TLeftInput, TRightInput, TOutput> dataflow worker, with two input ports and one Output port, that performs a Cross Join on the two inputs.
This worker is fully blocking on the RightInput port, and will buffer all its rows. Therefore, to conserve memory, link RightInput to the upstream output where the rows will consume the least amount of memory (i.e. the smallest numberOfRows * memoryPerRow).
The "first" LeftInput port is linked (if available) to the upstream output or error output port specified by the factory.
Get the factory from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).
DeleteFileWorker
A worker that deletes a file.
DictionaryLookupSplitTransform<TInputOutputError, TKey, TValue>
A dataflow worker that performs a lookup in an IReadOnlyDictionary<TKey,TValue> collection for each Input row, optionally modifying the input rows, before sending them to the appropriate output port: FoundOutput, NotFoundOutput, or ErrorOutput. All ports have the same row type.
Note: Use the factory methods in DictionaryLookupSplitTransformFactory to create instances of this class.
To customize the key lookup, e.g. to make a case-insensitive lookup, either add code to the selectRowKeyFunc callback to process the row data to match the case of the lookup reference keys, or create and set the underlying Dictionary as a case-insensitive one (see e.g. Dictionary<TKey,TValue>(IEqualityComparer<TKey>)).
By supplying a pre-populated dictionary, the worker by default performs a fully cached lookup, which is both the most common configuration, and the easiest to configure.
It is however also possible to implement a partially cached lookup by starting with an empty dictionary, or a partially pre-populated dictionary, and then adding missing dictionary items on the fly in the notFoundKeyFunc callback. This avoids loading dictionary items that will never be used, which can be advantageous when it is impractical to retrieve all keys and lookup values ahead of time.
Note that multiple rows can often match the same lookup key and value. To avoid issues where modifying one row inadvertently also changes another row, best practice is to make the lookup value only consist of value types and/or immutable types. If the lookup value is, or contains, a mutable reference type, the user must ensure that either there are no lookup value references that are shared and modified across rows, or that the lookup value is cloned, so that each row gets its own unique instance.
Also see DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>, which loads the dictionary from a second input port, and DictionaryLookupTransform<TInputOutputError, TKey, TValue>, which does not have a dedicated output port for unmatched rows.
Also see Dataflow Lookups.
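The case-insensitive dictionary mentioned above can be created with the standard Dictionary<TKey,TValue>(IEqualityComparer<TKey>) constructor; this is plain .NET, independent of the worker itself:

```csharp
using System;
using System.Collections.Generic;

// A case-insensitive lookup dictionary, as suggested above:
var lookup = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase)
{
    ["Alpha"] = 1,
    ["Beta"] = 2,
};
Console.WriteLine(lookup["ALPHA"]); // prints 1 — key matching ignores case
```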
DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>
A dataflow worker that first loads an IDictionary<TKey,TValue> from the DictionaryInput port rows, then performs a lookup in the dictionary for each Input row, optionally modifying the input rows, before sending them to the appropriate output port: FoundOutput, NotFoundOutput, or ErrorOutput. All ports except DictionaryInput have the same row type.
Note: Use the factory methods in DictionaryLookupSplitTransformFactory to create instances of this class.
To customize the key lookup, e.g. to make a case-insensitive lookup, either add code to the selectRowKeyFunc callback to process the row data to match the case of the lookup reference keys, or create and set the underlying Dictionary as a case-insensitive one (see e.g. Dictionary<TKey,TValue>(IEqualityComparer<TKey>)).
The dictionary is populated from DictionaryInput ahead of processing Input rows, and is therefore by default a fully cached lookup, which is both the most common configuration and the easiest to configure.
It is however also possible to implement a partially cached lookup by only loading commonly used dictionary items in bulk via DictionaryInput, and then adding missing dictionary items on the fly in the notFoundKeyFunc callback. This avoids loading dictionary items that will never be used, which can be advantageous when it is impractical to retrieve all keys and lookup values ahead of time. In this scenario, create and pass the dictionary to the worker, and use this original reference directly in the notFoundKeyFunc callback.
Note that multiple rows can often match the same lookup key and value. To avoid issues where modifying one row inadvertently also changes another row, best practice is to make the lookup value only consist of value types and/or immutable types. If the lookup value is, or contains, a mutable reference type, the user must ensure that either there are no lookup value references that are shared and modified across rows, or that the lookup value is cloned, so that each row gets its own unique instance.
Note that the DictionaryInput rows can be of any type; consider using the MutableKeyValue<TKey, TValue> helper class when you only need a mutable key and value, or KeyValuePair<TKey,TValue> when an immutable pair is sufficient.
Also see DictionaryLookupSplitTransform<TInputOutputError, TKey, TValue>, which doesn't have the DictionaryInput port, and DictionaryLookupTransform<TInputOutputError, TDictionaryInput, TKey, TValue>, which sends both rows where the key is found and rows where the key is not found to the same output port.
Also see Dataflow Lookups.
DictionaryLookupSplitTransformFactory
Factory methods that create a DictionaryLookupSplitTransform<TInputOutputError, TKey, TValue> or a DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue> dataflow worker, which performs a lookup of a key in a dictionary, and uses the dictionary values to modify the rows before sending them to one of the outputs.
Note that by default on .NET Framework, the maximum array size is 2GB, which with a 64-bit application limits the Dictionary<TKey,TValue> size (using references as key and value) to a maximum of 47.9 million items. You can remove this limit by enabling support for larger arrays, as described in <gcAllowVeryLargeObjects> Element. .NET Core and .NET 5+, on the other hand, support >2GB arrays by default.
The Input port is linked (if available) to the upstream output or error output port specified by the factory.
Get the factory from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).
Also see Dataflow Lookups.
DictionaryLookupTransform<TInputOutputError, TKey, TValue>
A dataflow worker that performs a lookup in an IReadOnlyDictionary<TKey,TValue> collection for each Input row, optionally modifying the input rows, before sending them to the output or error port. All ports have the same row type.
Note: Use the factory methods in DictionaryLookupTransformFactory to create instances of this class.
To customize the key lookup, e.g. to make a case-insensitive lookup, either add code to the selectRowKeyFunc callback to process the row data to match the case of the lookup reference keys, or create and set the underlying Dictionary as a case-insensitive one (see e.g. Dictionary<TKey,TValue>(IEqualityComparer<TKey>)).
By supplying a pre-populated dictionary, the worker by default performs a fully cached lookup, which is both the most common configuration, and the easiest to configure.
It is however also possible to implement a partially cached lookup by starting with an empty dictionary, or a partially pre-populated dictionary, and then adding missing dictionary items on the fly in the notFoundKeyFunc callback. This avoids loading dictionary items that will never be used, which can be advantageous when it is impractical to retrieve all keys and lookup values ahead of time.
Note that multiple rows can often match the same lookup key and value. To avoid issues where modifying one row inadvertently also changes another row, best practice is to make the lookup value only consist of value types and/or immutable types. If the lookup value is, or contains, a mutable reference type, the user must ensure that either there are no lookup value references that are shared and modified across rows, or that the lookup value is cloned, so that each row gets its own unique instance.
Also see DictionaryLookupTransform<TInputOutputError, TDictionaryInput, TKey, TValue>, which loads the dictionary from a second input port, and DictionaryLookupSplitTransform<TInputOutputError, TKey, TValue>, which has a separate output port for unmatched rows.
Also see Dataflow Lookups.
DictionaryLookupTransform<TInputOutputError, TDictionaryInput, TKey, TValue>
A dataflow worker that first loads an IDictionary<TKey,TValue> from the DictionaryInput port rows, then performs a lookup in the dictionary for each Input row, optionally modifying the input rows, before sending them to the output or error port. All ports except DictionaryInput have the same row type.
Note: Use the factory methods in DictionaryLookupTransformFactory to create instances of this class.
To customize the key lookup, e.g. to make a case-insensitive lookup, either add code to the selectRowKeyFunc callback to process the row data to match the case of the lookup reference keys, or create and set the underlying Dictionary as a case-insensitive one (see e.g. Dictionary<TKey,TValue>(IEqualityComparer<TKey>)).
The dictionary is populated from DictionaryInput ahead of processing Input rows, and is therefore by default a fully cached lookup, which is both the most common configuration and the easiest to configure.
It is however also possible to implement a partially cached lookup by only loading commonly used dictionary items in bulk via DictionaryInput, and then adding missing dictionary items on the fly in the notFoundKeyFunc callback. This avoids loading dictionary items that will never be used, which can be advantageous when it is impractical to retrieve all keys and lookup values ahead of time. In this scenario, create and pass the dictionary to the worker, and use this original reference directly in the notFoundKeyFunc callback.
Note that multiple rows can often match the same lookup key and value. To avoid issues where modifying one row inadvertently also changes another row, best practice is to make the lookup value only consist of value types and/or immutable types. If the lookup value is, or contains, a mutable reference type, the user must ensure that either there are no lookup value references that are shared and modified across rows, or that the lookup value is cloned, so that each row gets its own unique instance.
Note that the DictionaryInput rows can be of any type; consider using the MutableKeyValue<TKey, TValue> helper class when you only need a mutable key and value, or KeyValuePair<TKey,TValue> when an immutable pair is sufficient.
Also see DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>, which sends unmatched rows to a second output port, and DictionaryLookupTransform<TInputOutputError, TKey, TValue>, which does not have the DictionaryInput port.
Also see Dataflow Lookups.
DictionaryLookupTransformFactory
Factory methods that create a DictionaryLookupTransform<TInputOutputError, TKey, TValue> or a DictionaryLookupTransform<TInputOutputError, TDictionaryInput, TKey, TValue> dataflow worker, which performs a lookup of a key in a dictionary, and uses the dictionary values to modify the rows before sending them to one of the outputs.
Note that by default on .NET Framework, the maximum array size is 2GB, which with a 64-bit application limits the Dictionary<TKey,TValue> size (using references as key and value) to a maximum of 47.9 million items. You can remove this limit by enabling support for larger arrays, as described in <gcAllowVeryLargeObjects> Element. .NET Core and .NET 5+, on the other hand, support >2GB arrays by default.
The Input port is linked (if available) to the upstream output or error output port specified by the factory.
Get the factory from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).
Also see Dataflow Lookups.
DictionaryTarget<TInput, TKey, TValue>
A dataflow target that populates an IDictionary<TKey,TValue>, by default a Dictionary<TKey,TValue>.
The input rows can be of any type, and a function selects the key and value from each row. When the row only needs a key and value, consider using the immutable KeyValuePair<TKey,TValue> struct, or the mutable MutableKeyValue<TKey, TValue> class.
Note: Use the factory methods in DictionaryTargetFactory to create instances of this class.
DictionaryTargetFactory
Factory methods that create a DictionaryTarget<TInput, TKey, TValue> dataflow worker, which consumes incoming rows and adds selected values to an IDictionary<TKey,TValue>.
Note that by default on .NET Framework, the maximum array size is 2GB, which with a 64-bit application limits the Dictionary<TKey,TValue> size (using references as key and value) to a maximum of 47.9 million items. You can remove this limit by enabling support for larger arrays, as described in <gcAllowVeryLargeObjects> Element. .NET Core and .NET 5+, on the other hand, support >2GB arrays by default.
The Input port is linked (if available) to the upstream output or error output port specified by the factory.
Get the factory from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).
EnumerableSource<TOutput>
A dataflow worker that consumes rows from an IEnumerable<T>, or individual rows specified in the constructor, and passes the rows to a downstream worker.
Providing an IEnumerable<T> makes it easy to consume the rows from methods such as ReadLines(String) as well as from LINQ queries.
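A hedged sketch of the constructor style this summary describes; the (parent, name, rows) parameter order is an assumption based on the other workers in this namespace.

```csharp
// Hedged sketch: the (parent, name, rows) constructor shape follows the
// summary above, but the exact parameter order is an assumption.
var lines = new EnumerableSource<string>(parent, "Lines",
    File.ReadLines("input.txt"));                      // lazily read file lines
var evens = new EnumerableSource<int>(parent, "Evens",
    Enumerable.Range(0, 100).Where(i => i % 2 == 0));  // any LINQ query works
```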
ErrorOutputPort<TError>
Generic class for dataflow error output ports. Create these ports with Create<TError>(String), or use a dataflow worker that already has the appropriate error output port(s) added.
Note that error output ports will always have Full buffering, since this simplifies outputting error rows. To avoid running out of memory where a very large number of error rows might be produced, limit the number of allowed error rows (which is also the default) using MaxRowsSent and/or MaxRowsBeforeError.
Where possible, use the SendErrorRow(...) overloads instead of the LogRowError(...) overloads (the port can be left unlinked in any case); use LogRowError(...) only when no meaningful error row is available.
Note: As per normal, unless otherwise noted, the instance members are not thread-safe, and should only be used from the worker that the port belongs to.
See Dataflow Row Errors for further details.
Note that each logged column is truncated to 256 characters if needed. Rows sent to a downstream worker will however retain their original format and length.
ErrorOutputPortCollection
A collection of all error output ports for a worker, always available as ErrorOutputs.
This class is used when you either don't need to know the row type of each port, or you cast the port(s) to have the actual row type (using an explicit cast or ToArray<TErrorOutput>()).
Note that workers with ports typically add additional members that allow access to typed versions of the ports.
ExceptionExtensions
Static extension helper methods on Exception.
ExecuteProcessWorker
A worker that launches an external process, e.g. an executable, a batch script, or a document.
On successful completion, the exit code from the process is stored in ExitCode.
By default, a zero exit code gives the worker a Succeeded status, while any other exit code gives it an Error status.
See the Process and ProcessStartInfo classes for background information.
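The exit-code convention can be seen directly with the underlying BCL classes (a minimal sketch using only System.Diagnostics; the worker's own configuration surface is not shown here):

```csharp
using System;
using System.Diagnostics;

var startInfo = new ProcessStartInfo
{
    FileName = "cmd.exe",      // Windows shell; on Unix use e.g. "/bin/sh" with "-c 'exit 3'"
    Arguments = "/c exit 3",
    UseShellExecute = false,
};
using (var process = Process.Start(startInfo))
{
    process.WaitForExit();
    // ExecuteProcessWorker by default maps 0 to Succeeded and anything else to Error.
    Console.WriteLine(process.ExitCode); // 3
}
```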
FileExistsWorker
A worker that checks if a file exists.
FlatRowSchema
Provides information about the columns in an external data source, where only the column names, number, and order are available. This is useful for working with columns, e.g. for implementing column mapping directly, or for building higher level constructs such as FromTypeRowMapper<TFrom> and ToTypeRowMapper<TTo>.
Use TypeRowSchema instead when you have access to the required .NET CLR type, since it can populate the instance automatically from the type. Otherwise use this class and specify columns explicitly.
Columns can be searched for via several Get* methods. They can also be accessed as a flat list via the SchemaNodeList property, and as a hierarchical tree via the SchemaNodeRoot property, both of which return items in a well-defined order controlled by OrderAttribute.
Note that the root node of the tree is not included in the SchemaNodeList list.
The hierarchical tree and flat list include all descendant fields and properties, although for this class the tree only has a single level below the root.
ForEachActionWorker<TItem>
A worker that executes a callback once for each item in an IEnumerable<T>. The callback can optionally create child workers, which will be removed after each iteration, except on the last iteration, allowing them to be inspected.
Less commonly, child workers can also be added (typically by the worker parent) before the worker runs, in which case they will run on the first iteration (if any), but not on later iterations.
Any failure status returned from the user callback or escalated from worker children will halt the iteration. One can suppress Error escalation on selected child workers (by setting EscalateError to false), e.g. to implement retry-on-error functionality.
Note that if a dedicated parent worker is not needed (e.g. to group child workers), and all child workers can be created before any child worker is started, it is simpler to instead use a regular foreach statement to perform the work, e.g. creating child workers. See Instantiating Workers via Looping for details.
FromTypeColumnMapper<TFrom>
Defines mappings from the (potentially hierarchical) fields, properties, table columns etc. of a source 'from' .NET CLR type (described by FromRowSchema), to a flat list of user supplied or automatically generated target 'to' columns (described by ToRowSchema).
This facility is commonly used by dataflow workers for copying and renaming column data between input fields and properties on the one hand, and external data sources on the other hand.
Members to be mapped can be specified explicitly via name or index (e.g. copy from input column "A" to the fourth output column), as well as implicitly based on name (i.e. AutoName()). A specified field or property on the 'from' side can reside inside a column schema, i.e. a struct that groups multiple fields and properties.
This class does not include the Row() method, and should be used when the Row() method is not supported by the worker in question. Also see IRowMapperCommand, which does include that method.
Mappings can be specified with the columnMapperCommandAction parameter, or added by calling the mapping methods Name(String) etc. The mapping result is available in Mappings, which the caller can use to manually perform any tasks, such as copying columns.
Also see the Dataflow Column Mapping and Mapping and Copying examples, as well as IRowMapperCommand and IColumnMapperCommand.
Note: If the mapping commands result in duplicate mappings, an exception will be thrown. Resolve this in one of the following ways:
- Specify additional or all name parts in explicit mapping
- Change the column (or schema) names in the from and/or to rows to avoid name clashes
- Restrict auto-mapping to specific column schemas
- Map offending columns explicitly by name (which will exclude them from later auto-mappings)
FromTypeColumnMappings
The output from a FromTypeColumnMapper<TFrom> row mapper, which maps from the (potentially hierarchical) fields, properties, table columns etc. of a source 'from' .NET CLR type (described by FromRowSchema), to a flat list of user supplied or automatically generated target 'to' columns (described by ToRowSchema).
This class does not include the Row() method, and should be used when the Row() method is not supported by the worker in question. Also see IRowMapperCommand, which does include that method.
Also see the Dataflow Column Mapping and Mapping and Copying examples, as well as IRowMapperCommand and IColumnMapperCommand.
FromTypeRowMapper<TFrom>
Defines mappings from the (potentially hierarchical) fields, properties, table columns etc. of a source 'from' .NET CLR type (described by FromRowSchema), to a flat list of user supplied or automatically generated target 'to' columns (described by ToRowSchema).
This facility is commonly used by dataflow workers for copying and renaming column data between input fields and properties on the one hand, and external data sources on the other hand.
Members to be mapped can be specified explicitly via name or index (e.g. copy from input column "A" to the fourth output column), as well as implicitly based on name (i.e. AutoName()). A specified field or property on the 'from' side can reside inside a column schema, i.e. a struct that groups multiple fields and properties.
This class does include the Row() method; also see IColumnMapperCommand, which excludes that method.
Mappings can be specified with the rowMapperCommandAction parameter, or added by calling the mapping methods Name(String) etc. The mapping result is available in Mappings, which the caller can use to manually perform any tasks, such as copying columns.
Also see the Dataflow Column Mapping and Mapping and Copying examples, as well as IRowMapperCommand and IColumnMapperCommand.
Note: If the mapping commands result in duplicate mappings, an exception will be thrown. Resolve this in one of the following ways:
- Specify additional or all name parts in explicit mapping
- Change the column (or schema) names in the from and/or to rows to avoid name clashes
- Restrict auto-mapping to specific column schemas
- Map offending columns explicitly by name (which will exclude them from later auto-mappings)
FromTypeRowMappings
The output from a row mapper that maps from (potentially hierarchical) columns in a .NET CLR data type (described by FromRowSchema), to a flat list of columns (described by ToRowSchema). This is typically created by FromTypeRowMapper<TFrom>.
Note that it supports the Row() command, which is mutually exclusive with commands that map individual columns. Use FromTypeColumnMappings instead if that is not needed.
Also see the Dataflow Column Mapping and Mapping and Copying examples.
FullJoinMergeSortedTransform<TLeftInput, TRightInput, TOutput>
A dataflow worker with two input ports and one Output port, that performs a Full Merge-join on the two presorted inputs.
Note: Use the factory methods in FullJoinMergeSortedTransformFactory to create instances of this class.
This worker is partially blocking on the RightInput input port: when a join is found, it will buffer all rows from the RightInput port that compare equal to the join rows. This consumes "Number of RightInput rows comparing equal" multiplied by "Memory per RightInput row". To conserve memory, link RightInput to the upstream output with the smallest expected memory consumption.
FullJoinMergeSortedTransformFactory
Factory methods that create a FullJoinMergeSortedTransform<TLeftInput, TRightInput, TOutput> dataflow worker, with two input ports and one Output port, that performs a Full Merge-join on the two presorted inputs.
Note that both inputs must be presorted according to the order specified by these overloads, including collation order for textual columns, where in the order any nulls appear, and if nulls are equal to each other or not etc.
This worker is partially blocking on the RightInput input port: when a join is found, it will buffer all rows from the RightInput port that compare equal to the join rows. This consumes "Number of RightInput rows comparing equal" multiplied by "Memory per RightInput row". To conserve memory, link RightInput to the upstream output with the smallest expected memory consumption.
The "first" LeftInput port is linked to (if available) the upstream output or error output port specified by the factory.
Get the factory from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).
InnerJoinMergeSortedTransform<TLeftInput, TRightInput, TOutput>
A dataflow worker with two input ports and one Output port, that performs an Inner Merge-join on the two presorted inputs.
Note: Use the factory methods in InnerJoinMergeSortedTransformFactory to create instances of this class.
This worker is partially blocking on the RightInput input port: when a join is found, it will buffer all rows from the RightInput port that compare equal to the join rows. This consumes "Number of RightInput rows comparing equal" multiplied by "Memory per RightInput row". To conserve memory, link RightInput to the upstream output with the smallest expected memory consumption.
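The partial-blocking behaviour can be illustrated with a plain inner merge-join over two presorted arrays (illustrative code only, not the actionETL implementation): only the run of right rows that compare equal to the current left key needs to be retained, which is why the worker buffers a run at a time rather than the whole right input.

```csharp
using System;
using System.Collections.Generic;

static IEnumerable<(int Left, int Right)> InnerJoinSorted(int[] left, int[] right)
{
    int r = 0;
    foreach (int l in left)           // both arrays must be sorted ascending
    {
        while (r < right.Length && right[r] < l)
            r++;                      // discard right rows below the current key
        int runStart = r;
        while (r < right.Length && right[r] == l)
            yield return (l, right[r++]);  // join against the buffered equal run
        r = runStart;                 // rewind: a duplicate left key reuses the same run
    }
}
```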
InnerJoinMergeSortedTransformFactory
Factory methods that create an InnerJoinMergeSortedTransform<TLeftInput, TRightInput, TOutput> dataflow worker, with two input ports and one Output port, that performs an Inner Merge-join on the two presorted inputs.
Note that both inputs must be presorted according to the order specified by these overloads, including collation order for textual columns, where in the order any nulls appear, and if nulls are equal to each other or not etc.
This worker is partially blocking on the RightInput input port: when a join is found, it will buffer all rows from the RightInput port that compare equal to the join rows. This consumes "Number of RightInput rows comparing equal" multiplied by "Memory per RightInput row". To conserve memory, link RightInput to the upstream output with the smallest expected memory consumption.
The "first" LeftInput port is linked to (if available) the upstream output or error output port specified by the factory.
Get the factory from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).
InputPort
Non-generic base class for dataflow input ports, containing the functionality that doesn't depend on the row type.
The library user does not derive new port classes directly from InputPort, but instead creates or uses instances of the derived class InputPort<TInput>.
Note: As per normal, unless otherwise noted, the instance members are not thread-safe, and should only be accessed from the worker that the port belongs to.
InputPort<TInput>
Generic class for dataflow input ports, containing the functionality that depends on the type of the rows.
The library user can use this class via workers that already have input ports present, as well as create and add input ports to workers by using Create<TInput>(String, OutputPortBase<TInput>).
The library user does not derive new port classes from this class.
Note: As per normal, unless otherwise noted, the instance members are not thread-safe, and should only be used from the worker that the port belongs to.
InputPortCollection
A collection of all input ports for a worker, always available as Inputs.
This class is used when you either don't need to know the row type of each port, or you cast the port(s) to have the actual row type (using an explicit cast or ToArray<TInput>()).
Note that workers with ports typically add additional members that allow access to typed versions of the ports, and workers with an unknown number of input ports of the same type typically add an InputPortCollection<TInput>.
InputPortCollection<TInput>
A collection of all typed input ports for a worker, used by workers with an unknown number of input ports of the same type, see e.g. TypedInputs.
Note that this class only provides members where the row type is relevant; other members are available via the Inputs property of the InputPortCollection class.
Also note that in rare cases, a worker using this class could also have additional input ports with a different row type. This is perfectly fine, as long as only the ports with the correct row type are accessed via this class.
InputPortStateExtensions
Static helper methods for InputPortState.
JoinMergeSortedTransformBase<TDerived, TLeftInput, TRightInput, TOutput>
An abstract dataflow worker with two input ports and one Output port, that performs a merge-join on the two presorted inputs (except for a cross join, which does not require sorted inputs). The row order on the inputs must match the ordering specified to the constructor.
Note that both inputs must be presorted according to the order specified by the comparison function (provided by the rowComparerCommandAction parameter or the Comparison property), including collation order for textual columns, where in the order any nulls appear, and if nulls are equal to each other or not.
Non-cross-joins are partially blocking on the RightInput input port: when a join is found, it will buffer all rows from the RightInput port that compare equal to the join rows. This consumes "Number of RightInput rows comparing equal" multiplied by "Memory per RightInput row". To conserve memory, link RightInput to the upstream output with the smallest expected memory consumption.
Cross-joins are instead fully blocking on the RightInput input port, and will buffer all rows from the RightInput port.
Note that this base class is not used directly by library users. Instead use CrossJoinTransformFactory, FullJoinMergeSortedTransformFactory, InnerJoinMergeSortedTransformFactory, LeftJoinMergeSortedTransformFactory, or RightJoinMergeSortedTransformFactory.
LeftJoinMergeSortedTransform<TLeftInput, TRightInput, TOutput>
A dataflow worker with two input ports and one Output port, that performs a Left Merge-join on the two presorted inputs.
Note: Use the factory methods in LeftJoinMergeSortedTransformFactory to create instances of this class.
This worker is partially blocking on the RightInput input port: when a join is found, it will buffer all rows from the RightInput port that compare equal to the join rows. This consumes "Number of RightInput rows comparing equal" multiplied by "Memory per RightInput row". To conserve memory, link RightInput to the upstream output with the smallest expected memory consumption.
LeftJoinMergeSortedTransformFactory
Factory methods that create a LeftJoinMergeSortedTransform<TLeftInput, TRightInput, TOutput> dataflow worker, with two input ports and one Output port, that performs a Left Merge-join on the two presorted inputs.
Note that both inputs must be presorted according to the order specified by these overloads, including collation order for textual columns, where in the order any nulls appear, and if nulls are equal to each other or not etc.
This worker is partially blocking on the RightInput input port: when a join is found, it will buffer all rows from the RightInput port that compare equal to the join rows. This consumes "Number of RightInput rows comparing equal" multiplied by "Memory per RightInput row". To conserve memory, link RightInput to the upstream output with the smallest expected memory consumption.
The "first" LeftInput port is linked to (if available) the upstream output or error output port specified by the factory.
Get the factory from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).
MergeSortedTransform<TInputOutput>
A dataflow worker that merges multiple presorted inputs of the same type into a single sorted output. Any duplicates are preserved.
Note: Use the factory methods in MergeSortedTransformFactory to create instances of this class.
Note: To concatenate the input rows, i.e. to exhaust each input fully before forwarding any rows from other inputs, in a predefined order, ensure that the sort key range for each input does not overlap with the sort key range for any other input, and that the sort key values match your desired input port row order. One simple option is to add a constant integer property to each input row type, and use that as the sort key.
TInputOutput must implement IComparable<T>, and the default comparer will be used.
Note that the inputs provide Full buffering of incoming data, and the worker is partially blocking, which potentially can consume a large amount of memory. This is required to avoid potential deadlocks. See BufferingMode for further details.
Note that input ports can be added after the worker is created (but before it or its siblings have started) by repeatedly calling Create<TInput>(String, OutputPortBase<TInput>). This can be particularly useful when generating workers in a loop.
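The concatenation technique above can be sketched with a hypothetical row type (the names are illustrative, not from the library): each input gets a distinct constant key, so the sorted merge degenerates into forwarding the inputs one after another.

```csharp
using System;

public class TaggedRow : IComparable<TaggedRow>
{
    public int InputIndex;   // constant per input: 0 for the first input, 1 for the second, ...
    public string Payload;

    // All rows from input 0 compare less than all rows from input 1, so a
    // sorted merge emits the inputs consecutively, in InputIndex order.
    public int CompareTo(TaggedRow other) => InputIndex.CompareTo(other.InputIndex);
}
```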
MergeSortedTransformFactory
Factory methods that create a MergeSortedTransform<TInputOutput> dataflow worker, which merges multiple presorted inputs of the same type into a single sorted output. Any duplicates are preserved.
Note: The row ordering specified in these overloads must match exactly with the incoming row order for each port.
Note: To concatenate the input rows, i.e. to exhaust each input fully before forwarding any rows from other inputs, in a predefined order, ensure that the sort key range for each input does not overlap with the sort key range for any other input, and that the sort key values match your desired input port row order. One simple option is to add a constant integer property to each input row type, and use that as the sort key.
Note that the inputs provide Full buffering of incoming data, and the worker is partially blocking, which potentially can consume a large amount of memory. This is required to avoid potential deadlocks. See BufferingMode for further details.
Note that input ports can be added after the worker is created (but before it or its siblings have started) by repeatedly calling Create<TInput>(String, OutputPortBase<TInput>). This can be particularly useful when generating workers in a loop.
The "first" input port (i.e. TypedInputs[0]) is linked to (if available) the upstream output or error output port specified by the factory.
Get the factory from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).
MoveFileWorker
A worker that moves a file within a file system volume.
MulticastTransform<TInputOutput>
A dataflow worker with one input port and one or more output ports, all of the same type, that sends all input rows to all output ports.
With a single output port, the incoming rows are simply forwarded to the output port.
Note: Use the factory methods in MulticastTransformFactory to create instances of this class.
Note that output ports can be added after the worker is created (but before it or its siblings have started) by repeatedly calling Create<TOutput>(String). This can be particularly useful when generating workers in a loop.
MulticastTransformFactory
Factory methods that create a MulticastTransform<TInputOutput> dataflow worker, with one input port and one or more output ports, all of the same type, that sends all input rows to all output ports.
With a single output port, the incoming rows are simply forwarded to the output port.
Note that output ports can be added after the worker is created (but before it or its siblings have started) by repeatedly calling Create<TOutput>(String). This can be particularly useful when generating workers in a loop.
The Input port is linked to (if available) the upstream output or error output port specified by the factory.
Get the factory from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).
MutableKeyValue
Factory class that creates MutableKeyValue<TKey, TValue> mutable key/value pair instances, where the key and value can be both set and retrieved.
These instances are mainly used as the dictionary input of DictionaryLookupTransform<TInputOutputError, TDictionaryInput, TKey, TValue> and DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>.
When mutability is not needed, consider using KeyValuePair<TKey,TValue>.
MutableKeyValue<TKey, TValue>
Defines a mutable key/value pair, where the key and value can be both set and retrieved. Create instances of this class either via the constructors, or via the factory method Create<TKey, TValue>(TKey, TValue).
This class is mainly used as the dictionary input of DictionaryLookupTransform<TInputOutputError, TDictionaryInput, TKey, TValue> and DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>.
When mutability is not needed, consider using KeyValuePair<TKey,TValue>.
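A brief usage sketch of the distinction (the Create factory method is named in these docs; the Key/Value property names on MutableKeyValue are assumptions):

```csharp
using System.Collections.Generic;

// Mutable pair: can be updated in place after creation (assumed property names).
var mutable = MutableKeyValue.Create("alpha", 1);
mutable.Value = 2;

// Immutable BCL alternative when mutability is not needed:
var immutable = new KeyValuePair<string, int>("alpha", 1);
```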
OrderAttribute
An attribute to place on dataflow row columns (i.e. fields and properties), to control the order of columns for certain column order and mapping facilities, using an ordering index:
- OrderAttribute with index: The index is explicitly provided. It must be in the range [0, 9999], [20000, 29999], [40000, 49999], or [60000, 69999].
- OrderAttribute without index: The member will automatically get an index and be ordered on the declaration order in the source file. This allows managing the order by simply moving the members around in the row class. Base class members will come before derived class members, and for any partial classes, the members are first sorted on the filename from CallerFilePathAttribute. By default they will get an index in the range [10000, 19999], or in the range [50000, 59999] if setting afterAlphabetical = true.
- No OrderAttribute: These members will automatically get an index in the [30000, 39999] range, in member name (ordinal, case sensitive) alphabetical order.
You can use GetFieldsAndPropertiesInOrder(Type, BindingFlags) to retrieve this ordering without column schema columns, when e.g. implementing custom workers.
You can also use TypeRowSchema to retrieve this ordering with column schema columns. The columns in a column schema are kept together as a group and are ordered according to the OrderAttribute rules within the group, with the whole group placed in the overall order where their parent member is placed (i.e. change the position of the whole group by changing the position of the parent member).
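Applied to a row type, the three cases look roughly like this (the attribute constructor shapes are assumptions based on the description above):

```csharp
public class CustomerRow
{
    [Order(0)] public int Id;          // explicit index; must fall in an allowed range
    [Order]    public string Name;     // declaration order; default range [10000, 19999]
    [Order]    public string Address;  // sorts after Name, since it is declared later
               public string Comment;  // no attribute: [30000, 39999], alphabetical order
}
```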
OutcomeStatus
Describes a success or failure outcome, e.g. of a completed worker or port.
Note: If a DbException is passed to the static factory methods, database specific exception details will be automatically added to the Message property.
OutcomeStatusResult
A factory class that provides static methods for creating OutcomeStatusResult<TResult> immutable struct values, which are typically used (to avoid out parameters) when returning an OutcomeStatus plus one more result from a method.
Note that you must check that Status is Succeeded before using the Result property.
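The check-before-use pattern, as a hypothetical sketch (GetRowCount and the exact success accessor are assumptions; only the Status and Result member names come from these docs):

```csharp
using System;

// Hypothetical method returning an OutcomeStatusResult<long>.
OutcomeStatusResult<long> result = GetRowCount();
if (result.Status.IsSucceeded)          // assumed accessor; verify success first
{
    Console.WriteLine(result.Result);   // only now is Result safe to read
}
```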
OutputPort<TOutput>
Generic class for dataflow (non-error) output ports. Create these ports with Create<TOutput>(String), or use a dataflow worker that already has the appropriate port(s) added.
Note: As per normal, unless otherwise noted, the instance members are not thread-safe, and should only be used from the worker that the port belongs to.
OutputPortBase
Non-generic base class for dataflow data output and error output ports, containing the functionality that doesn't depend on the row type.
The library user does not derive new port classes directly from OutputPortBase, but instead creates or uses instances of the derived classes OutputPort<TOutput> and ErrorOutputPort<TError>.
Note: As per normal, unless otherwise noted, the instance members are not thread-safe, and should only be accessed from the worker that the port belongs to.
OutputPortBase<TOutput>
Generic base class for dataflow ports, containing the functionality that depends on the type of the rows.
The library user only uses this class via OutputPort<TOutput> and ErrorOutputPort<TError>.
Note: As per normal, unless otherwise noted, the instance members are not thread-safe, and should only be used from the worker that the port belongs to.
OutputPortBaseCollection
Base class for a collection of all (either only output or only error output) ports for a worker. Note that these are the 'untyped' base class ports, which are useful for performing all operations that do not require the type of the row.
Note that workers with ports also add additional members to the worker that allow access to the typed versions of the ports, which do include the row type.
OutputPortBaseStateExtensions
Static helper methods for OutputPortBaseState.
OutputPortCollection
A collection of all (non-error) output ports for a worker, always available as Outputs.
This class is used when you either don't need to know the row type of each output port, or you cast the port(s) to have the actual row type (using an explicit cast or ToArray<TOutput>()).
Note that workers with ports typically add additional members that allow access to typed versions of the ports, and workers with an unknown number of output ports of the same type typically add an OutputPortCollection<TOutput>.
OutputPortCollection<TOutput>
A collection of all (non-error) typed output ports for a worker, used by workers with an unknown number of output ports of the same type, see e.g. TypedOutputs.
Note that this class only provides members where the row type is relevant; other members are available via the Outputs property of the OutputPortCollection class.
Also note that in rare cases, a worker using this class could also have additional output ports with a different row type. This is perfectly fine, as long as only the ports with the correct row type are accessed via this class.
PortPassThroughSource<TOutput>
A dataflow worker that passes rows from an upstream input port belonging to a different worker (normally residing under a different parent), to the Output port on this worker.
This is mainly used when encapsulating a child dataflow worker with an input port
(i.e. a transform or target) inside a parent dataflow worker, and passing the parent
input port rows to the child input port. This allows using dataflow (and other) workers as
building blocks when creating new dataflow workers.
The parent worker can have any number of input and output ports, as well as any number of child workers, and would add a PortPassThroughSource for each port to pass from parent to a child worker, and a PortPassThroughTarget<TInputOutput> for each port to pass from a child worker to the parent. Use both PortPassThroughSource and PortPassThroughTarget to create a parent transform.
Note that the parent must await the completion of the PortPassThroughSource worker before it itself completes; otherwise the parent input port will not be completely emptied and will raise an exception.
Please also see the Compose Target with Pass-through example.
PortPassThroughTarget<TInputOutput>
A dataflow worker that passes rows from the Input port on this worker, to an output port on a downstream worker (normally residing under a different parent compared to this worker).
This is mainly used when encapsulating a child dataflow worker with an output port (i.e. a
source or transform) inside a parent dataflow worker, and passing the child output rows
to the parent output port. This allows using dataflow (and other) workers as
building blocks when creating new dataflow workers.
The parent worker can have any number of input and output ports, as well as any number of child workers, and would add a PortPassThroughSource<TOutput> for each port to pass from parent to child worker, and a PortPassThroughTarget for each port to pass from child to parent worker. Use both PortPassThroughSource and PortPassThroughTarget to create a parent transform.
Note that the parent must await the completion of the PortPassThroughTarget worker before it itself completes; otherwise the parent output port will not have completed and will raise an exception.
Please also see the Compose Source with Pass-through example.
PortPassThroughTargetFactory
Factory methods that create a PortPassThroughTarget<TInputOutput> dataflow worker, which passes rows from the Input port on this worker, to an output port on a downstream worker (normally residing under a different parent compared to this worker).
This is mainly used when encapsulating a child dataflow worker with an output port (i.e. a
source or transform) inside a parent dataflow worker, and passing the child output rows
to the parent output port. This allows using dataflow (and other) workers as
building blocks when creating new dataflow workers.
The Input port is linked to (if available) the upstream output or error output port specified by the factory.
Get the factory from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).
ProgressStateExtensions
Extension methods for ProgressState.
ProgressStatus
Describes an execution status, e.g. of a callback function of a worker.
Note: If a DbException is passed to the static factory methods, database specific exception details will be automatically added to the Message property.
ProgressStatusResult
A factory class that provides static methods for creating ProgressStatusResult<TResult> immutable struct values, which are typically used (to avoid out parameters) when returning a ProgressStatus plus one more result from a method.
This is e.g. used by OnOutputRowDemand(). This struct can also be wrapped in a Task when returning from asynchronous methods.
Note that you must ensure Status is NotCompleted or Succeeded (or that IsFailed is false) before using the Result property.
RepeatRowsSource<TOutput>
A dataflow worker that generates an arbitrary number of rows, often used for testing, debugging etc. The caller provides template rows as an IEnumerable, or as rows specified in the constructor, which are by default repeatedly cloned (using CreateDeepCloneFunc()) in a round-robin fashion to generate the output rows passed to the downstream worker. This way, only new cloned rows with distinct references are sent downstream.
Alternatively, you can set SendTemplateRows to true to instead repeatedly send the template rows downstream as is. Note that if SendTemplateRows is true and TotalNumberOfRows is larger than the number of template rows, the same row object reference will be sent multiple times, which is strongly discouraged unless the row type is immutable, or you can guarantee that no downstream worker will modify the rows. See Row Ownership for details.
By default, all (readable and writable) columns in the template rows are included in the cloning. Use ColumnMapperCommandAction to only clone selected columns.
RightJoinMergeSortedTransform<TLeftInput, TRightInput, TOutput>
A dataflow worker with two input ports and one Output
port, that performs a
Right Merge-join on the two presorted inputs.
Note: Use the factory methods in RightJoinMergeSortedTransformFactory to create instances of this class.
This worker is
partially blocking
on the RightInput
input port:
when a join is found, it will buffer all rows from the RightInput
port that compare equal
to the join rows. This consumes "Number of RightInput rows comparing equal" multiplied by
"Memory per RightInput row". To conserve memory, link RightInput
to the upstream
output with the expected smallest memory consumption.
RightJoinMergeSortedTransformFactory
Factory methods that create a
RightJoinMergeSortedTransform<TLeftInput, TRightInput, TOutput>
dataflow worker, with two input ports and one Output
port, that performs a
Right Merge-join on the two presorted inputs.
Note that both inputs must be presorted according to the order specified by these overloads, including collation order for textual columns, where in the order any nulls appear, and if nulls are equal to each other or not etc.
This worker is
partially blocking
on the RightInput
input port:
when a join is found, it will buffer all rows from the RightInput
port that compare equal
to the join rows. This consumes "Number of RightInput rows comparing equal" multiplied by
"Memory per RightInput row". To conserve memory, link RightInput
to the upstream
output with the expected smallest memory consumption.
The "first" LeftInput
port is linked to (if available) the upstream
output or error output port specified by the factory.
Get the factory from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).
RowActionTarget<TInput>
A dataflow worker that executes a callback on each incoming row from the Input
port.
The callback is passed the row as a parameter, and does not itself take rows from the Input port.
Also see the RowActionTarget example.
Note: The input port uses the Default policy. Consider whether this is appropriate, or should be changed, see BufferingMode for further details.
RowActionTargetFactory
Factory methods that create a
RowActionTarget<TInput>
dataflow worker, which executes a callback on each incoming row from the Input
port.
The callback is passed the row as a parameter, and does not itself take rows from the Input port.
Also see the RowActionTarget example.
Note: The input port uses the Default policy. Consider whether this is appropriate, or should be changed, see BufferingMode for further details.
The Input
port is linked to (if available) the upstream
output or error output port specified by the factory.
Get the factory from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).
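The two linking styles described above might look like the following hedged sketch. The source worker, row type, and the exact factory overload and callback shapes are illustrative assumptions; only the Link, GetDownstreamFactory, LinkTo and LinkFrom names are taken from this description.

```csharp
// Style 1 (usual case): the upstream port is known, so use its Link factory.
var target = source.Output.Link.RowActionTarget("LogRows",
    row => Console.WriteLine(row)); // callback receives each incoming row

// Style 2: obtain a downstream factory first, then link explicitly, e.g.:
// source.Output.LinkTo(target.Input);
// or equivalently: target.Input.LinkFrom(source.Output);
```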
RowActionTransform<TInputOutputError>
A dataflow worker which executes a callback on each row passing through the transform.
The callback is passed the row as a parameter, to read and modify as needed before
it's automatically passed to the downstream worker. The callback does not
itself take rows from or send rows to the ports. If the callback throws an exception,
the input row will be rejected to the ErrorOutput
port.
Note: Use the factory methods in RowActionTransformFactory to create instances of this class.
Use RowActionTransform<TInputError, TOutput> instead if the input and output types are different.
The input port uses the Default policy. Consider whether this is appropriate, or should be overridden, see BufferingMode for further details.
Also see the RowActionTransform example.
Note that this worker does not support asynchronous row methods. If that is needed, consider using RowsActionTransform<TInputOutputError> or RowsTransformBase<TDerived, TInput, TOutput, TError> instead.
To create a transform without any ErrorOutput
, consider inheriting from
RowsTransformBase<TDerived, TInput, TOutput>.
RowActionTransform<TInputError, TOutput>
A dataflow transform worker which executes a callback for each incoming row. The callback is passed the row as a parameter, and the return value controls whether to send zero or one row (which can be of a different type) to the output, or to the error output. When outputting a row, the callback is responsible for either allocating a new output row, or, if appropriate, casting the input row to the output type, before passing it to the downstream worker.
Note: Use the factory methods in RowActionTransformFactory to create instances of this class.
If the function throws an exception, the input row will be rejected to the
ErrorOutput
port.
Use RowActionTransform<TInputOutputError> instead if the input and output types are the same.
The input port uses the Default policy. Consider whether this is appropriate, or should be overridden, see BufferingMode for further details.
Also see the RowActionTransform example.
Note that this worker does not support asynchronous row methods. If that is needed, consider using RowsActionTransform<TInputError, TOutput> or RowsTransformBase<TDerived, TInput, TOutput, TError> instead.
To create a transform without any ErrorOutput
, consider inheriting from
RowsTransformBase<TDerived, TInput, TOutput>.
RowActionTransformFactory
Factory methods that create a
RowActionTransform<TInputOutputError> or
RowActionTransform<TInputError, TOutput>
dataflow worker, which executes a callback on each row passing through the transform.
The callback does not itself take rows from or send rows to the ports.
If the callback throws an exception, the input row will be rejected to the ErrorOutput
port.
The input port uses the Default policy. Consider whether this is appropriate, or should be overridden, see BufferingMode for further details.
Also see the RowActionTransform example.
The Input
port is linked to (if available) the upstream
output or error output port specified by the factory.
Get the factory from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).
Note that this worker does not support asynchronous row methods. If that is needed, consider using RowsActionTransform<TInputOutputError> or RowsTransformBase<TDerived, TInput, TOutput, TError> instead.
To create a transform without any ErrorOutput
, consider inheriting from
RowsTransformBase<TDerived, TInput, TOutput>.
RowComparer<T>
Creates a Comparison<T> (available in Comparison) and a Comparer<T> (available in Comparer) for comparing two rows of the same type, using high performance generated code.
Use RowComparer<TLeft, TRight> instead if either the rows have different types, or they are the same type and you also need to compare different columns (e.g. column "A" in one row with column "B" in the other row, even though the rows are the same type).
Also see Compare Dataflow Columns.
RowComparer<TLeft, TRight>
Creates a Comparison<TLeft, TRight> (available in Comparison) for comparing two rows of potentially different types, using high performance generated code. Use RowComparer<T> instead if the rows have the same type.
Also see Compare Dataflow Columns.
RowEqualityComparer
A factory class for creating IEqualityComparer<T> instances, which compare two rows of the same type for equality, based on specified columns or on a grouping key function, using high performance generated code.
The created instances are passed to grouping overloads in AggregateTransformFactory, also see the examples in Dataflow Aggregations.
RowEqualityComparer<T>
Creates an IEqualityComparer<T> for comparing two rows of the same type for equality, using high performance generated code. This is used by grouping overloads in AggregateTransformFactory, also see the examples in Dataflow Aggregations.
To create instances of this class that base the comparison on specified columns, use Create<T>(Action<IGroupByCommand>). To base the comparison on a grouping key function, use Create<T, TKey>(Func<T, TKey>).
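The two creation styles might look like this hedged sketch. The Create overload shapes are taken from the description above, but the MyRow type and the IGroupByCommand member used inside the column-based callback are assumptions.

```csharp
// Comparison based on specified columns, via Create<T>(Action<IGroupByCommand>);
// the "GroupBy" member name on IGroupByCommand is an assumption:
var byColumns = RowEqualityComparer.Create<MyRow>(gb => gb.GroupBy("CustomerId"));

// Comparison based on a grouping key function, via Create<T, TKey>(Func<T, TKey>):
var byKey = RowEqualityComparer.Create<MyRow, int>(row => row.CustomerId);
```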
RowError
Represents an error that occurred when processing a dataflow row. Multiple RowError instances can be used to represent multiple errors from different workers for a particular row.
RowError
instances are created by calling one of the
SendErrorRow(TError, String, Nullable<Int64>, String, Exception)
overloads. Also see RowErrorCollection and CaptureRowErrors.
RowErrorCollection
A list of RowError instances, which represent errors that occurred when processing a particular dataflow row, with one RowError instance for each time the row is sent to an error output port of a worker. The system adds the RowError instances automatically; the user inspects them, and can also remove and clear them.
A RowErrorCollection instance is created automatically as needed when calling one of the SendErrorRow(TError, String, Nullable<Int64>, String, Exception) overloads, if the TError row type has a field defined as: public CaptureRowErrors CaptureRowErrors;
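A TError row type opting in to this behavior only needs that field declared exactly as shown; in the sketch below the other members are illustrative payload columns, not part of the requirement.

```csharp
public class MyErrorRow
{
    public int Id;       // illustrative payload columns
    public string Name;

    // This public field enables automatic RowErrorCollection creation
    // when a SendErrorRow(...) overload is called:
    public CaptureRowErrors CaptureRowErrors;
}
```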
For example, a target worker might on a row error send the incoming row to its error output. A source worker might instead use StringRowErrors as its error output type, and on a row error send that to its error output, populating RowErrorRow with a string representation of the external source row. A transform worker might use either of the previous strategies, depending on whether outputting the original row or a string representation of the original row is more appropriate.
Warning: Do not modify any exception properties and do not re-throw them after cloning them (e.g. via MulticastTransform<TInputOutput>), since this would lead to other copies also being changed. Exceptions contain mutable (i.e. modifiable) properties, but actionETL treats them as immutable (i.e. not modifiable) when cloning them, since deep copies cannot be reliably done on arbitrary exceptions.
RowsActionSource<TOutput>
A dataflow worker with one output port
which repeatedly executes a function when there is BufferCapacity
demand on the Output
port.
This class allows the developer to write synchronous or asynchronous code as needed,
without having to check for output demand (which simplifies the implementation), as long as
no more than BufferCapacity
rows are sent on each invocation.
Also see the RowsActionSource example.
The user supplied function calls methods on the Output port to pass data to the downstream worker, until it returns:
- ProgressStatus.NotCompleted to wait for at least BufferCapacity demand on the Output port.
- ProgressStatus.Succeeded to stop producing rows. If the Output port is not completed, the library will call SendSucceeded() to complete it.
- ProgressStatus.Error or ProgressStatus.Fatal to fail the worker. If the Output port is not completed, the library will call SendError(String) to complete it.
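A callback following this return-value protocol might be sketched like this. The constructor shape, the BufferCapacity access, and the "SendRow" method name are unverified assumptions; only the ProgressStatus return values come from the description above.

```csharp
int sent = 0;
const int total = 1000;

// Hedged sketch only, not verified actionETL API.
var source = new RowsActionSource<MyRow>(parent, "Numbers", src =>
{
    // Safe to send up to BufferCapacity rows per invocation without checking demand.
    for (int i = 0; i < src.Output.BufferCapacity && sent < total; i++, sent++)
        src.Output.SendRow(new MyRow { Id = sent });

    // NotCompleted: wait for renewed demand; Succeeded: the library completes the port.
    return sent < total ? ProgressStatus.NotCompleted : ProgressStatus.Succeeded;
});
```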
RowsActionSource<TOutput, TError>
A dataflow worker with one output port and one error output port
which repeatedly executes a function when there is BufferCapacity
demand on the Output
port.
This class allows the developer to write synchronous or asynchronous code as needed,
without having to check for output demand (which simplifies the implementation), as long as
no more than BufferCapacity
rows are sent on each invocation.
Also see the RowsActionSource example.
The user supplied function calls methods on the Output port to pass data to the downstream worker, until it returns:
- ProgressStatus.NotCompleted to wait for at least BufferCapacity demand on the Output port.
- ProgressStatus.Succeeded to stop producing rows. If the Output port is not completed, the library will call SendSucceeded() to complete it.
- ProgressStatus.Error or ProgressStatus.Fatal to fail the worker. If the Output port is not completed, the library will call SendError(String) to complete it.
RowsActionTarget<TInput>
A dataflow worker with one Input
that repeatedly executes a callback when there are rows
to consume from the upstream worker.
Note: Use the factory methods in RowsActionTargetFactory to create instances of this class.
Also see the RowsActionTarget example.
Note: The input port uses the Default policy. Consider whether this is appropriate, or should be changed, see BufferingMode for further details.
RowsActionTargetFactory
Factory methods that create a RowsActionTarget<TInput> dataflow worker, which repeatedly executes a callback when there are rows to consume from the upstream worker.
Also see the RowsActionTarget example.
Note: The input port uses the Default policy. Consider whether this is appropriate, or should be changed, see BufferingMode for further details.
The Input
port is linked to (if available) the upstream
output or error output port specified by the factory.
Get the factory from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).
RowsActionTransform<TInputOutputError>
A dataflow worker which repeatedly executes a callback when there are both rows available from the upstream worker and BufferCapacity demand available from the downstream worker. Input, output and error rows all have the same type (use RowsActionTransform<TInputError, TOutput> if they have different types).
Note: Use the factory methods in RowsActionTransformFactory to create instances of this class.
This class allows the developer to write synchronous or asynchronous code as needed,
without having to check for output demand (which simplifies the implementation), as long as
no more than BufferCapacity
rows are sent on each invocation (unless demand is
checked for explicitly, e.g. via TrySend*
methods).
The input port uses the Default policy. Consider whether this is appropriate, or should be overridden, see BufferingMode for further details.
Also see the RowsActionTransform example.
RowsActionTransform<TInputError, TOutput>
A dataflow worker which repeatedly executes a callback when there are both rows available from the upstream worker and BufferCapacity demand available from the downstream worker. Input and output rows have different types (use RowsActionTransform<TInputOutputError> if they have the same type).
Note: Use the factory methods in RowsActionTransformFactory to create instances of this class.
This class allows the developer to write synchronous or asynchronous code as needed,
without having to check for output demand (which simplifies the implementation), as long as
no more than BufferCapacity
rows are sent on each invocation (unless demand is
checked for explicitly, e.g. via TrySend*
methods).
The input port uses the Default policy. Consider whether this is appropriate, or should be overridden, see BufferingMode for further details.
Also see the RowsActionTransform example.
RowsActionTransformFactory
Factory methods that create a RowsActionTransform<TInputOutputError> or RowsActionTransform<TInputError, TOutput> dataflow worker, which repeatedly executes a callback when there are both rows available from the upstream worker and BufferCapacity demand available from the downstream worker.
These classes allow the developer to write synchronous or asynchronous code as needed,
without having to check for output demand (which simplifies the implementation), as long as
no more than BufferCapacity
rows are sent on each invocation (unless demand is
checked for explicitly, e.g. via TrySend*
methods).
The input port uses the Default policy. Consider whether this is appropriate, or should be overridden, see BufferingMode for further details.
Also see the RowsActionTransform example.
The Input
port is linked to (if available) the upstream
output or error output port specified by the factory.
Get the factory from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).
RowSchemaBase<TRowSchema, TSchemaNode>
Abstract base class that contains information about the columns in a dataflow row and their data types, or the columns of an external data source. This is useful for working with column names and their types, e.g. for implementing column mapping directly, or for building higher level constructs such as TypeColumnCopier<TFrom, TTo>.
Use TypeRowSchema when you have access to the required .NET CLR type, since it can populate the instance automatically from the type. Otherwise use FlatRowSchema and specify columns explicitly.
Columns can be searched for via several Get*
methods.
They can also be accessed as a flat list via the SchemaNodeList
property, and as a
hierarchical tree via the SchemaNodeRoot property. Note that the root node of the tree
is not included in the SchemaNodeList list.
The hierarchical tree and flat list include all descendant fields and properties, down to either an intrinsically supported column type, e.g. int, byte[], SqlBinary etc., or an unsupported type. Each column schema (i.e. a struct that is not a supported column type) creates a new parent node in the tree.
Derived classes that have columns without any hierarchy (like FlatRowSchema) must still populate the tree with a root node and with all columns as direct children of the root.
RowSchemaMapCounts
Allows the caller to track how many times each SchemaNode
in the
SchemaNodeList list property has been mapped.
This is useful since some dataflow types can only be copied once (SingleShallow),
or must be deep copied after the first copy (SingleShallowThenDeep).
RowSourceBase<TDerived, TOutput>
An abstract dataflow worker which repeatedly executes the
OnOutputRowDemand() method when there is demand on the
Output
port. The library user must inherit this class and override
OnOutputRowDemand() to provide custom functionality.
This class simplifies the implementation by allowing the developer to write synchronous code that operates on a single row at a time, without having to check for output demand.
The derived class can additionally override RunAsync()
to add logic that runs before and after all processing of rows, in which case the
base class base.RunAsync()
must be called. The derived class
(or its user) can also use
worker callbacks
to add logic.
Also see the RowSourceBase example.
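Inheriting this class might look like the following hedged sketch. The base constructor shape, the OnOutputRowDemand return type, and the ProgressStatusResult factory method names are assumptions pieced together from the descriptions on this page, not verified signatures.

```csharp
// Hedged sketch only, not verified actionETL API.
public class CountingSource : RowSourceBase<CountingSource, MyRow>
{
    private int _next;

    public CountingSource(WorkerParent parent, string name) : base(parent, name) { }

    // Called once per row of downstream demand; no demand checking needed.
    protected override ProgressStatusResult<MyRow> OnOutputRowDemand()
    {
        return _next < 100
            ? ProgressStatusResult.NotCompleted(new MyRow { Id = _next++ })
            : ProgressStatusResult.Succeeded<MyRow>(); // stop producing rows
    }
}
```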
RowsSourceBase<TDerived, TOutput>
An abstract dataflow worker which repeatedly executes the OnOutputRowsDemandAsync()
method when there is BufferCapacity demand on the Output
port.
The library user must inherit this class and override OnOutputRowsDemandAsync() to
provide custom functionality.
This class allows the developer to write synchronous or asynchronous code as needed,
without having to check for output demand (which simplifies the implementation), as long as
no more than BufferCapacity
rows are sent on each invocation (or demand is checked for
explicitly).
The derived class can additionally override RunAsync() to add logic that runs
before and after all processing of rows, in which case the base class base.RunAsync()
must be called. The derived class (or its user) can also use
worker callbacks
to add logic.
Use RowsSourceBase<TDerived, TOutput, TError> instead if an ErrorOutput
port is needed.
Also see the
RowsSourceBase
example.
RowsSourceBase<TDerived, TOutput, TError>
An abstract dataflow worker which repeatedly executes the
OnOutputRowsDemandAsync()
method when there is BufferCapacity demand on the Output
port.
It also has an ErrorOutput
port. The library user must inherit this class and
override OnOutputRowsDemandAsync
to provide custom functionality.
This class allows the developer to write synchronous or asynchronous code as needed,
without having to check for output demand (which simplifies the implementation), as long as
no more than BufferCapacity
rows are sent on each invocation (or demand is checked for
explicitly).
The derived class can additionally override RunAsync()
to add logic that runs before and after all processing of rows, in which case the base class
base.RunAsync()
must be called. The derived class (or its user) can also use
worker callbacks
to add logic.
Use RowsSourceBase<TDerived, TOutput> instead if an ErrorOutput
port is not needed.
Also see the
RowsSourceBase
example.
RowsTargetBase<TDerived, TInput>
An abstract dataflow worker which repeatedly executes the
OnInputRowsAsync() method when there are input rows
available on the Input
port. The library user must inherit this class and override
OnInputRowsAsync
to provide custom functionality.
This class allows the developer to write synchronous or asynchronous code as needed, without having to check for availability of input rows (which simplifies the implementation), as long as no more rows than are available are taken (typically by taking multiple rows at a time, e.g. via TakeBufferAsync(TInput[])).
The derived class can additionally override RunAsync() to add
logic that runs before and after all processing of rows, in which case the base class
base.RunAsync()
must be called. The derived class (or its user) can also use
worker callbacks
to add logic.
Use RowsTargetBase<TDerived, TInput, TError> instead if an ErrorOutput
port is needed.
Also see the
RowsTargetBase
example.
Note: The input port uses the Default policy. Consider whether this is appropriate, or should be changed, see BufferingMode for further details.
RowsTargetBase<TDerived, TInput, TError>
An abstract dataflow worker which repeatedly executes the
OnInputRowsAsync() method when there are input rows
available on the Input
port. The library user must inherit this class and override
OnInputRowsAsync
to provide custom functionality. It also has an ErrorOutput
port.
This class allows the developer to write synchronous or asynchronous code as needed, without having to check for availability of input rows (which simplifies the implementation), as long as no more rows than are available are taken (typically by taking multiple rows at a time, e.g. via TakeBufferAsync(TInput[])).
The derived class can additionally override RunAsync() to add
logic that runs before and after all processing of rows, in which case the base class
base.RunAsync()
must be called. The derived class (or its user) can also use
worker callbacks
to add logic.
Use RowsTargetBase<TDerived, TInput> instead if an ErrorOutput
port is not needed.
Also see the
RowsTargetBase
example.
Note: The input port uses the Default policy. Consider whether this is appropriate, or should be changed, see BufferingMode for further details.
RowsTransformBase<TDerived, TInput, TOutput>
An abstract dataflow worker which repeatedly executes the OnRowsAndDemandAsync()
method when there is both data to consume from the upstream worker and
BufferCapacity demand available from the downstream worker.
The library user must inherit this class and override
OnRowsAndDemandAsync
to provide custom functionality.
This class allows the developer to write synchronous or asynchronous code as needed,
without having to check for output demand (which simplifies the implementation), as long as
no more than BufferCapacity
rows are sent on each invocation (without checking row
demand explicitly).
The derived class can additionally override RunAsync() to add logic that runs
before and after all processing of rows, in which case the base class base.RunAsync()
must be called. The derived class (or its user) can also use
worker callbacks
to add logic.
The input port uses the Default policy. Consider whether this is appropriate, or should be overridden, see BufferingMode for further details.
Use RowsTransformBase<TDerived, TInput, TOutput, TError> instead if an ErrorOutput
port is needed. Also see the
RowsTransformBase
example.
RowsTransformBase<TDerived, TInput, TOutput, TError>
An abstract dataflow worker which repeatedly executes the
OnRowsAndDemandAsync()
method when there is both data to consume from the upstream worker and
BufferCapacity demand available from the downstream worker.
It also has an ErrorOutput
port.
The library user must inherit this class and override
OnRowsAndDemandAsync
to provide custom functionality.
This class allows the developer to write synchronous or asynchronous code as needed,
without having to check for output demand (which simplifies the implementation), as long as
no more than BufferCapacity
rows are sent on each invocation (without checking row
demand explicitly).
The derived class can additionally override
RunAsync() to add logic that runs
before and after all processing of rows, in which case the base class base.RunAsync()
must be called. The derived class (or its user) can also use
worker callbacks
to add logic.
The input port uses the Default policy. Consider whether this is appropriate, or should be overridden, see BufferingMode for further details.
Use RowsTransformBase<TDerived, TInput, TOutput> instead if an ErrorOutput
port is not needed. Also see the
RowsTransformBase
example.
RowTargetBase<TDerived, TInput>
An abstract dataflow worker which repeatedly executes the
OnInputRow(TInput) method when there is an input row available
on the Input
port. The library user must inherit this class and override OnInputRow
to provide custom functionality.
OnInputRow(TInput) is passed the row as a parameter, and does not
itself take rows from the input port.
If the method throws an exception, the worker will receive a Fatal
status.
This class simplifies the implementation by allowing the developer to write synchronous code that operates on a single row at a time, without having to check for the availability of incoming rows.
The derived class can additionally override RunAsync()
to add logic that runs before and after all processing of rows, in which case the base class
base.RunAsync()
must be called. The derived class (or its user) can also use
worker callbacks
to add logic.
Use RowWithErrorTargetBase<TDerived, TInputError> if an error output port is needed. Also see the RowTargetBase example.
Note: The input port uses the Default policy. Consider whether this is appropriate, or should be changed, see BufferingMode for further details.
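Inheriting this class might look like the following hedged sketch. The base constructor shape and the OnInputRow return type are assumptions; only the OnInputRow name and its row parameter come from the description above.

```csharp
// Hedged sketch only, not verified actionETL API.
public class ConsoleTarget : RowTargetBase<ConsoleTarget, MyRow>
{
    public ConsoleTarget(WorkerParent parent, string name) : base(parent, name) { }

    // Receives each row as a parameter; throwing here gives the worker a Fatal status.
    protected override ProgressStatus OnInputRow(MyRow row)
    {
        Console.WriteLine($"{row.Id}");
        return ProgressStatus.NotCompleted; // assumption: continue with the next row
    }
}
```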
RowTransformBase<TDerived, TInputOutputError>
An abstract dataflow worker with an Input
, Output
, and ErrorOutput
port,
all of the same type, which repeatedly executes the OnInputRow(TInputOutputError)
method on each incoming row. The method is passed the row as a parameter, to read and modify
as needed. The method does not itself take rows from or send rows to the ports, the
passed in row is instead automatically handled as per the method return value.
The library user must inherit this class and override OnInputRow
to provide
custom functionality.
If the method throws an exception, the input row will be rejected to the ErrorOutput
port.
The derived class can additionally override RunAsync()
to add logic that runs before and after all processing of rows, in which case the base class
base.RunAsync()
must be called. The derived class (or its user) can also use
worker callbacks to add logic.
The input port uses the Default policy. Consider whether this is appropriate, or should be overridden, see BufferingMode for further details.
Use RowTransformBase<TDerived, TInputError, TOutput> instead if the Output
port
has a different row type.
Also see the
RowTransformBase
example.
RowTransformBase<TDerived, TInputError, TOutput>
An abstract dataflow worker with an Input, Output, and ErrorOutput port, where Input and ErrorOutput have the same type, which repeatedly executes the OnInputRow(TInputError) method on each incoming row. The method returns a (usually different) output row to the downstream worker, or no row at all. When outputting a row, the method is responsible for either allocating a new output row, or, if appropriate, casting the input row to the output type, before returning it.
The library user must inherit this class and override OnInputRow
to provide
custom functionality.
If the method throws an exception, the input row will be rejected to the ErrorOutput
port.
The derived class can additionally override RunAsync()
to add logic that runs before and after all processing of rows, in which case the base class
base.RunAsync()
must be called. The derived class (or its user) can also use
worker callbacks to add logic.
The input port uses the Default policy. Consider whether this is appropriate, or should be overridden, see BufferingMode for further details.
Use RowTransformBase<TDerived, TInputOutputError> instead if all ports have the same row type. Also see the RowTransformBase example.
RowWithErrorTargetBase<TDerived, TInputError>
An abstract dataflow worker which repeatedly executes the
OnInputRow(TInputError) method when there is an input row available
on the Input
port. It also has an ErrorOutput
port of the
same type as the Input
. The library user must inherit this class and override OnInputRow
to provide custom functionality.
OnInputRow(TInputError) is passed the row as a parameter, and does not itself take rows from the input port.
If the method throws an exception, the input row will be rejected to the ErrorOutput
port.
This class simplifies the implementation by allowing the developer to write synchronous code that operates on a single row at a time, without having to check for the availability of incoming rows.
The derived class can additionally override RunAsync()
to add logic that runs before and after all processing of rows, in which case the base class
base.RunAsync()
must be called. The derived class (or its user) can also use
worker callbacks
to add logic.
Use RowTargetBase<TDerived, TInput> if an error output port is not needed. Also see the RowTargetBase example.
Note: The input port uses the Default policy. Consider whether this is appropriate, or should be changed, see BufferingMode for further details.
RuntimeInfo
Runtime information about the current operating system and .NET framework. Use it for logging purposes and to select different code paths.
SafeWorkerParentValue<T>
A thread-safe wrapper for custom worker and worker system properties that support setting and getting values, and will also throw if the worker or worker system has an unexpected (configurable) state (e.g. that it has already been started) when the value is accessed. Use this to ensure safe access by other threads and to disallow accessing unpopulated or partially populated values.
Also see Public Properties etc. and its example.
Use one of the Set methods in a public setter to allow the user to set a property value that the worker or worker system uses. Use SetValueUnchecked(T) inside your worker or worker system to set a result value before it has completed.
Use one of the Get methods in a public getter to allow the user to get a property value that the worker or worker system has provided. Use GetValueUnchecked() inside your worker or worker system to get a value that the user has set.
Note that to provide thread-safety, all accesses to the value are synchronized, which has a small overhead. If using the value many times internally in the WorkerParent during the running phase, e.g. for each dataflow row, consider only calling GetValueUnchecked() or SetValueUnchecked(T) once during the running phase, by caching the value in a local variable.
Note: All methods in this class are thread-safe.
Note: If you don't need to check the WorkerParentStatus, consider using Interlocked instead to read and write values, since it has slightly lower overhead.
SchemaMap
Represents a mapping between two members in two dataflow rows. These mappings are available in a collection in the row and column mapper results, e.g. SchemaMaps, and refer to member indexes in the SchemaNodeList collection.
Note that this class is immutable.
SchemaNode
Information about a dataflow row column, or column schema, or unsupported member.
The same SchemaNode instance appears both as a node in the SchemaNodeList flat list and in the SchemaNodeRoot hierarchical tree. The root SchemaNodeRoot node, however, does not appear in the SchemaNodeList flat list.
This class is not instantiated directly; instead use one of the Create(IEnumerable<String>) overloads.
SortTransform<TInputOutput>
A dataflow worker with one Input port and one Output port, that sorts the incoming rows, optionally removing duplicates, before passing them to the downstream worker.
Note: Use the factory methods in SortTransformFactory to create instances of this class.
SortTransform uses an in-memory sort. This is a fully blocking transform, i.e. it must receive and buffer all rows before outputting any rows, which can consume large amounts of memory. For large datasets that risk exhausting available memory and causing heavy paging to disk, consider performing the sort in a database before bringing the data into or back to the dataflow. Please see Buffering and Memory Consumption for more details.
The sort algorithm is unstable, i.e. two rows that compare as equal are not guaranteed to keep their relative ordering.
SortTransformFactory
Factory methods that create a SortTransform<TInputOutput> dataflow worker, with one Input port and one Output port, which sorts the incoming rows, optionally removing duplicates, before passing them to the downstream worker.
The Input port is linked to (if available) the upstream output or error output port specified by the factory.
Get the factory from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).
SortTransform uses an in-memory sort. This is a fully blocking transform, i.e. it must receive and buffer all rows before outputting any rows, which can consume large amounts of memory. Please see Buffering and Memory Consumption for more details.
The sort algorithm is unstable, i.e. two rows that compare as equal are not guaranteed to keep their relative ordering.
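A linking sketch for the factory pattern described above. The fluent member names, the comparison delegate, and the removeDuplicates parameter are assumptions for illustration only; only Link and the factory class are named on this page:

```csharp
// Hypothetical sketch; exact overloads and parameter names are assumptions.
var sort = source.Output.Link.SortTransform(
    "SortById",
    (left, right) => left.Id.CompareTo(right.Id),  // sort order
    removeDuplicates: true);                       // optional de-duplication
```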
SourceBase<TDerived, TOutput>
An abstract dataflow worker with one Output port, which can be used to create a dataflow source.
When starting a worker, the library calls RunAsync() during the worker Running phase. A derived class must override this abstract method, and call methods on the Output port to pass data rows to the downstream worker.
Note that the worker Running phase also includes additional places where logic can optionally be inserted via callbacks, to e.g. customize the initialization, cleanup, and error handling of existing workers. This is mostly used when customizing workers that are not designed to be derived from (i.e. without a "Base" suffix). See Worker Life-cycle for details.
Use SourceBase<TDerived, TOutput, TError> instead if an ErrorOutput port is needed.
Also see the SourceBase example.
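A sketch of the RunAsync() override pattern described above. The base constructor shape, the Output port method name, and the OutcomeStatus usage are assumptions for illustration, not verified API:

```csharp
// Hypothetical sketch; port method and status names are assumptions.
public class NumberSource : SourceBase<NumberSource, int>
{
    public NumberSource(WorkerParent parent, string name)
        : base(parent, name) { }

    // Called by the library once, during the worker Running phase.
    protected override async Task<OutcomeStatus> RunAsync()
    {
        for (int i = 0; i < 100; i++)
            await Output.SendAsync(i);   // pass rows downstream
        return OutcomeStatus.Succeeded;
    }
}
```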
SourceBase<TDerived, TOutput, TError>
An abstract dataflow worker with one Output port and one ErrorOutput port, which can be used to create a dataflow source.
When starting a worker, the library calls RunAsync() during the worker Running phase. A derived class must override this abstract method, and call methods on the Output and ErrorOutput ports to pass data rows to the downstream workers.
Note that the worker Running phase also includes additional places where logic can optionally be inserted via callbacks, to e.g. customize the initialization, cleanup, and error handling of existing workers. This is mostly used when customizing workers that are not designed to be derived from (i.e. without a "Base" suffix). See Worker Life-cycle for details.
Use SourceBase<TDerived, TOutput> instead if an ErrorOutput port is not needed.
Also see the SourceBase example.
SplitTransform<TInputOutputError>
A dataflow worker with one input, one or more outputs, and one error output port, all of the same type, that sends each incoming row to one of several output ports, or to the error output port, or discards the row, based on a supplied function.
Note that output ports can be added after the worker is created (but before it or its siblings have started) by repeatedly calling Create<TOutput>(String). This can be particularly useful when generating workers in a loop.
SplitTransformFactory
Factory methods that create a SplitTransform<TInputOutputError> dataflow worker, with one input, one or more outputs, and one error output port, all of the same type, that sends each incoming row to one of several output ports, or to the error output port, or discards the row, based on a supplied function.
Note that output ports can be added after the worker is created (but before it or its siblings have started) by repeatedly calling Create<TOutput>(String). This can be particularly useful when generating the downstream workers in a loop.
The Input port is linked to (if available) the upstream output or error output port specified by the factory.
Get the factory from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).
StringNumberRow
A type that only contains a string Row property and a long Number property.
This is useful when creating dataflow rows that only need a string and a number, e.g. when reading the lines of a text file or outputting error rows as strings, together with a line number.
Also see StringRow, as well as StringRowErrors (which captures error details).
StringRow
A type that only contains a string Row property.
This is useful when creating dataflow rows that only need a single string, e.g. when reading the lines of a text file, or outputting error rows as strings.
Also see StringNumberRow, as well as StringRowErrors (which captures error details).
StringRowErrors
An error output row type that contains a string RowErrorRow field for providing a string representation of the original dataflow row, as well as CaptureRowErrors error detail fields.
This is useful when sending the original dataflow row as-is to the error output is not appropriate, e.g. for a source worker where an original dataflow row is not available, and providing a string representation of the external source row is appropriate.
Also see CaptureRowErrors, and Dataflow Row Errors.
SystemOutcomeStatus
Describes a success or failure outcome of a completed worker system.
Note: If a DbException is passed to the static factory methods, database specific exception details will be automatically added to the Message property.
TargetBase<TDerived, TInput>
An abstract dataflow worker with one Input port, which can be used to create a dataflow target.
When starting a worker, the library calls RunAsync() during the worker Running phase. A derived class must override this abstract method, and call methods on the Input port to consume data rows from the upstream worker.
Note that the worker Running phase also includes additional places where logic can optionally be inserted via callbacks, to e.g. customize the initialization, cleanup, and error handling of existing workers. This is mostly used when customizing workers that are not designed to be derived from (i.e. without a "Base" suffix). See Worker Life-cycle for details.
Use TargetBase<TDerived, TInput, TError> instead if an ErrorOutput port is needed.
Also see the TargetBase example.
Note: The input port uses the Default policy. Consider whether this is appropriate or should be changed; see BufferingMode for further details.
TargetBase<TDerived, TInput, TError>
An abstract dataflow worker with one Input port and one ErrorOutput port, which can be used to create a dataflow target.
When starting a worker, the library calls RunAsync() during the worker Running phase. A derived class must override this abstract method, and call methods on the Input port to consume data rows from the upstream worker.
Note that the worker Running phase also includes additional places where logic can optionally be inserted via callbacks, to e.g. customize the initialization, cleanup, and error handling of existing workers. This is mostly used when customizing workers that are not designed to be derived from (i.e. without a "Base" suffix). See Worker Life-cycle for details.
Use TargetBase<TDerived, TInput> instead if an ErrorOutput port is not needed.
Also see the TargetBase example.
Note: The input port uses the Default policy. Consider whether this is appropriate or should be changed; see BufferingMode for further details.
ToTypeColumnMapper<TTo>
Defines mappings from a flat list of user supplied or automatically generated source 'from' columns (described by FromRowSchema), to the (potentially hierarchical) fields, properties, table columns etc. of a target 'to' .NET CLR type (described by ToRowSchema).
This facility is commonly used by dataflow workers for copying and renaming column data between external data sources on the one hand, and input fields and properties on the other hand.
Members to be mapped can be specified explicitly via name or index (e.g. copy from input column "A" to the fourth output column), as well as implicitly based on name (i.e. AutoName()). A specified field or property on the 'from' side can reside inside a column schema, i.e. a struct that groups multiple fields and properties.
This class does not include the Row() method, and should be used when the Row() method is not supported by the worker in question. Also see ToTypeRowMapper<TTo>, which does include that method.
Mappings can be specified with the columnMapperCommandAction parameter, or added by calling the mapping methods Name(String) etc. The mapping result is available in Mappings, which the caller can use to manually perform any tasks, such as copying columns.
Also see the Dataflow Column Mapping and Mapping and Copying examples, as well as IRowMapperCommand and IColumnMapperCommand.
Note: If the mapping commands result in duplicate mappings, an exception will be thrown. Resolve this in one of the following ways:
- Specify additional or all name parts in explicit mapping
- Change the column (or schema) names in the from and/or to rows to avoid name clashes
- Restrict auto-mapping to specific column schemas
- Map offending columns explicitly by name (which will exclude them from later auto-mappings)
ToTypeColumnMappings
The output from a ToTypeColumnMapper<TTo> row mapper, which maps from a flat list of user supplied or automatically generated source 'from' columns (described by FromRowSchema), to the (potentially hierarchical) fields, properties, table columns etc. of a target 'to' .NET CLR type (described by ToRowSchema).
This class does not include the Row() method, and should be used when the Row() method is not supported by the worker in question. Also see IRowMapperCommand, which does include that method.
Also see the Dataflow Column Mapping and Mapping and Copying examples, as well as IRowMapperCommand and IColumnMapperCommand.
ToTypeRowMapper<TTo>
Defines mappings from a flat list of user supplied or automatically generated source 'from' columns (described by FromRowSchema), to the (potentially hierarchical) fields, properties, table columns etc. of a target 'to' .NET CLR type (described by ToRowSchema).
This facility is commonly used by dataflow workers for copying and renaming column data between external data sources on the one hand, and input fields and properties on the other hand.
Members to be mapped can be specified explicitly via name or index (e.g. copy from input column "A" to the fourth output column), as well as implicitly based on name (i.e. AutoName()). A specified field or property on the 'from' side can reside inside a column schema, i.e. a struct that groups multiple fields and properties.
This class does include the Row() method, also see FromTypeColumnMapper<TFrom> which excludes that method.
Mappings can be specified with the rowMapperCommandAction parameter, or added by calling the mapping methods Name(String) etc. The mapping result is available in Mappings, which the caller can use to manually perform any tasks, such as copying columns.
Also see the Dataflow Column Mapping and Mapping and Copying examples, as well as IRowMapperCommand and IColumnMapperCommand.
Note: If the mapping commands result in duplicate mappings, an exception will be thrown. Resolve this in one of the following ways:
- Specify additional or all name parts in explicit mapping
- Change the column (or schema) names in the from and/or to rows to avoid name clashes
- Restrict auto-mapping to specific column schemas
- Map offending columns explicitly by name (which will exclude them from later auto-mappings)
ToTypeRowMappings
The output from a ToTypeRowMapper<TTo> row mapper, which maps from a flat list of user supplied or automatically generated source 'from' columns (described by FromRowSchema), to the (potentially hierarchical) fields, properties, table columns etc. of a target 'to' .NET CLR type (described by ToRowSchema).
Note that it does support the Row() command, and that it is mutually exclusive with commands mapping individual columns. Use ToTypeColumnMappings instead if that is not needed.
Also see the Dataflow Column Mapping and Mapping and Copying examples.
TransformAggregation<TInputAccumulate>
Holds the accumulation values. The seed, accumulation, and output delegates in AggregateTransform<TInputAccumulateOutput> take this instance as a parameter. Also see the examples in Custom Dataflow Aggregations.
TransformAggregation<TInput, TAccumulate>
Holds the accumulation values. The seed, accumulation, and output delegates in AggregateTransform<TInput, TAccumulateOutput> and AggregateTransform<TInput, TAccumulate, TOutput> take this instance as a parameter. Also see the examples in Custom Dataflow Aggregations.
TransformAggregationBase<TInput>
Base class that holds accumulation information. The seed, accumulation, and output actions in workers created by AggregateTransformFactory take this instance as a parameter. Also see the examples in Dataflow Aggregations.
TransformBase<TDerived, TInput, TOutput>
An abstract dataflow worker with one Input and one Output port, which can be used to create a dataflow transform. The library user must inherit this class and override the RunAsync() method (which will get called once) to add custom functionality. This method should in turn call methods on the Input port to consume rows from the upstream worker, and call methods on the Output port to send rows to the downstream worker.
Note that the input port uses the Default policy. Consider whether this is appropriate or should be overridden; see BufferingMode for further details.
Consider using TwoInputTransformBase<TDerived, TLeftInput, TRightInput, TOutput> instead if two input ports are needed. Also see the TransformBase example.
TrashTarget<TInput>
A dataflow worker with one Input port, that consumes and discards all incoming rows.
Note: Use the factory methods in TrashTargetFactory to create instances of this class.
TrashTargetFactory
Factory methods that create a TrashTarget<TInput> dataflow worker, which consumes and discards all incoming rows.
The Input port is linked to (if available) the upstream output or error output port specified by the factory.
Get the factory from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).
TwoInputTransformBase<TDerived, TLeftInput, TRightInput, TOutput>
An abstract dataflow worker with two input ports and one Output port, which can be used to create a dataflow transform. The library user must inherit this class and override the RunAsync() method (which will get called once) to add custom functionality. This method should in turn call methods on the LeftInput and RightInput ports to consume rows from the upstream workers, and call methods on the Output port to send rows to the downstream worker.
Note that the two inputs by default provide Full buffering of incoming data, which can potentially consume a large amount of memory. Consider whether this is appropriate or should be overridden; see BufferingMode for further details.
Consider using TransformBase<TDerived, TInput, TOutput> instead if two input ports are not needed. Also see the TransformBase example.
TypeColumnCopier<TFrom, TTo>
Creates delegates that copy data from one row (i.e. .NET CLR type) to another, using high performance generated code. The two rows can optionally be of different types. This facility is commonly used by dataflow workers for copying and renaming column data between input and output ports.
Specify the column mappings with IColumnMapperCommand, which will use a TypeColumnMapper<TFrom, TTo> to define column mappings between two rows.
Alternatively, the column mappings can be specified with a TypeColumnMappings. A TypeRowMappings can also be used, although the IsRowMap property will be ignored.
Members to be copied can be specified explicitly via name or index (e.g. copy from input column "A" to the fourth output column), as well as implicitly based on name (i.e. AutoName()). A specified field or property can also be a schema member, i.e. a struct that groups multiple fields and properties, in which case the copy applies to all contained fields and properties.
Mappings specified with Name(), AutoName() etc. are available in Mappings, which the caller can use to manually perform any tasks, such as copying columns.
More commonly though, the caller uses CreateCopyAction(Boolean) or CreateDeepCloneFunc() to get a delegate that performs the copy as per the mappings. The caller then calls the delegate for each pair of 'from' and 'to' rows that need copying. This delegate uses high performance generated code, and does not rely on reflection when performing the copies.
Also see the Dataflow Column Mapping and Mapping and Copying examples, as well as IColumnMapperCommand.
If only mapping is needed (but no copy delegates), use TypeRowMapper<TFrom, TTo> instead. Also see FromTypeColumnMapper<TFrom> and ToTypeColumnMapper<TTo>, which provide mappings when only one .NET CLR type is known.
Note: If the mapping commands result in duplicate mappings, an exception will be thrown. Resolve this in one of the following ways:
- Specify additional or all name parts in explicit mapping
- Change the column (or schema) names in the from and/or to rows to avoid name clashes
- Restrict auto-mapping to specific column schemas
- Map offending columns explicitly by name (which will exclude them from later auto-mappings)
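The copy-delegate usage described above might look roughly like the following sketch. Only AutoName() and CreateCopyAction(Boolean) are named on this page; the constructor shape, the command syntax, and the row types are assumptions:

```csharp
// Hypothetical sketch; constructor and command syntax are assumptions.
var copier = new TypeColumnCopier<SourceRow, TargetRow>(cmd => cmd.AutoName());

// The returned delegate uses generated code; no reflection per copied row.
var copy = copier.CreateCopyAction(true);

// Call once per 'from'/'to' row pair, e.g. for each dataflow row:
copy(fromRow, toRow);
```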
TypeColumnMapper<TFrom, TTo>
Defines mappings of the (potentially hierarchical) fields, properties, table columns etc. of a source 'from' (described by FromRowSchema) and target 'to' (described by ToRowSchema) .NET CLR types, which can be of the same or different types.
This facility is commonly used by dataflow workers for copying and renaming column data from input port rows to output port rows.
Members to be mapped can be specified explicitly via name or index (e.g. copy from input column "A" to the fourth output column), as well as implicitly based on name (i.e. AutoName()). A specified field or property on the 'from' side can reside inside a column schema, i.e. a struct that groups multiple fields and properties.
This class does not include the Row() method, and should be used when the Row() method is not supported by the worker in question. Also see IRowMapperCommand, which does include that method.
Mappings can be specified with the columnMapperCommandAction parameter, or added by calling the mapping methods Name(String) etc. The mapping result is available in Mappings, which the caller can use to manually perform any tasks, such as copying columns.
Also see the Dataflow Column Mapping and Mapping and Copying examples, as well as IRowMapperCommand and IColumnMapperCommand.
Note: If the mapping commands result in duplicate mappings, an exception will be thrown. Resolve this in one of the following ways:
- Specify additional or all name parts in explicit mapping
- Change the column (or schema) names in the from and/or to rows to avoid name clashes
- Restrict auto-mapping to specific column schemas
- Map offending columns explicitly by name (which will exclude them from later auto-mappings)
TypeColumnMappings
The output from a TypeColumnMapper<TFrom, TTo> row mapper, which maps the (potentially hierarchical) fields, properties, table columns etc. of a source 'from' (described by FromRowSchema) and target 'to' (described by ToRowSchema) .NET CLR types, which can be of the same or different types.
This class does not include the Row() method, and should be used when the Row() method is not supported by the worker in question. Also see IRowMapperCommand, which does include that method.
Also see the Dataflow Column Mapping and Mapping and Copying examples, as well as IRowMapperCommand and IColumnMapperCommand.
TypeExtensions
Static extension helper methods on Type.
TypeRowMapper<TFrom, TTo>
Defines mappings of the (potentially hierarchical) fields, properties, table columns etc. of a source 'from' (described by FromRowSchema) and target 'to' (described by ToRowSchema) .NET CLR types, which can be of the same or different types.
This facility is commonly used by dataflow workers for copying and renaming column data from input port rows to output port rows.
Members to be mapped can be specified explicitly via name or index (e.g. copy from input column "A" to the fourth output column), as well as implicitly based on name (i.e. AutoName()). A specified field or property on the 'from' side can reside inside a column schema, i.e. a struct that groups multiple fields and properties.
This class does include the Row() method, also see IColumnMapperCommand which excludes that method.
Mappings can be specified with the rowMapperCommandAction parameter, or added by calling the mapping methods Name(String) etc. The mapping result is available in Mappings, which the caller can use to manually perform any tasks, such as copying columns.
Also see the Dataflow Column Mapping and Mapping and Copying examples, as well as IRowMapperCommand and IColumnMapperCommand.
Note: If the mapping commands result in duplicate mappings, an exception will be thrown. Resolve this in one of the following ways:
- Specify additional or all name parts in explicit mapping
- Change the column (or schema) names in the from and/or to rows to avoid name clashes
- Restrict auto-mapping to specific column schemas
- Map offending columns explicitly by name (which will exclude them from later auto-mappings)
TypeRowMappings
The output from a TypeRowMapper<TFrom, TTo> row mapper, which maps the (potentially hierarchical) fields, properties, table columns etc. of a source 'from' (described by FromRowSchema) and target 'to' (described by ToRowSchema) .NET CLR types, which can be of the same or different types.
Note that it does support the Row() command, and that it is mutually exclusive with commands mapping individual columns. Use TypeColumnMappings instead if that is not needed.
Also see the Dataflow Column Mapping and Mapping and Copying examples, as well as IRowMapperCommand and IColumnMapperCommand.
TypeRowSchema
Provides information about the columns in a dataflow row (i.e. a .NET CLR type) and their data types. CreateSetValueAction<T>() can also create an action that sets a column value in a dataflow row using high performance generated code, based on an index.
The class is used for working with column names and their types, e.g. for implementing column mapping directly, or for building higher level constructs such as TypeColumnCopier<TFrom, TTo>.
Use TypeRowSchema when you have access to the required .NET CLR type, since it can populate the instance automatically from the type. Otherwise use FlatRowSchema and specify columns explicitly.
Columns can be searched for via several Get* methods. They can also be accessed as a flat list via the SchemaNodeList property, and as a hierarchical tree via the SchemaNodeRoot property, both of which return items in a well-defined order controlled by OrderAttribute. Note that the root node of the tree is not included in the SchemaNodeList list.
The hierarchical tree and flat list include all descendant fields and properties, down to either an intrinsically supported column type, e.g. int, byte[], SqlBinary etc., or an unsupported type. Each column schema (i.e. a struct that is not a supported column type) creates a new parent node in the tree.
TypeSchemaCopyPolicy
Information about a dataflow field or property type, e.g. if it is supported in dataflows and how to copy the field or property. Used by TypeSchemaNode.
Also see the Dataflow Column Mapping and Mapping and Copying examples, as well as IRowMapperCommand.
Note that instances of this class are immutable.
TypeSchemaNode
Information about a dataflow row column, or column schema, or unsupported member.
The same TypeSchemaNode instance appears both as a node in the SchemaNodeRoot hierarchical tree (if populated at all), and in its SchemaNodeList flat list.
Note that column and schema members must be public.
Use one of the Create(Type) overloads to instantiate this class.
UnionAllTransform<TInputOutput>
A dataflow worker with multiple input ports and one Output port, all of the same type, that sends all incoming rows to the single output. Any duplicates are preserved.
Note: Use the factory methods in UnionAllTransformFactory to create instances of this class.
Ports will automatically get a Limited buffering mode, unless the user explicitly sets it to Full.
Note that input ports can be added after the worker is created (but before it or its siblings have started) by repeatedly calling Create<TInput>(String, OutputPortBase<TInput>). This can be particularly useful when generating workers in a loop.
Rows are consumed in a round-robin order among the inputs that have rows available, taking up to BufferCapacity rows at a time.
Note: To instead concatenate the input rows, i.e. to exhaust each input fully before forwarding any rows from other inputs, in a predefined order, use MergeSortedTransform<TInputOutput> and ensure that the sort key range for each input does not overlap with the sort key range for any other input, and that the sort key values match your desired input port row order. One simple option is to add a constant integer property to each input row type, and use that as the sort key.
UnionAllTransformFactory
Factory methods that create a UnionAllTransform<TInputOutput> dataflow worker, with multiple input ports and one Output port, all of the same type, that sends all incoming rows to the single output. Any duplicates are preserved.
Ports will automatically get a Limited buffering mode, unless the user explicitly sets it to Full.
Note that input ports can be added after the worker is created (but before it or its siblings have started) by repeatedly calling Create<TInput>(String, OutputPortBase<TInput>). This can be particularly useful when generating workers in a loop.
Rows are consumed in a round-robin order among the inputs that have rows available, taking up to BufferCapacity rows at a time.
The "first" input port (i.e. TypedInputs[0]) is linked to (if available) the upstream output or error output port specified by the factory.
Get the factory from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).
Note: To instead concatenate the input rows, i.e. to exhaust each input fully before forwarding any rows from other inputs, in a predefined order, use MergeSortedTransform<TInputOutput> and ensure that the sort key range for each input does not overlap with the sort key range for any other input, and that the sort key values match your desired input port row order. One simple option is to add a constant integer property to each input row type, and use that as the sort key.
UsingActionWorker<TDisposable>
A worker which mimics the using statement in C#/VB, i.e. it allows creating an arbitrary disposable object, running a callback with access to the disposable object, and then automatically disposing the object (via an AddCompletedCallback(Func<WorkerBase, OutcomeStatus, Task<OutcomeStatus>>) callback).
This construct is very useful for guaranteeing a disposable object always gets correctly disposed, even if exceptions occur.
An alternative is to use DisposeOnFinished<TDisposable>(TDisposable). Also see Disposing Disposables.
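A sketch of the pattern just described. Only UsingActionWorker, AddCompletedCallback, and OutcomeStatus are named on this page; the constructor and callback signatures below are assumptions:

```csharp
// Hypothetical sketch; constructor and callback signatures are assumptions.
var writeReport = new UsingActionWorker<StreamWriter>(
    parent, "WriteReport",
    worker => new StreamWriter("report.txt"),   // create the disposable
    async (worker, writer) =>                   // use it in the callback
    {
        await writer.WriteLineAsync("Report done.");
        return OutcomeStatus.Succeeded;
    });
// The StreamWriter is disposed automatically when the worker completes,
// even if the callback throws.
```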
VersionInfo
Build, release, and assembly versions and configuration details.
WhileActionWorker<T>
A worker that executes a callback once for each iteration in a while-loop. The callback can optionally create child workers, which will be removed after each iteration, except on the last iteration, allowing them to be inspected.
Note that workers should not be added by the evaluateFunc callback, since they will be removed before they are run. Instead use the iterateAction, iterateActionAsync, iterateFunc, or iterateFuncAsync parameters to add child workers. Less commonly, child workers can also be added (typically by the worker parent) before the worker runs, in which case they will run on the first iteration (if any), but not on later iterations.
Any failure status returned from the user callback or escalated from worker children will halt the iteration. One can suppress Error escalation on selected child workers (by setting EscalateError to false), e.g. to implement retry-on-error functionality.
Note that if a dedicated parent worker is not needed (e.g. to group child workers), and all child workers can be created before any child worker is started, it is simpler to instead use a regular while statement to perform the work, e.g. creating child workers. See Instantiating Workers via Looping for details.
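A retry-on-error style sketch of the loop described above. The constructor shape and callback signatures are assumptions; the parameter roles follow the evaluateFunc/iterateAction description on this page:

```csharp
// Hypothetical sketch; signatures are assumptions, not verified API.
int attempt = 0;
var retryLoop = new WhileActionWorker<int>(
    parent, "RetryLoop",
    worker => ++attempt <= 3,   // evaluateFunc: do not add workers here
    worker =>
    {
        // iterateAction: create child workers for this iteration here.
        // Children are removed after each iteration, except the last.
        // Set EscalateError to false on a child so a failed attempt does
        // not halt the loop, enabling retry-on-error.
    });
```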
Worker
A worker which only has the intrinsic functionality inherited from WorkerBase<TDerived>, such as Start constraints and Grouping children.
If it runs, this worker will always complete with a Succeeded status. Use ActionWorker instead to complete with a failure or calculated status.
Common use cases:
- Start constraints: Useful for concentrating multiple start constraints in one worker, e.g. to mark the end of one stage in the processing. Later stages can then reference this worker in their start constraints.
- Grouping children: Used when adding children to a worker from either another worker, or from outside the worker system (typically before starting the worker system).
This class is sealed. WorkerBase is available to derive from.
WorkerBase
The main unit of execution. Specify this class when referring to workers without knowing the final derived worker type. All workers inherit from this class, indirectly via WorkerBase<TDerived>. Use this latter class when you do know the final derived worker type.
When starting a worker, the library calls RunAsync() during the worker Running phase. A derived class must override this abstract method, which normally contains the bulk of the worker logic.
Note that the worker Running phase also includes additional places where logic can optionally be inserted via callbacks, e.g. to customize the initialization, cleanup, and error handling of existing workers. This is mostly used when customizing workers that are not designed to be derived from (i.e. without a "Base" suffix). See Worker Life-cycle for details.
See Workers for how to use workers, and see Custom Workers for how to develop your own workers.
WorkerBase<TDerived>
This class contains worker members that use the type of the final derived worker (TDerived). All workers inherit from this class, directly or indirectly. Derive from this class (or another more derived worker) when creating a new custom worker.
To refer to workers where you don't know the final derived worker type, instead use WorkerBase.
When starting a worker, the library calls RunAsync() during the worker Running phase. A derived class must override this abstract method, which normally contains the bulk of the worker logic.
Note that the worker Running phase also includes additional places where logic can optionally be inserted via callbacks, e.g. to customize the initialization, cleanup, and error handling of existing workers. This is mostly used when customizing workers that are not designed to be derived from (i.e. without a "Base" suffix). See Worker Life-cycle for details.
See Workers for how to use workers, and see Custom Workers for how to develop your own workers.
WorkerParent
A base class for WorkerSystem and WorkerBase with shared functionality for managing child workers, RunAsync(), etc.
WorkerParentStateExtensions
Extension methods for WorkerParentState.
WorkerParentStatus
Describes the execution status of a WorkerParent.
Note: If a DbException is passed to the static factory methods, database specific exception details will be automatically added to the Message property.
WorkerSystem
A class that creates and runs a system of workers. Typical use involves adding user logic with a Root(Action<WorkerSystem>) overload, and starting the worker system with StartAsync() or Start().
To create a custom reusable worker system type, either inherit from this class (if retaining the Root(Action<WorkerSystem>) overloads), or from WorkerSystemBase<TDerived>.
WorkerSystemBase
An abstract class that creates and runs a system of workers. Use it to refer to worker systems when you don't know the exact derived type (e.g. WorkerSystem). Commonly used members include StartAsync(), Start(), and Config.
This class cannot be inherited directly. To create a custom worker system, derive from WorkerSystemBase<TDerived> and override RunAsync() to provide a custom implementation.
WorkerSystemBase<TDerived>
An abstract class that creates and runs a system of workers. Commonly used members include SetValue(String, Int64) configuration overloads and methods for adding worker system callbacks.
To create a custom reusable worker system type, inherit from this class and also override RunAsync() to provide your user logic, i.e. what you would otherwise use a Root(Action<WorkerSystem>) overload to provide.
To retain the Root(Action<WorkerSystem>) overloads, instead inherit from WorkerSystem.
Structs
CaptureRowErrors
A struct that, if present in a dataflow row as the field public CaptureRowErrors CaptureRowErrors;, automatically adds a row error each time the row passes through an error output port.
These errors are stored in the RowErrors property, which allows inspecting and optionally removing them using a downstream worker. A single row can have multiple errors if the row has passed through the error ports of multiple workers.
The other members present only the last error (if any), making it easy to inspect the last error, or insert it into a database table.
Note that all members, including the errors list, will be null if there are no errors, or if a particular error detail is not available.
Also see Dataflow Row Errors.
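A minimal sketch of this convention, where the row type and its data columns are hypothetical and only the public CaptureRowErrors field itself comes from this page:

```csharp
// Hypothetical dataflow row type. Declaring the public field below,
// named exactly CaptureRowErrors, is what opts the row into automatic
// error capture each time it passes through an error output port.
public class ProductRow
{
    public int ProductId;     // illustrative data column
    public string Name;       // illustrative data column

    // Errors accumulate in this struct's RowErrors property; all of its
    // members (including the errors list) are null when there are no errors.
    public CaptureRowErrors CaptureRowErrors;
}
```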
DownstreamFactory<TOutput>
A factory that creates dataflow transforms or targets, and links their "first" input port to (if available) the output or error output port specified by the factory. See Worker Instantiation and Transform and Target Factory Methods for details.
Get the factory from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).
Note that each factory is immutable, and also does not contain any methods for creating workers. Instead, the developer of a new transform or target worker creates extension methods (targeting this struct), which create and link the new worker.
OutcomeStatusResult<TResult>
An immutable struct containing a Status (OutcomeStatus) property and a generic Result property.
This is typically used (to avoid out parameters) when returning an OutcomeStatus plus one more result from a method. The struct can also be wrapped in a Task when returning from asynchronous methods. This is e.g. used by DeleteRowsAsync(String).
Note that you must check that Status is Succeeded before using the Result property.
Also note that the struct can be deconstructed for easy access to the members.
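A hedged sketch of consuming such a result via deconstruction; the table name is made up, and the IsSucceeded member name is an assumption (this page only states that Status must be Succeeded before using Result):

```csharp
// DeleteRowsAsync(String) is mentioned above as returning this struct
// wrapped in a Task. Deconstruct it into its Status and Result members.
var (status, rowsDeleted) = await DeleteRowsAsync("Staging.Product");

// Check that Status is Succeeded before using the Result
// (exact success-check member name assumed here).
if (status.IsSucceeded)
    Console.WriteLine($"Deleted {rowsDeleted} rows.");
```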
ProgressStatusResult<TResult>
An immutable struct containing a Status (ProgressStatus) property and a generic Result property.
This is typically used (to avoid out parameters) when returning a ProgressStatus plus one more result from a method. This is e.g. used by OnOutputRowDemand(). This struct can also be wrapped in a Task when returning from asynchronous methods.
Note that you must ensure Status is NotCompleted or Succeeded (IsFailed is false) before using the Result property.
Also note that the struct can be deconstructed for easy access to the members.
Interfaces
IAggregationCommand
Commands for specifying predefined column aggregation functions (Average, Count, CountDistinct, CountRows, First, Last, Max, Min, and Sum) and columns for use with several AggregateTransformFactory members.
Also see the examples in Dataflow Aggregations.
IColumnMapperCommand
This interface contains commands for mapping columns from a source row to a target row. The source and target can be different from each other, and might be tables, queries etc. in external data sources, or fields and properties in .NET CLR types. It is often used as a worker parameter or property, to allow the user to specify which columns to map or copy. The worker then uses TypeColumnMapper<TFrom, TTo>, FromTypeColumnMapper<TFrom>, or ToTypeColumnMapper<TTo> to create mappings and perform the copying etc. itself. Also see TypeColumnCopier<TFrom, TTo>, which allows generating high performance copying delegates.
This interface does not include the Row() method, and should be used when the Row() method is not supported by the worker in question. Also see IRowMapperCommand, which does include that method.
See Dataflow Column Mapping and Mapping and Copying for more details.
Note: If the mapping commands result in duplicate mappings, an exception will be thrown. Resolve this by any of the following:
- Specify additional or all name parts in explicit mapping
- Change the column (or schema) names in the from and/or to rows to avoid name clashes
- Restrict auto-mapping to specific column schemas
- Map offending columns explicitly by name (which will exclude them from later auto-mappings)
IComparer<TLeft, TRight>
Defines a method that compares two objects that can be of different types. It is otherwise similar to IComparer<T>, except that there is no default comparer.
IDisposeOnFinished
Accepts disposable objects, which will be automatically disposed when the implementer finishes, or will be disposed immediately if the implementer has already finished.
See DisposeOnFinished<TDisposable>(TDisposable) for the exact definition of 'finished'. Also see Disposing Disposables.
IFluentInterface
This interface can be inherited by other fluent interfaces to stop methods declared by Object (ToString(), Equals(), etc.) from showing up in Visual Studio IntelliSense®. It does not add any new functionality.
It is used by e.g. IRowComparerCommand<TLeft, TRight> and IXlsxSourceCommand, and can optionally be used by any fluent API that the library user develops.
Code that consumes implementations of this interface should expect one of two things:
- When referencing the interface from within the same solution (a project reference), you will still see the methods this interface is meant to hide.
- When referencing the interface through the compiled output assembly (an external reference, or a NuGet package reference), the standard Object methods will be hidden as intended.
IGroupByCommand
Commands for specifying grouping columns for use with several AggregateTransformFactory overloads. Unlike IGroupByCopyCommand, this interface is not used when the worker must explicitly copy individual grouping columns to the output rows. Also see the examples in Dataflow Aggregations.
IGroupByCopyCommand
Commands for specifying grouping columns for use with several AggregateTransformFactory overloads. Unlike IGroupByCommand, this interface is used when the worker must explicitly copy individual grouping columns to the output rows. Also see the examples in Dataflow Aggregations.
IOutcomeStatus
Describes the success or failure outcome of a process, e.g. of a completed worker or port.
IRowComparerCommand<T>
Commands for specifying which columns to include and how to compare two rows of the same type.
Also see RowComparer<T> and Compare Dataflow Columns.
Use IRowComparerCommand<TLeft, TRight> instead if either the rows have different types, or they are the same type and you also need to compare different columns (e.g. column "A" in one row with column "B" in the other row, even though the rows are the same type).
IRowComparerCommand<TLeft, TRight>
Commands for specifying which columns to include and how to compare two rows of potentially different types.
Also see RowComparer<TLeft, TRight> and Compare Dataflow Columns. Use IRowComparerCommand<T> if the rows have the same type.
IRowMapperCommand
This interface contains commands for mapping columns from a source row to a target row. The source and target can be different from each other, and might be tables, queries etc. in external data sources, or fields and properties in .NET CLR types. It is often used as a worker parameter or property, to allow the user to specify which columns to map or copy. The worker then uses e.g. TypeColumnCopier<TFrom, TTo> to generate a high performance copying delegate, or uses TypeRowMapper<TFrom, TTo>, FromTypeRowMapper<TFrom>, or ToTypeRowMapper<TTo> to create mappings and perform the copying etc. itself.
This interface does include the Row() method, also see IColumnMapperCommand which excludes that method.
See Dataflow Column Mapping and Mapping and Copying for more details.
Note: If the mapping commands result in duplicate mappings, an exception will be thrown. Resolve this by any of the following:
- Specify additional or all name parts in explicit mapping
- Change the column (or schema) names in the from and/or to rows to avoid name clashes
- Restrict auto-mapping to specific column schemas
- Map offending columns explicitly by name (which will exclude them from later auto-mappings)
Enums
DebugPortCommands
Commands for when to launch and/or break a debugger when debugging dataflow ports. Multiple commands can be set at the same time by combining them with bitwise OR (| in C#).
The various "On..." commands define when to break or launch the debugger, while Launch and Break define what to do, i.e. launching and breaking the debugger. In most debugging scenarios, one or more "On..." commands, as well as either Launch or Break, should be set.
There are also several predefined combinations, e.g. BreakOnRowsAndStateCompleted, which is very useful for viewing rows sent from an output or error output port.
Note that to view the dataflow rows while debugging, the DebugCommands property must be set to something other than None when the port worker initially runs.
Also note that any Launch command will be automatically downgraded to a Break command after the first attempt to launch a debugger.
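A minimal sketch of setting these commands; the output port variable is hypothetical, while DebugCommands and BreakOnRowsAndStateCompleted are named on this page:

```csharp
// Use the predefined BreakOnRowsAndStateCompleted combination to break
// the debugger both when rows are sent and when the port state completes,
// making the dataflow rows easy to inspect. As noted above, DebugCommands
// must be set to something other than None when the port worker
// initially runs, so set it before starting the worker system.
output.DebugCommands = DebugPortCommands.BreakOnRowsAndStateCompleted;

// Alternatively, combine individual "On..." commands with Launch or
// Break using bitwise OR, as described above.
```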
DebugWorkerParentCommands
Commands for when to launch or break a debugger when debugging workers and the worker system (i.e. WorkerParent instances). Multiple commands can be set at the same time by combining them with bitwise OR (| in C#).
The various "On..." commands define when to break or launch the debugger, while Launch and Break define what to do, i.e. launching and breaking the debugger. In most debugging scenarios, one or more "On..." commands, as well as either Launch or Break, should be set.
There are also several predefined combinations, e.g. BreakOnCompleted, which is very useful for viewing WorkerParent instances after they complete.
Also note that any Launch command will be automatically downgraded to a Break command after the first attempt to launch a debugger for this WorkerParent.
DictionaryAddKeyTreatment
Specifies how to handle any duplicate keys when adding items to a dictionary.
The default is FatalOnDuplicate.
DictionaryLookupRowTreatment
Used by the dictionary lookup transforms to control the action taken for each row passing through the transform.
InputPortState
The different states an input port can be in. When created, ports are in the Created state, and can only ever move to a state with a higher numerical value. The port cannot change state after reaching a completed state, i.e. Succeeded, Error, or Fatal.
OutcomeState
The different success or failure outcome states of a process, e.g. of a completed worker or port.
OutputPortBaseState
The different states an output (or error output) port can be in. When created, ports are in the Created state, and can only ever move to a state with a higher numerical value. The port cannot change state after reaching a completed state, i.e. Succeeded, Error, or Fatal.
PortBufferingMode
The row buffering strategy an input port uses, i.e. whether the input port only buffers a smaller number of rows to minimize the time the worker is stalled for input, or allows full buffering of up to all rows. The latter is most commonly used by input ports on workers that could otherwise deadlock (typically those having multiple inter-dependent inputs, e.g. a worker like MergeSortedTransform<TInputOutput>).
Note that error output ports will always have Full buffering, since this simplifies outputting error rows. To avoid running out of memory in cases where a very large number of error rows might be produced, limit the number of allowed error rows (which is also the default) using MaxRowsSent and/or MaxRowsBeforeError.
ProgressState
Describes an execution state, e.g. of a callback function of a worker.
RowAggregationFunction
Specifies which row-based aggregation function to use, such as First, Single etc., with or without grouping. Also see the examples in Dataflow Aggregations.
Without grouping, the function applies to all incoming rows: AggregateTransform1<TInputAccumulateOutput>(in DownstreamFactory<TInputAccumulateOutput>, String, RowAggregationFunction).
With grouping, the function applies to all rows in each grouping: AggregateTransform1<TInputAccumulateOutput>(in DownstreamFactory<TInputAccumulateOutput>, String, RowAggregationFunction, Action<IGroupByCommand>) and AggregateTransform1<TInputAccumulateOutput>(in DownstreamFactory<TInputAccumulateOutput>, String, RowAggregationFunction, IEqualityComparer<TInputAccumulateOutput>).
All these row-based aggregation functions operate without accessing individual columns in the dataflow rows. Use other AggregateTransformFactory overloads to calculate column aggregations (Sum, Average etc.), or any custom aggregation functions.
SchemaNodeCategory
The type of SchemaNode instance.
TransformRowTreatment
Used by some dataflow transform workers to specify how a row should be handled, e.g. passed to the downstream worker, or discarded.
TypeSchemaCopyOption
Specifies how a column of a particular type is copied from one dataflow row to another. Used by TypeSchemaCopyPolicy.
WorkerParentChildrenState
Tracks whether any child workers have been added to a parent, and if so, whether they have completed.
WorkerParentState
Describes the execution state of a WorkerParent, i.e. a worker or a WorkerSystem. See Worker Life-cycle for details.
Delegates
Comparison<TLeft, TRight>
A comparison delegate function that compares two objects that can be of different types. It is otherwise similar to Comparison<T>.
DeepCopyExpressionCallback
A delegate for creating an expression tree expression that performs a deep-copy from one field or property to another field or property.
The library user can use this delegate via the DeepCopyExpressionCallback property, but would normally only use it indirectly via TypeColumnCopier<TFrom, TTo>.