    Class AggregateTransformFactory

    Factory methods that create aggregation and grouping dataflow workers, which aggregate and (optionally) group incoming rows, and output at most one row for all input rows or per unique grouping. Also see the examples in Dataflow Aggregations.

    Aggregations can be specified in the following ways:

    • Predefined column aggregations Average, Count, CountDistinct, CountRows, First, Last, Max, Min and Sum. Also see IAggregationCommand for details.
    • Predefined row aggregations such as First, Last, Single etc. Also see RowAggregationFunction for details.
    • Custom seed, accumulation and output callbacks to implement custom aggregations not provided by the predefined aggregation functions.

    Note: Ensure that accumulation column data types can hold all the intermediate as well as the final accumulated values, without overflows occurring.
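
    To illustrate the note above, the following plain C# sketch sums the same values into an int accumulator and into a long accumulator. It does not use actionETL; OverflowDemo and its methods are hypothetical names chosen for this example. With overflow checking, the int accumulator throws, while the wider long accumulator can hold the result.

```csharp
using System;

public static class OverflowDemo
{
    // Sums into an int accumulator with overflow checking enabled.
    // Returns true if the running total no longer fits in an int.
    public static bool IntSumOverflows(int[] values)
    {
        try
        {
            int sum = 0;
            foreach (var value in values)
            {
                sum = checked(sum + value);
            }
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }

    // Sums the same values into a wider (long) accumulator.
    public static long LongSum(int[] values)
    {
        long sum = 0;
        foreach (var value in values)
        {
            sum = checked(sum + value);
        }
        return sum;
    }
}
```

    For the input { int.MaxValue, 1 }, IntSumOverflows returns true, whereas LongSum returns 2147483648. The same reasoning applies when choosing the type of an accumulation or output column.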

    The incoming rows can optionally also be grouped (i.e. perform a GROUP BY) by specifying which columns to group by, or by providing a grouping key function or a row equality comparer.

    Also note that:

    • AggregateTransform1() overloads create AggregateTransform<TInputAccumulateOutput> workers, where input, accumulation and output types are the same.
    • AggregateTransform2() overloads create AggregateTransform<TInput, TAccumulateOutput> workers, where the input type can be different from the accumulation and output types.
    • AggregateTransform3() overloads create AggregateTransform<TInput, TAccumulate, TOutput> workers, where the input, accumulation and output types can all be different.

    These workers are fully blocking, i.e. they will only output the row(s) after they have received all incoming rows. They will buffer (and therefore consume memory for) only a single accumulation (without grouping), or multiple accumulations corresponding to the number of unique groupings.

    Note that on .NET Framework the maximum array size is 2GB by default. In a 64-bit application, this limits the number of unique groupings to a maximum of 47.9 million, and the number of distinct values per column for CountDistinct(String) to a maximum of just over 89 million. You can remove these limits by enabling support for larger arrays, as described in <gcAllowVeryLargeObjects> Element.

    .NET 6+ on the other hand supports >2GB arrays by default.
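
    For a 64-bit .NET Framework application, the <gcAllowVeryLargeObjects> element is set in the application configuration file (e.g. app.config). A minimal fragment, assuming no other runtime settings are needed:

```xml
<configuration>
  <runtime>
    <!-- Allow arrays larger than 2GB in total size on 64-bit platforms -->
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>
```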

    The Input port is linked to (if available) the upstream output or error output port specified by the factory.

    Get the factory from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).

    Inheritance
    Object
    AggregateTransformFactory
    Namespace: actionETL
    Assembly: actionETL.dll
    Syntax
    public static class AggregateTransformFactory

    Methods

    AggregateTransform1<TInputAccumulateOutput>(in DownstreamFactory<TInputAccumulateOutput>, String, RowAggregationFunction)

    Initializes a new instance of the AggregateTransform<TInputAccumulateOutput> dataflow worker, which aggregates incoming rows, and outputs at most one row. The input rows, the accumulation, and the output row all have the same type.

    The aggregation is one of the predefined RowAggregationFunctions available that operate on whole rows, such as First, Single etc. Use other overloads to aggregate individual columns.

    The Input port is linked to (if available) the upstream output or error output port specified by the factory.

    Declaration
    public static AggregateTransform<TInputAccumulateOutput> AggregateTransform1<TInputAccumulateOutput>(this in DownstreamFactory<TInputAccumulateOutput> downstreamFactory, string workerName, RowAggregationFunction rowAggregationFunction)
        where TInputAccumulateOutput : class
    Parameters
    Type Name Description
    DownstreamFactory<TInputAccumulateOutput> downstreamFactory

    The downstream factory, which specifies the parent worker and (optionally) the upstream port to link the "first" input port of this dataflow worker to.

    Get it from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).

    String workerName

    Name of the worker.

    Set to a prefix plus a trailing "/" (e.g. "MyPrefix-/") to generate a unique name from the prefix plus an increasing number starting at 1.

    While less useful, set to null, whitespace or "/" to generate a unique name from the worker type plus an increasing number starting at 1.

    The name cannot otherwise contain "/", and cannot start with double underscore "__".

    RowAggregationFunction rowAggregationFunction

    The row aggregation function, such as First, Single etc.

    Returns
    Type Description
    AggregateTransform<TInputAccumulateOutput>

    The newly created and (optionally) linked worker.

    Type Parameters
    Name Description
    TInputAccumulateOutput

    The type of each Input, accumulation, and Output row.

    Exceptions
    Type Condition
    ArgumentException

    workerName:

    • Workers with the same parent must have unique names.
    • Worker and worker system names cannot contain '/' or start with double underscore '__'.
    ArgumentNullException

    workerParent - All workers must have a parent. The top level workers have the worker system as parent.

    InvalidOperationException
    • Cannot add child worker to parent which has completed. Are you adding it to the correct parent?
    • Cannot add worker to parent, since its children have been started. Are you adding it to the correct parent?

    AggregateTransform1<TInputAccumulateOutput>(in DownstreamFactory<TInputAccumulateOutput>, String, RowAggregationFunction, Action<IGroupByCommand>)

    Initializes a new instance of the AggregateTransform<TInputAccumulateOutput> dataflow worker, which groups incoming rows using an IGroupByCommand, aggregates each group, and outputs at most one row per unique grouping. The input rows, the accumulation, and the output row all have the same type.

    The aggregation is one of the predefined RowAggregationFunctions available that operate on whole rows, such as First, Single etc. Use other overloads to aggregate individual columns.

    The Input port is linked to (if available) the upstream output or error output port specified by the factory.

    Declaration
    public static AggregateTransform<TInputAccumulateOutput> AggregateTransform1<TInputAccumulateOutput>(this in DownstreamFactory<TInputAccumulateOutput> downstreamFactory, string workerName, RowAggregationFunction rowAggregationFunction, Action<IGroupByCommand> groupByCommandAction)
        where TInputAccumulateOutput : class
    Parameters
    Type Name Description
    DownstreamFactory<TInputAccumulateOutput> downstreamFactory

    The downstream factory, which specifies the parent worker and (optionally) the upstream port to link the "first" input port of this dataflow worker to.

    Get it from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).

    String workerName

    Name of the worker.

    Set to a prefix plus a trailing "/" (e.g. "MyPrefix-/") to generate a unique name from the prefix plus an increasing number starting at 1.

    While less useful, set to null, whitespace or "/" to generate a unique name from the worker type plus an increasing number starting at 1.

    The name cannot otherwise contain "/", and cannot start with double underscore "__".

    RowAggregationFunction rowAggregationFunction

    The row aggregation function, such as First, Single etc.

    Action<IGroupByCommand> groupByCommandAction

    Commands to specify grouping columns in the incoming rows. E.g. to group by Category and Year:

    g => g.Name(nameof(MyClass.Category))
          .Name(nameof(MyClass.Year))

    Column name matching is ordinal case insensitive, but a case sensitive match takes precedence over a case insensitive match.

    Set to null to not group rows.

    Returns
    Type Description
    AggregateTransform<TInputAccumulateOutput>

    The newly created and (optionally) linked worker.

    Type Parameters
    Name Description
    TInputAccumulateOutput

    The type of each Input, accumulation, and Output row.

    Exceptions
    Type Condition
    ArgumentException

    workerName:

    • Workers with the same parent must have unique names.
    • Worker and worker system names cannot contain '/' or start with double underscore '__'.
    • If groupByCommandAction is non-null, must specify at least one group by column
    • Specified name is not a column.
    ArgumentNullException
    • workerParent - All workers must have a parent. The top level workers have the worker system as parent.
    • groupByCommandAction
    • columnName
    InvalidOperationException
    • Cannot add child worker to parent which has completed. Are you adding it to the correct parent?
    • Cannot add worker to parent, since its children have been started. Are you adding it to the correct parent?
    • Found more than one member match in type.

    AggregateTransform1<TInputAccumulateOutput>(in DownstreamFactory<TInputAccumulateOutput>, String, RowAggregationFunction, IEqualityComparer<TInputAccumulateOutput>)

    Initializes a new instance of the AggregateTransform<TInputAccumulateOutput> dataflow worker, which groups incoming rows using an IEqualityComparer<T>, aggregates each group, and outputs at most one row per unique grouping. The input rows, the accumulation, and the output row all have the same type.

    The aggregation is one of the predefined RowAggregationFunctions available that operate on whole rows, such as First, Single etc. Use other overloads to aggregate individual columns.

    The Input port is linked to (if available) the upstream output or error output port specified by the factory.

    Declaration
    public static AggregateTransform<TInputAccumulateOutput> AggregateTransform1<TInputAccumulateOutput>(this in DownstreamFactory<TInputAccumulateOutput> downstreamFactory, string workerName, RowAggregationFunction rowAggregationFunction, IEqualityComparer<TInputAccumulateOutput> groupByEqualityComparer)
        where TInputAccumulateOutput : class
    Parameters
    Type Name Description
    DownstreamFactory<TInputAccumulateOutput> downstreamFactory

    The downstream factory, which specifies the parent worker and (optionally) the upstream port to link the "first" input port of this dataflow worker to.

    Get it from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).

    String workerName

    Name of the worker.

    Set to a prefix plus a trailing "/" (e.g. "MyPrefix-/") to generate a unique name from the prefix plus an increasing number starting at 1.

    While less useful, set to null, whitespace or "/" to generate a unique name from the worker type plus an increasing number starting at 1.

    The name cannot otherwise contain "/", and cannot start with double underscore "__".

    RowAggregationFunction rowAggregationFunction

    The row aggregation function, such as First, Single etc.

    IEqualityComparer<TInputAccumulateOutput> groupByEqualityComparer

    An instance that compares incoming rows. Rows that compare equal will be in the same grouping. Typically Create<T>(Action<IGroupByCommand>) is used to create the comparer, but Create<T, TKey>(Func<T, TKey>) or a custom one can also be used.

    Set to null to not group rows.
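
    As a sketch of the "custom one" option above, the following comparer treats rows with the same Category and Year as equal. It only relies on the .NET IEqualityComparer<T> interface; the MyClass row type and the CategoryYearComparer name are hypothetical, mirroring the column names used in the examples on this page.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row type with the two columns to group by.
public sealed class MyClass
{
    public string Category { get; set; }
    public int Year { get; set; }
}

// Rows with the same Category and Year compare equal,
// and would therefore end up in the same grouping.
public sealed class CategoryYearComparer : IEqualityComparer<MyClass>
{
    public bool Equals(MyClass x, MyClass y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Year == y.Year
            && string.Equals(x.Category, y.Category, StringComparison.Ordinal);
    }

    // Must be consistent with Equals: equal rows return equal hash codes.
    public int GetHashCode(MyClass row)
    {
        if (row is null) return 0;
        unchecked
        {
            return ((row.Category?.GetHashCode() ?? 0) * 397) ^ row.Year;
        }
    }
}
```

    An instance, e.g. new CategoryYearComparer(), would then be passed as the groupByEqualityComparer argument. Note that this sketch compares Category with ordinal case sensitive semantics, which is a choice made for the example.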

    Returns
    Type Description
    AggregateTransform<TInputAccumulateOutput>

    The newly created and (optionally) linked worker.

    Type Parameters
    Name Description
    TInputAccumulateOutput

    The type of each Input, accumulation, and Output row.

    Exceptions
    Type Condition
    ArgumentException

    workerName:

    • Workers with the same parent must have unique names.
    • Worker and worker system names cannot contain '/' or start with double underscore '__'.
    ArgumentNullException

    workerParent - All workers must have a parent. The top level workers have the worker system as parent.

    InvalidOperationException
    • Cannot add child worker to parent which has completed. Are you adding it to the correct parent?
    • Cannot add worker to parent, since its children have been started. Are you adding it to the correct parent?

    AggregateTransform1<TInputOutput>(in DownstreamFactory<TInputOutput>, String, Action<IAggregationCommand>)

    Initializes a new instance of the AggregateTransform<TInputAccumulateOutput> dataflow worker, which aggregates incoming rows, and outputs exactly one row with one or more columns populated by aggregation functions (Average, Count, CountDistinct, CountRows, First, Last, Max, Min and Sum; see IAggregationCommand for details). Also see AggregateTransform1<TInputOutput>(in DownstreamFactory<TInputOutput>, String, Action<IAggregationCommand>, Action<IGroupByCopyCommand>) which includes grouping, as well as the examples in Dataflow Aggregations.

    The input and output rows have the same type.

    The Input port is linked to (if available) the upstream output or error output port specified by the factory.

    Declaration
    public static AggregateTransform<TInputOutput> AggregateTransform1<TInputOutput>(this in DownstreamFactory<TInputOutput> downstreamFactory, string workerName, Action<IAggregationCommand> aggregationCommandAction)
        where TInputOutput : class, new()
    Parameters
    Type Name Description
    DownstreamFactory<TInputOutput> downstreamFactory

    The downstream factory, which specifies the parent worker and (optionally) the upstream port to link the "first" input port of this dataflow worker to.

    Get it from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).

    String workerName

    Name of the worker.

    Set to a prefix plus a trailing "/" (e.g. "MyPrefix-/") to generate a unique name from the prefix plus an increasing number starting at 1.

    While less useful, set to null, whitespace or "/" to generate a unique name from the worker type plus an increasing number starting at 1.

    The name cannot otherwise contain "/", and cannot start with double underscore "__".

    Action<IAggregationCommand> aggregationCommandAction

    Commands for specifying predefined aggregation functions (Average, Count, CountDistinct, CountRows, First, Last, Max, Min and Sum) as well as input and output column names. Specify a single column name to use it for both input and output. Example use:

    ac => ac
        .Max(nameof(MyClass.Price))
        .Average(nameof(MyClass.Price)
               , nameof(MyClass.AveragePrice))

    See IAggregationCommand for details.

    Column name matching is ordinal case insensitive, but a case sensitive match takes precedence over a case insensitive match.

    Returns
    Type Description
    AggregateTransform<TInputOutput>

    The newly created and (optionally) linked worker.

    Type Parameters
    Name Description
    TInputOutput

    The type of the input and output rows. Must be a concrete class type with a public parameterless constructor.

    Remarks

    Note: Ensure that accumulation column data types can hold all the intermediate as well as the final accumulated values, without overflows occurring.

    Also consider whether to use non-nullable (e.g. int) or nullable (e.g. int?) .NET types, or the corresponding database provider type if available (e.g. SqlInt32).

    The aggregation functions follow the SQL-92 standard:

    • Aggregation without grouping will always produce one output row, including for an empty input set
    • On an empty input set, Count aggregations return 0, whereas the other aggregations return null
    • With grouping, an empty input set will not produce any output rows
    • Count aggregations support nullable input column types. Non-Count aggregations support nullable input column value types if both input and output columns are nullable.
    • null input column values are not included in aggregations (CountRows however doesn't look at input column values at all)
    • If the aggregation result type (usually the same as the input column type), can be implicitly cast to the output column type, then they can have different types, e.g. long to double
    • Sum and Average additions are checked for overflow, Count additions are not
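
    The empty-input and null-handling rules above can be modeled in plain C# as follows. This is an illustrative sketch only: Sql92Model and its methods are hypothetical names, and the code does not use (or claim to reproduce) the actionETL implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Sql92Model
{
    // Count(column): null values are ignored, an empty input gives 0.
    public static long Count(IEnumerable<int?> column) =>
        column.LongCount(value => value.HasValue);

    // CountRows: counts every row, without looking at column values.
    public static long CountRows(IEnumerable<int?> column) =>
        column.LongCount();

    // Sum: null values are ignored, the result is null when there is
    // nothing to sum, and each addition is checked for overflow.
    public static long? Sum(IEnumerable<int?> column)
    {
        long? sum = null;
        foreach (var value in column)
        {
            if (value.HasValue)
            {
                sum = checked((sum ?? 0L) + value.Value);
            }
        }
        return sum;
    }

    // Max: LINQ already ignores nulls and returns null for an empty input.
    public static int? Max(IEnumerable<int?> column) => column.Max();
}
```

    With the column values { 3, null, 5 }, Count returns 2, CountRows returns 3, Sum returns 8 and Max returns 5. With an empty input, Count and CountRows return 0, while Sum and Max return null.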

    Exceptions
    Type Condition
    ArgumentException

    workerName:

    • Workers with the same parent must have unique names.
    • Worker and worker system names cannot contain '/' or start with double underscore '__'.
    ArgumentNullException

    workerParent - All workers must have a parent. The top level workers have the worker system as parent.

    InvalidOperationException
    • Cannot add child worker to parent which has completed. Are you adding it to the correct parent?
    • Cannot add worker to parent, since its children have been started. Are you adding it to the correct parent?
    ArgumentException
    • aggregationCommandAction must be non-null.
    • If aggregationCommandAction is non-null, must specify at least one aggregation function
    • Cannot output to the same column twice.
    • Could not find column name in type.
    • Cannot implicitly cast from input type to output type.
    • Input and Output column names must both be either nullable or not nullable.
    • No aggregation commands specified.
    ArgumentNullException

    inputColumnName

    See Also
    AggregateTransform1<TInputOutput>(in DownstreamFactory<TInputOutput>, String, Action<IAggregationCommand>, Action<IGroupByCopyCommand>)
    AggregateTransform2<TInput, TOutput>(in DownstreamFactory<TInput>, String, Action<IAggregationCommand>)

    AggregateTransform1<TInputOutput>(in DownstreamFactory<TInputOutput>, String, Action<IAggregationCommand>, Action<IGroupByCopyCommand>)

    Initializes a new instance of the AggregateTransform<TInputAccumulateOutput> dataflow worker, which aggregates columns and groups incoming rows, and outputs exactly one row per unique grouping (if there are any input rows) with zero or more columns populated by aggregation functions (Average, Count, CountDistinct, CountRows, First, Last, Max, Min and Sum, see IAggregationCommand for details). Also see AggregateTransform1<TInputOutput>(in DownstreamFactory<TInputOutput>, String, Action<IAggregationCommand>) which excludes grouping, as well as the examples in Dataflow Aggregations.

    The input and output rows have the same type.

    The Input port is linked to (if available) the upstream output or error output port specified by the factory.

    Declaration
    public static AggregateTransform<TInputOutput> AggregateTransform1<TInputOutput>(this in DownstreamFactory<TInputOutput> downstreamFactory, string workerName, Action<IAggregationCommand> aggregationCommandAction, Action<IGroupByCopyCommand> groupByCopyCommandAction)
        where TInputOutput : class, new()
    Parameters
    Type Name Description
    DownstreamFactory<TInputOutput> downstreamFactory

    The downstream factory, which specifies the parent worker and (optionally) the upstream port to link the "first" input port of this dataflow worker to.

    Get it from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).

    String workerName

    Name of the worker.

    Set to a prefix plus a trailing "/" (e.g. "MyPrefix-/") to generate a unique name from the prefix plus an increasing number starting at 1.

    While less useful, set to null, whitespace or "/" to generate a unique name from the worker type plus an increasing number starting at 1.

    The name cannot otherwise contain "/", and cannot start with double underscore "__".

    Action<IAggregationCommand> aggregationCommandAction

    Commands for specifying predefined aggregation functions (Average, Count, CountDistinct, CountRows, First, Last, Max, Min and Sum) as well as input and output column names. Specify a single column name to use it for both input and output. Example use:

    ac => ac
        .Max(nameof(MyClass.Price))
        .Average(nameof(MyClass.Price)
               , nameof(MyClass.AveragePrice))

    See IAggregationCommand for details.

    Column name matching is ordinal case insensitive, but a case sensitive match takes precedence over a case insensitive match.

    Action<IGroupByCopyCommand> groupByCopyCommandAction

    Commands to specify grouping columns in the incoming rows. E.g. to group by Category and Year:

    g => g.Name(nameof(MyClass.Category))
          .Name(nameof(MyClass.Year))

    Column name matching is ordinal case insensitive, but a case sensitive match takes precedence over a case insensitive match.

    Set to null to not group rows.

    Returns
    Type Description
    AggregateTransform<TInputOutput>

    The newly created and (optionally) linked worker.

    Type Parameters
    Name Description
    TInputOutput

    The type of the input and output rows. Must be a concrete class type with a public parameterless constructor.

    Remarks

    Note: Ensure that accumulation column data types can hold all the intermediate as well as the final accumulated values, without overflows occurring.

    Also consider whether to use non-nullable (e.g. int) or nullable (e.g. int?) .NET types, or the corresponding database provider type if available (e.g. SqlInt32).

    The aggregation functions follow the SQL-92 standard:

    • Aggregation without grouping will always produce one output row, including for an empty input set
    • On an empty input set, Count aggregations return 0, whereas the other aggregations return null
    • With grouping, an empty input set will not produce any output rows
    • Count aggregations support nullable input column types. Non-Count aggregations support nullable input column value types if both input and output columns are nullable.
    • null input column values are not included in aggregations (CountRows however doesn't look at input column values at all)
    • If the aggregation result type (usually the same as the input column type), can be implicitly cast to the output column type, then they can have different types, e.g. long to double
    • Sum and Average additions are checked for overflow, Count additions are not
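
    For the grouped case, the expected shape of the output (one row per unique grouping, and no rows for an empty input) can be modeled with LINQ. This is a sketch under stated assumptions: PricedRow is a hypothetical row type mirroring the MyClass columns used in the examples above, and GroupedModel is not part of actionETL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type, used for both input and output rows.
public sealed class PricedRow
{
    public string Category { get; set; }
    public int Year { get; set; }
    public decimal Price { get; set; }
    public decimal AveragePrice { get; set; }
}

public static class GroupedModel
{
    // Groups by Category and Year, then outputs one row per grouping with
    // Max(Price) written to Price and Average(Price) written to AveragePrice.
    public static List<PricedRow> Aggregate(IEnumerable<PricedRow> rows) =>
        rows.GroupBy(row => new { row.Category, row.Year })
            .Select(group => new PricedRow
            {
                Category = group.Key.Category,
                Year = group.Key.Year,
                Price = group.Max(row => row.Price),
                AveragePrice = group.Average(row => row.Price)
            })
            .ToList();
}
```

    Three input rows with the groupings (A, 2020), (A, 2020) and (B, 2021) produce two output rows, while an empty input produces an empty list.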

    Exceptions
    Type Condition
    ArgumentException

    workerName:

    • Workers with the same parent must have unique names.
    • Worker and worker system names cannot contain '/' or start with double underscore '__'.
    ArgumentNullException

    workerParent - All workers must have a parent. The top level workers have the worker system as parent.

    InvalidOperationException
    • Cannot add child worker to parent which has completed. Are you adding it to the correct parent?
    • Cannot add worker to parent, since its children have been started. Are you adding it to the correct parent?
    ArgumentException
    • At least one of aggregationCommandAction and groupByCopyCommandAction must be non-null.
    • If aggregationCommandAction is non-null, must specify at least one aggregation function
    • If groupByCopyCommandAction is non-null, must specify at least one group by column
    • Cannot output to the same column twice.
    • Could not find column name in type.
    • Cannot implicitly cast from input type to output type.
    • Input and Output column names must both be either nullable or not nullable.
    • No aggregation commands specified.
    ArgumentNullException

    inputColumnName

    See Also
    AggregateTransform1<TInputOutput>(in DownstreamFactory<TInputOutput>, String, Action<IAggregationCommand>)
    AggregateTransform2<TInput, TOutput>(in DownstreamFactory<TInput>, String, Action<IAggregationCommand>, Action<IGroupByCopyCommand>)

    AggregateTransform1<TInputAccumulateOutput>(in DownstreamFactory<TInputAccumulateOutput>, String, Action<TransformAggregation<TInputAccumulateOutput>>, Action<TransformAggregation<TInputAccumulateOutput>>, Func<TransformAggregation<TInputAccumulateOutput>, TInputAccumulateOutput>, IEqualityComparer<TInputAccumulateOutput>)

    Initializes a new instance of the AggregateTransform<TInputAccumulateOutput> dataflow worker, which uses custom callbacks to aggregate and optionally group incoming rows, and outputs at most one row for all input rows or per unique grouping. Use this overload when the predefined aggregation functions in other overloads are not sufficient. The input rows, the accumulation, and the output row all have the same type. The output row(s), if any, default to the final value of the Accumulation.

    Note that there are no default seed or accumulation values - to produce output rows, the seedAction or accumulationAction must set Accumulation, or outputFunc must return a row.

    The Input port is linked to (if available) the upstream output or error output port specified by the factory.

    Note: Ensure that accumulation column data types can hold all the intermediate as well as the final accumulated values, without overflows occurring.

    Furthermore, consider how to treat any null values. In SQL-92 for instance, COUNT (column name) and COUNT (DISTINCT column name) functions ignore nulls, but COUNT (*) includes rows with null values. Consider whether to use non-nullable (e.g. int) or nullable (e.g. int?) .NET types, or the corresponding database provider type if available (e.g. SqlInt32).

    Note that all dataflow workers must adhere to the Row Ownership rules.

    Declaration
    public static AggregateTransform<TInputAccumulateOutput> AggregateTransform1<TInputAccumulateOutput>(this in DownstreamFactory<TInputAccumulateOutput> downstreamFactory, string workerName, Action<TransformAggregation<TInputAccumulateOutput>> seedAction, Action<TransformAggregation<TInputAccumulateOutput>> accumulationAction, Func<TransformAggregation<TInputAccumulateOutput>, TInputAccumulateOutput> outputFunc, IEqualityComparer<TInputAccumulateOutput> groupByEqualityComparer)
        where TInputAccumulateOutput : class
    Parameters
    Type Name Description
    DownstreamFactory<TInputAccumulateOutput> downstreamFactory

    The downstream factory, which specifies the parent worker and (optionally) the upstream port to link the "first" input port of this dataflow worker to.

    Get it from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).

    String workerName

    Name of the worker.

    Set to a prefix plus a trailing "/" (e.g. "MyPrefix-/") to generate a unique name from the prefix plus an increasing number starting at 1.

    While less useful, set to null, whitespace or "/" to generate a unique name from the worker type plus an increasing number starting at 1.

    The name cannot otherwise contain "/", and cannot start with double underscore "__".

    Action<TransformAggregation<TInputAccumulateOutput>> seedAction

    The seed action, which will be invoked once for the first (if any) incoming row, per grouping (if any). Can be null to avoid calling it.

    It takes the TransformAggregation<TInputAccumulate> aggregation instance as a parameter, with the first row assigned to NextRow, which this action can use to populate Accumulation.

    Set Status to a completed status to discard any additional input rows - Succeeded will invoke the outputFunc with the current aggregation values; a failure status will fail the worker without invoking the outputFunc.

    Action<TransformAggregation<TInputAccumulateOutput>> accumulationAction

    The accumulation action, which will be invoked once for each incoming row, if there are any. Can be null to avoid calling it.

    It takes the TransformAggregation<TInputAccumulate> aggregation instance as a parameter, with the next incoming row assigned to NextRow, which this action uses to update the Accumulation.

    Set Status to a completed status to discard any additional input rows - Succeeded will invoke the outputFunc with the current aggregation values; a failure status will fail the worker without invoking the outputFunc.

    Func<TransformAggregation<TInputAccumulateOutput>, TInputAccumulateOutput> outputFunc

    The output function, which must return the output row, or null to not output a row. Can be null, in which case the final value of Accumulation is used as output row.

    Without grouping, it will be invoked once, after all incoming rows have been processed, including if there are no incoming rows.

    With grouping, it will be invoked once for each grouping, after all incoming rows have been processed. It will not be invoked if there are no incoming rows.

    It takes the TransformAggregation<TInputAccumulate> aggregation instance as a parameter. Count holds the number of rows accumulated, and NextRow holds the last incoming row (if any). Accumulation holds the final accumulation, which is normally used to create and return the output row. Note that Accumulation and NextRow will be null if there are no incoming rows. Consider whether or not to create and return an empty output row if there are no incoming rows.

    Set Status to a failure status to fail the worker.

    IEqualityComparer<TInputAccumulateOutput> groupByEqualityComparer

    An instance that compares incoming rows. Rows that compare equal will be in the same grouping. Typically Create<T>(Action<IGroupByCommand>) is used to create the comparer, but Create<T, TKey>(Func<T, TKey>) or a custom one can also be used.

    Set to null to not group rows.
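
    The interplay of the three callbacks (without grouping) can be modeled in plain C# as sketched below. AggregationState<T>, Amount and CallbackModel are hypothetical stand-ins written for this example, not actionETL types. The sketch also assumes that, for the first row, the seed callback runs before the accumulation callback, and that Count is incremented after each row is accumulated.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the aggregation state passed to the callbacks.
// This is NOT the actionETL TransformAggregation type.
public sealed class AggregationState<T> where T : class
{
    public T Accumulation { get; set; }
    public T NextRow { get; set; }
    public long Count { get; set; }
}

// Hypothetical row type.
public sealed class Amount
{
    public long Value { get; set; }
}

public static class CallbackModel
{
    public static T Run<T>(
        IEnumerable<T> rows,
        Action<AggregationState<T>> seed,
        Action<AggregationState<T>> accumulate,
        Func<AggregationState<T>, T> output) where T : class
    {
        var state = new AggregationState<T>();
        foreach (var row in rows)
        {
            state.NextRow = row;
            if (state.Count == 0) seed?.Invoke(state); // first row only
            accumulate?.Invoke(state);                 // every row
            state.Count++;
        }
        // Without an output function, the final accumulation is the output row.
        return output != null ? output(state) : state.Accumulation;
    }

    // Sums Amount.Value into a newly created accumulation row, rather than
    // reusing an input row as the accumulation.
    public static Amount SumAmounts(IEnumerable<Amount> rows) =>
        Run(rows,
            seed: s => s.Accumulation = new Amount(),
            accumulate: s => s.Accumulation.Value =
                checked(s.Accumulation.Value + s.NextRow.Value),
            output: null);
}
```

    In this model, SumAmounts returns a row with Value 5 for the input values 2 and 3, and returns null (no output row) for an empty input, since no callback has set Accumulation.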

    Returns
    Type Description
    AggregateTransform<TInputAccumulateOutput>

    The newly created and (optionally) linked worker.

    Type Parameters
    Name Description
    TInputAccumulateOutput

    The type of each Input, accumulation, and Output row.

    Exceptions
    Type Condition
    ArgumentException

    At least one of seedAction, accumulationAction and outputFunc must be non-null.

    ArgumentException

    workerName:

    • Workers with the same parent must have unique names.
    • Worker and worker system names cannot contain '/' or start with double underscore '__'.
    ArgumentNullException

    workerParent - All workers must have a parent. The top level workers have the worker system as parent.

    InvalidOperationException
    • Cannot add child worker to parent which has completed. Are you adding it to the correct parent?
    • Cannot add worker to parent, since its children have been started. Are you adding it to the correct parent?

    AggregateTransform2<TInput, TOutput>(in DownstreamFactory<TInput>, String, Action<IAggregationCommand>)

    Initializes a new instance of the AggregateTransform<TInput, TAccumulateOutput> dataflow worker, which aggregates incoming rows and outputs exactly one row with one or more columns populated by aggregation functions (Average, Count, CountDistinct, CountRows, First, Last, Max, Min and Sum; see IAggregationCommand for details). Also see AggregateTransform2<TInput, TOutput>(in DownstreamFactory<TInput>, String, Action<IAggregationCommand>, Action<IGroupByCopyCommand>) which includes grouping, as well as the examples in Dataflow Aggregations.

    The input and output rows can have different types.

    The Input port is linked to (if available) the upstream output or error output port specified by the factory.

    Note that all the type parameters must be specified with this overload.

    Declaration
    public static AggregateTransform<TInput, TOutput> AggregateTransform2<TInput, TOutput>(this in DownstreamFactory<TInput> downstreamFactory, string workerName, Action<IAggregationCommand> aggregationCommandAction)
        where TInput : class where TOutput : class, new()
    Parameters
    Type Name Description
    DownstreamFactory<TInput> downstreamFactory

    The downstream factory, which specifies the parent worker and (optionally) the upstream port to link the "first" input port of this dataflow worker to.

    Get it from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).

    String workerName

    Name of the worker.

    Set to a prefix plus a trailing "/" (e.g. "MyPrefix-/") to generate a unique name from the prefix plus an increasing number starting at 1.

    While less useful, set to null, whitespace or "/" to generate a unique name from the worker type plus an increasing number starting at 1.

    The name cannot otherwise contain "/", and cannot start with double underscore "__".

    Action<IAggregationCommand> aggregationCommandAction

    Commands for specifying predefined aggregation functions (Average, Count, CountDistinct, CountRows, First, Last, Max, Min and Sum) as well as input and output column names. Specify a single column name to use it for both input and output. Example use:

    ac => ac
        .Max(nameof(MyClass.Price))
        .Average(nameof(MyClass.Price)
               , nameof(MyClass.AveragePrice))

    See IAggregationCommand for details.

    Column name matching is ordinal case insensitive, but a case sensitive match takes precedence over a case insensitive match.

    Returns
    Type Description
    AggregateTransform<TInput, TOutput>

    The newly created and (optionally) linked worker.

    Type Parameters
    Name Description
    TInput

    The type of the incoming rows.

    TOutput

    The type of the output row. Must be a concrete class type with a public parameterless constructor.

    Remarks

    Note: Ensure that accumulation column data types can hold all the intermediate as well as the final accumulated values, without overflows occurring.

    Also consider whether to use non-nullable (e.g. int) or nullable (e.g. int?) .NET types, or the corresponding database provider type if available (e.g. SqlInt32).

    The aggregation functions follow the SQL-92 standard:

    • Aggregation without grouping will always produce one output row, including for an empty input set
    • On an empty input set, Count aggregations return 0, whereas the other aggregations return null
    • With grouping, an empty input set will not produce any output rows
    • Count aggregations support nullable input column types. Non-Count aggregations support nullable input column value types if both input and output columns are nullable.
    • null input column values are not included in aggregations (CountRows however doesn't look at input column values at all)
    • If the aggregation result type (usually the same as the input column type) can be implicitly cast to the output column type, then they can have different types, e.g. long to double
    • Sum and Average additions are checked for overflow, Count additions are not
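
    The rules above can be mimicked with plain .NET code. The following is a minimal sketch of the listed semantics only, not the library's implementation. Sql92Sketch and its methods are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;

// Illustrative helpers only; they mimic the listed rules with plain .NET
// code and are not part of the library.
public static class Sql92Sketch
{
    // Like COUNT(column): counts non-null values, so an empty input gives 0.
    // The increment is not explicitly checked for overflow.
    public static long Count(IEnumerable<long?> values)
    {
        long count = 0;
        foreach (var v in values)
            if (v.HasValue) count++;
        return count;
    }

    // Like COUNT(*): counts rows without looking at the column values.
    public static long CountRows(IEnumerable<long?> values)
    {
        long count = 0;
        foreach (var unused in values) count++;
        return count;
    }

    // Like SUM(column): nulls are skipped, and null is returned when there
    // is no non-null value. Additions are checked, so overflow throws.
    public static long? Sum(IEnumerable<long?> values)
    {
        long? sum = null;
        foreach (var v in values)
        {
            if (!v.HasValue) continue;
            sum = checked((sum ?? 0) + v.Value);
        }
        return sum;
    }
}
```

    With the input { 1, null, 2 }, Count gives 2, CountRows gives 3 and Sum gives 3. With an empty input, Count and CountRows give 0 while Sum gives null, which is why a nullable output column is needed to represent that case.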

    Exceptions
    Type Condition
    ArgumentException

    workerName:

    • Workers with the same parent must have unique names.
    • Worker and worker system names cannot contain '/' or start with double underscore '__'.
    ArgumentNullException

    workerParent - All workers must have a parent. The top level workers have the worker system as parent.

    InvalidOperationException
    • Cannot add child worker to parent which has completed. Are you adding it to the correct parent?
    • Cannot add worker to parent, since its children have been started. Are you adding it to the correct parent?
    See Also
    AggregateTransform2<TInput, TOutput>(in DownstreamFactory<TInput>, String, Action<IAggregationCommand>, Action<IGroupByCopyCommand>)
    AggregateTransform1<TInputOutput>(in DownstreamFactory<TInputOutput>, String, Action<IAggregationCommand>)

    AggregateTransform2<TInput, TOutput>(in DownstreamFactory<TInput>, String, Action<IAggregationCommand>, Action<IGroupByCopyCommand>)

    Initializes a new instance of the AggregateTransform<TInput, TAccumulateOutput> dataflow worker, which aggregates columns and groups incoming rows, and outputs exactly one row per unique grouping (if there are any input rows) with one or more columns populated by aggregation functions (Average, Count, CountDistinct, CountRows, First, Last, Max, Min and Sum; see IAggregationCommand for details). Also see AggregateTransform2<TInput, TOutput>(in DownstreamFactory<TInput>, String, Action<IAggregationCommand>) which does not group rows, as well as the examples in Dataflow Aggregations.

    The input and output rows can have different types.

    The Input port is linked to (if available) the upstream output or error output port specified by the factory.

    Note that all the type parameters must be specified with this overload.

    Declaration
    public static AggregateTransform<TInput, TOutput> AggregateTransform2<TInput, TOutput>(this in DownstreamFactory<TInput> downstreamFactory, string workerName, Action<IAggregationCommand> aggregationCommandAction, Action<IGroupByCopyCommand> groupByCopyCommandAction)
        where TInput : class where TOutput : class, new()
    Parameters
    Type Name Description
    DownstreamFactory<TInput> downstreamFactory

    The downstream factory, which specifies the parent worker and (optionally) the upstream port to link the "first" input port of this dataflow worker to.

    Get it from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).

    String workerName

    Name of the worker.

    Set to a prefix plus a trailing "/" (e.g. "MyPrefix-/") to generate a unique name from the prefix plus an increasing number starting at 1.

    While less useful, set to null, whitespace or "/" to generate a unique name from the worker type plus an increasing number starting at 1.

    The name cannot otherwise contain "/", and cannot start with double underscore "__".

    Action<IAggregationCommand> aggregationCommandAction

    Commands for specifying predefined aggregation functions (Average, Count, CountDistinct, CountRows, First, Last, Max, Min and Sum) as well as input and output column names. Specify a single column name to use it for both input and output. Example use:

    ac => ac
        .Max(nameof(MyClass.Price))
        .Average(nameof(MyClass.Price)
               , nameof(MyClass.AveragePrice))

    See IAggregationCommand for details.

    Column name matching is ordinal case insensitive, but a case sensitive match takes precedence over a case insensitive match.

    Action<IGroupByCopyCommand> groupByCopyCommandAction

    Commands to specify grouping columns in the incoming rows. E.g. to group by Category and Year:

    g => g.Name(nameof(MyClass.Category))
          .Name(nameof(MyClass.Year))

    Column name matching is ordinal case insensitive, but a case sensitive match takes precedence over a case insensitive match.

    Set to null to not group rows.

    Returns
    Type Description
    AggregateTransform<TInput, TOutput>

    The newly created and (optionally) linked worker.

    Type Parameters
    Name Description
    TInput

    The type of the incoming rows.

    TOutput

    The type of the output rows. Must be a concrete class type with a public parameterless constructor.

    Remarks

    Note: Ensure that accumulation column data types can hold all the intermediate as well as the final accumulated values, without overflows occurring.

    Also consider whether to use non-nullable (e.g. int) or nullable (e.g. int?) .NET types, or the corresponding database provider type if available (e.g. SqlInt32).

    The aggregation functions follow the SQL-92 standard:

    • Aggregation without grouping will always produce one output row, including for an empty input set
    • On an empty input set, Count aggregations return 0, whereas the other aggregations return null
    • With grouping, an empty input set will not produce any output rows
    • Count aggregations support nullable input column types. Non-Count aggregations support nullable input column value types if both input and output columns are nullable.
    • null input column values are not included in aggregations (CountRows however doesn't look at input column values at all)
    • If the aggregation result type (usually the same as the input column type) can be implicitly cast to the output column type, then they can have different types, e.g. long to double
    • Sum and Average additions are checked for overflow, Count additions are not
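
    To make the grouped behavior concrete, the sketch below expresses a comparable "group by Category and Year, then take Max and Average of Price" computation with standard LINQ instead of this worker. ProductRow, CategoryYearSummary and GroupingSketch are hypothetical names, and carrying the grouping columns over to the output row is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class ProductRow              // hypothetical input row
{
    public string Category { get; set; }
    public int Year { get; set; }
    public decimal? Price { get; set; }
}

public sealed class CategoryYearSummary     // hypothetical output row
{
    public string Category { get; set; }
    public int Year { get; set; }
    public decimal? Price { get; set; }         // receives Max(Price)
    public decimal? AveragePrice { get; set; }  // receives Average(Price)
}

public static class GroupingSketch
{
    // One output row per unique (Category, Year) pair. The nullable LINQ
    // overloads of Max and Average skip null values and return null when
    // a grouping has no non-null Price.
    public static List<CategoryYearSummary> Summarize(IEnumerable<ProductRow> rows) =>
        rows.GroupBy(r => (r.Category, r.Year))
            .Select(g => new CategoryYearSummary
            {
                Category = g.Key.Category,
                Year = g.Key.Year,
                Price = g.Max(r => r.Price),
                AveragePrice = g.Average(r => r.Price),
            })
            .ToList();
}
```

    An empty input yields an empty list here, mirroring the rule that grouping over an empty input set produces no output rows.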

    Exceptions
    Type Condition
    ArgumentException

    workerName:

    • Workers with the same parent must have unique names.
    • Worker and worker system names cannot contain '/' or start with double underscore '__'.
    ArgumentNullException

    workerParent - All workers must have a parent. The top level workers have the worker system as parent.

    InvalidOperationException
    • Cannot add child worker to parent which has completed. Are you adding it to the correct parent?
    • Cannot add worker to parent, since its children have been started. Are you adding it to the correct parent?
    See Also
    AggregateTransform2<TInput, TOutput>(in DownstreamFactory<TInput>, String, Action<IAggregationCommand>)
    AggregateTransform1<TInputOutput>(in DownstreamFactory<TInputOutput>, String, Action<IAggregationCommand>, Action<IGroupByCopyCommand>)

    AggregateTransform2<TInput, TAccumulateOutput>(in DownstreamFactory<TInput>, String, Action<TransformAggregation<TInput, TAccumulateOutput>>, Action<TransformAggregation<TInput, TAccumulateOutput>>, Func<TransformAggregation<TInput, TAccumulateOutput>, TAccumulateOutput>, IEqualityComparer<TInput>)

    Initializes a new instance of the AggregateTransform<TInput, TAccumulateOutput> dataflow worker, which uses custom callbacks to aggregate and optionally group incoming rows, and output at most one row for all input rows or per unique grouping. Use this overload when the pre-defined aggregation functions in other overloads are not sufficient. The accumulation, and the output rows (if any) both have the same type, but input rows can have a different type.

    Note that there are no default seed or accumulation values - to produce output rows, the seedAction or accumulationAction must set Accumulation, or outputFunc must return a row.

    The Input port is linked to (if available) the upstream output or error output port specified by the factory.

    Note that all the type parameters must be specified with this overload.

    Note that all dataflow workers must adhere to the Row Ownership rules.

    Declaration
    public static AggregateTransform<TInput, TAccumulateOutput> AggregateTransform2<TInput, TAccumulateOutput>(this in DownstreamFactory<TInput> downstreamFactory, string workerName, Action<TransformAggregation<TInput, TAccumulateOutput>> seedAction, Action<TransformAggregation<TInput, TAccumulateOutput>> accumulationAction, Func<TransformAggregation<TInput, TAccumulateOutput>, TAccumulateOutput> outputFunc, IEqualityComparer<TInput> groupByEqualityComparer)
        where TInput : class where TAccumulateOutput : class
    Parameters
    Type Name Description
    DownstreamFactory<TInput> downstreamFactory

    The downstream factory, which specifies the parent worker and (optionally) the upstream port to link the "first" input port of this dataflow worker to.

    Get it from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).

    String workerName

    Name of the worker.

    Set to a prefix plus a trailing "/" (e.g. "MyPrefix-/") to generate a unique name from the prefix plus an increasing number starting at 1.

    While less useful, set to null, whitespace or "/" to generate a unique name from the worker type plus an increasing number starting at 1.

    The name cannot otherwise contain "/", and cannot start with double underscore "__".

    Action<TransformAggregation<TInput, TAccumulateOutput>> seedAction

    The seed action, which will be invoked once for the first (if any) incoming row, per grouping (if any). Can be null to avoid calling it.

    It takes the TransformAggregation<TInput, TAccumulate> aggregation instance as a parameter, with the first row assigned to NextRow, which this action can use to populate Accumulation.

    Set Status to a completed status to discard any additional input rows - Succeeded will invoke the outputFunc with the current aggregation values; a failure status will fail the worker without invoking the outputFunc.

    Action<TransformAggregation<TInput, TAccumulateOutput>> accumulationAction

    The accumulation action, which will be invoked once for each incoming row, if there are any. Can be null to avoid calling it.

    It takes the TransformAggregation<TInput, TAccumulate> aggregation instance as a parameter, with the next incoming row assigned to NextRow, which this action uses to update the Accumulation.

    Set Status to a completed status to discard any additional input rows - Succeeded will invoke the outputFunc with the current aggregation values; a failure status will fail the worker without invoking the outputFunc.

    Func<TransformAggregation<TInput, TAccumulateOutput>, TAccumulateOutput> outputFunc

    The output function, which must return the output row, or null to not output a row. Can be null, in which case the final value of Accumulation is used as output row.

    Without grouping, it will be invoked once, after all incoming rows have been processed, including if there are no incoming rows.

    With grouping, it will be invoked once for each grouping, after all incoming rows have been processed. It will not be invoked if there are no incoming rows.

    It takes the TransformAggregation<TInput, TAccumulate> aggregation instance as a parameter. Count holds the number of rows accumulated, and NextRow holds the last incoming row (if any). Accumulation holds the final accumulation, which is normally used to create and return the output row. Note that Accumulation and NextRow will be null if there are no incoming rows. Consider whether or not to create and return an empty output row if there are no incoming rows.

    Set Status to a failure status to fail the worker.

    IEqualityComparer<TInput> groupByEqualityComparer

    An instance that compares incoming rows. Rows that compare equal will be in the same grouping. Typically Create<T>(Action<IGroupByCommand>) is used to create the comparer, but Create<T, TKey>(Func<T, TKey>) or a custom one can also be used.

    Set to null to not group rows.
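
    The order in which the seed, accumulation and output callbacks run (without grouping) can be modeled in plain C#. This is a simplified sketch of the behavior described above, not the library's implementation: AggregationState, CallbackSketch, Order and Total are hypothetical names, Status handling is left out, and the exact point at which Count is incremented is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the aggregation instance passed to the callbacks.
public sealed class AggregationState<TInput, TAccumulate>
{
    public long Count { get; set; }
    public TInput NextRow { get; set; }
    public TAccumulate Accumulation { get; set; }
}

public sealed class Order { public decimal Amount { get; set; } }  // hypothetical
public sealed class Total { public decimal Sum { get; set; } }     // hypothetical

public static class CallbackSketch
{
    // Runs the three callbacks over all rows, without grouping.
    public static TAccumulate Run<TInput, TAccumulate>(
        IEnumerable<TInput> rows,
        Action<AggregationState<TInput, TAccumulate>> seedAction,
        Action<AggregationState<TInput, TAccumulate>> accumulationAction,
        Func<AggregationState<TInput, TAccumulate>, TAccumulate> outputFunc)
        where TInput : class
        where TAccumulate : class
    {
        var state = new AggregationState<TInput, TAccumulate>();
        foreach (var row in rows)
        {
            state.NextRow = row;
            if (state.Count == 0) seedAction?.Invoke(state); // first row only
            accumulationAction?.Invoke(state);               // every row
            state.Count++;
        }
        // Invoked once even when there were no rows; Accumulation and
        // NextRow are then still null.
        return outputFunc != null ? outputFunc(state) : state.Accumulation;
    }

    // Example: sum Order.Amount into a Total row, returning an empty Total
    // when there are no incoming rows.
    public static Total SumOrders(IEnumerable<Order> orders) =>
        Run<Order, Total>(
            orders,
            s => s.Accumulation = new Total(),
            s => s.Accumulation.Sum += s.NextRow.Amount,
            s => s.Accumulation ?? new Total());
}
```

    The output callback in SumOrders guards against a null Accumulation, which is the case to consider when there are no incoming rows. Returning null instead would correspond to not outputting a row.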

    Returns
    Type Description
    AggregateTransform<TInput, TAccumulateOutput>

    The newly created and (optionally) linked worker.

    Type Parameters
    Name Description
    TInput

    The type of the incoming rows.

    TAccumulateOutput

    The type of the accumulation seed and the Output row.

    Remarks

    Note: Ensure that accumulation column data types can hold all the intermediate as well as the final accumulated values, without overflows occurring.

    Furthermore, consider how to treat any null values. In SQL-92 for instance, COUNT (column name) and COUNT (DISTINCT column name) functions ignore nulls, but COUNT (*) includes rows with null values. Consider whether to use non-nullable (e.g. int) or nullable (e.g. int?) .NET types, or the corresponding database provider type if available (e.g. SqlInt32).

    Exceptions
    Type Condition
    ArgumentException

    One of seedAction, accumulationAction and outputFunc must be non-null.

    ArgumentException

    workerName:

    • Workers with the same parent must have unique names.
    • Worker and worker system names cannot contain '/' or start with double underscore '__'.
    ArgumentNullException

    workerParent - All workers must have a parent. The top level workers have the worker system as parent.

    InvalidOperationException
    • Cannot add child worker to parent which has completed. Are you adding it to the correct parent?
    • Cannot add worker to parent, since its children have been started. Are you adding it to the correct parent?

    AggregateTransform3<TInput, TAccumulate, TOutput>(in DownstreamFactory<TInput>, String, Action<TransformAggregation<TInput, TAccumulate>>, Action<TransformAggregation<TInput, TAccumulate>>, Func<TransformAggregation<TInput, TAccumulate>, TOutput>, IEqualityComparer<TInput>)

    Initializes a new instance of the AggregateTransform<TInput, TAccumulate, TOutput> dataflow worker, which uses custom callbacks to aggregate and optionally group incoming rows, and output at most one row for all input rows or per unique grouping. Use this overload when the pre-defined aggregation functions in other overloads are not sufficient. The input rows, the accumulation, and the output row (if any) can all have different types.

    Note that there are no default seed or accumulation values, which can instead be set with seedAction and accumulationAction.

    The Input port is linked to (if available) the upstream output or error output port specified by the factory.

    Note that the type parameters must normally be specified with this overload.

    Note that all dataflow workers must adhere to the Row Ownership rules.

    Declaration
    public static AggregateTransform<TInput, TAccumulate, TOutput> AggregateTransform3<TInput, TAccumulate, TOutput>(this in DownstreamFactory<TInput> downstreamFactory, string workerName, Action<TransformAggregation<TInput, TAccumulate>> seedAction, Action<TransformAggregation<TInput, TAccumulate>> accumulationAction, Func<TransformAggregation<TInput, TAccumulate>, TOutput> outputFunc, IEqualityComparer<TInput> groupByEqualityComparer)
        where TInput : class where TOutput : class
    Parameters
    Type Name Description
    DownstreamFactory<TInput> downstreamFactory

    The downstream factory, which specifies the parent worker and (optionally) the upstream port to link the "first" input port of this dataflow worker to.

    Get it from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).

    String workerName

    Name of the worker.

    Set to a prefix plus a trailing "/" (e.g. "MyPrefix-/") to generate a unique name from the prefix plus an increasing number starting at 1.

    While less useful, set to null, whitespace or "/" to generate a unique name from the worker type plus an increasing number starting at 1.

    The name cannot otherwise contain "/", and cannot start with double underscore "__".

    Action<TransformAggregation<TInput, TAccumulate>> seedAction

    The seed action, which will be invoked once for the first (if any) incoming row, per grouping (if any). Can be null to avoid calling it.

    It takes the TransformAggregation<TInput, TAccumulate> aggregation instance as a parameter, with the first row assigned to NextRow, which this action can use to populate Accumulation.

    Set Status to a completed status to discard any additional input rows - Succeeded will invoke the outputFunc with the current aggregation values; a failure status will fail the worker without invoking the outputFunc.

    Action<TransformAggregation<TInput, TAccumulate>> accumulationAction

    The accumulation action, which will be invoked once for each incoming row, if there are any. Can be null to avoid calling it.

    It takes the TransformAggregation<TInput, TAccumulate> aggregation instance as a parameter, with the next incoming row assigned to NextRow, which this action uses to update the Accumulation.

    Set Status to a completed status to discard any additional input rows - Succeeded will invoke the outputFunc with the current aggregation values; a failure status will fail the worker without invoking the outputFunc.

    Func<TransformAggregation<TInput, TAccumulate>, TOutput> outputFunc

    The output function, which must return the output row, or null to not output a row. The parameter can be null, in which case TOutput must be assignable from TAccumulate and the final Accumulation will be used as the output row.

    Without grouping, it will be invoked once, after all incoming rows have been processed, including if there are no incoming rows.

    With grouping, it will be invoked once for each grouping, after all incoming rows have been processed. It will not be invoked if there are no incoming rows.

    It takes the TransformAggregation<TInput, TAccumulate> aggregation instance as a parameter. Count holds the number of rows accumulated, and NextRow holds the last incoming row (if any). Accumulation holds the final accumulation, which is normally used to create and return the output row. Note that Accumulation and NextRow will be null if there are no incoming rows. Consider whether or not to create and return an empty output row if there are no incoming rows.

    Set Status to a failure status to fail the worker.
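
    The "assignable from" condition for a null outputFunc can be checked ahead of time with standard reflection. Whether the library performs exactly this check is an assumption. SummaryBase, DetailedSummary and OutputTypeSketch are hypothetical names used only for this sketch.

```csharp
using System;

public class SummaryBase { }                           // hypothetical output type
public sealed class DetailedSummary : SummaryBase { }  // hypothetical accumulation type

public static class OutputTypeSketch
{
    // True when a TAccumulate instance could be used directly as a TOutput,
    // which is the condition stated for passing a null outputFunc.
    public static bool CanUseAccumulationAsOutput<TAccumulate, TOutput>() =>
        typeof(TOutput).IsAssignableFrom(typeof(TAccumulate));
}
```

    Here an accumulation of type DetailedSummary could serve as a SummaryBase output row, but not the other way round. In the latter case an output function would be needed to create the output row.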

    IEqualityComparer<TInput> groupByEqualityComparer

    An instance that compares incoming rows. Rows that compare equal will be in the same grouping. Typically Create<T>(Action<IGroupByCommand>) is used to create the comparer, but Create<T, TKey>(Func<T, TKey>) or a custom one can also be used.

    Set to null to not group rows.

    Returns
    Type Description
    AggregateTransform<TInput, TAccumulate, TOutput>

    The newly created and (optionally) linked worker.

    Type Parameters
    Name Description
    TInput

    The type of the input rows.

    TAccumulate

    The type of the Accumulation. Can be a value type or a reference type.

    TOutput

    The type of the output row.

    Remarks

    Note: Ensure that accumulation column data types can hold all the intermediate as well as the final accumulated values, without overflows occurring.

    Furthermore, consider how to treat any null values. In SQL-92 for instance, COUNT (column name) and COUNT (DISTINCT column name) functions ignore nulls, but COUNT (*) includes rows with null values. Consider whether to use non-nullable (e.g. int) or nullable (e.g. int?) .NET types, or the corresponding database provider type if available (e.g. SqlInt32).
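
    The overflow note is easiest to see with a small example. The sketch below is plain C#, independent of the library, with the hypothetical name OverflowSketch. It sums the same int inputs into an int accumulation and into a wider long accumulation.

```csharp
using System;
using System.Collections.Generic;

public static class OverflowSketch
{
    // Accumulates in int: the checked addition throws OverflowException
    // once the running total would pass int.MaxValue.
    public static int SumAsInt(IEnumerable<int> values)
    {
        int total = 0;
        foreach (var v in values) total = checked(total + v);
        return total;
    }

    // Accumulates in long: the same int inputs fit in the wider type.
    public static long SumAsLong(IEnumerable<int> values)
    {
        long total = 0;
        foreach (var v in values) total = checked(total + v);
        return total;
    }
}
```

    For the inputs int.MaxValue and 1, SumAsInt throws while SumAsLong returns 2147483648. Because TAccumulate can differ from the input type with this overload, the accumulation can use the wider type even when the input columns are narrower.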

    Exceptions
    Type Condition
    ArgumentException

    One of seedAction, accumulationAction and outputFunc must be non-null.

    InvalidOperationException

    Either use an output row type that is assignable from the accumulation type, or use an output function.

    ArgumentException

    workerName:

    • Workers with the same parent must have unique names.
    • Worker and worker system names cannot contain '/' or start with double underscore '__'.
    ArgumentNullException

    workerParent - All workers must have a parent. The top level workers have the worker system as parent.

    InvalidOperationException
    • Cannot add child worker to parent which has completed. Are you adding it to the correct parent?
    • Cannot add worker to parent, since its children have been started. Are you adding it to the correct parent?

    See Also

    IAggregationCommand
    TransformAggregation<TInputAccumulate>
    TransformAggregation<TInput, TAccumulate>
    RowAggregationFunction
    IGroupByCommand
    IGroupByCopyCommand
    AggregateTransform<TInputAccumulateOutput>
    AggregateTransform<TInput, TAccumulateOutput>
    AggregateTransform<TInput, TAccumulate, TOutput>
    Copyright © 2023 Envobi Ltd