Class AggregateTransformFactory
Factory methods that create aggregation and grouping dataflow workers, which aggregate and (optionally) group incoming rows, and output at most one row for all input rows or per unique grouping. Also see the examples in Dataflow Aggregations.
Aggregations can be specified in the following ways:
- Predefined column aggregations Average, Count, CountDistinct, CountRows, First, Last, Max, Min and Sum. Also see IAggregationCommand for details.
- Predefined row aggregations such as First, Last, Single etc. Also see RowAggregationFunction for details.
- Custom seed, accumulation and output callbacks to implement custom aggregations not provided by the predefined aggregation functions.
Note: Ensure that accumulation column data types can hold all the intermediate as well as the final accumulated values, without overflows occurring.
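The overflow risk in the note above can be illustrated with plain C#, independent of the library. This is a minimal sketch: the `amounts` values are made up, and `checked` is used only to surface the overflow that an `int` accumulation would otherwise hide.

```csharp
using System;

// Hypothetical input: three amounts that each fit in an int.
int[] amounts = { 2_000_000_000, 2_000_000_000, 2_000_000_000 };

// Accumulating into a long leaves room for the intermediate and final totals.
long total = 0;
foreach (int amount in amounts)
    total = checked(total + amount);

// Accumulating into an int overflows on the second addition; 'checked' turns
// the silent wrap-around into an OverflowException.
bool intOverflowed = false;
try
{
    int narrowTotal = 0;
    foreach (int amount in amounts)
        narrowTotal = checked(narrowTotal + amount);
}
catch (OverflowException)
{
    intOverflowed = true;
}

Console.WriteLine($"long total = {total}, int overflowed = {intOverflowed}");
```

Choosing the wider accumulation type up front avoids having to handle the exception at all.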
The incoming rows can optionally also be grouped (i.e. perform a GROUP BY), by specifying which columns to group by, or by providing a grouping key function or a row equality comparer.
Also note that:
- AggregateTransform1() overloads create AggregateTransform<TInputAccumulateOutput> workers, where input, accumulation and output types are the same.
- AggregateTransform2() overloads create AggregateTransform<TInput, TAccumulateOutput> workers, where the input type can be different from the accumulation and output types.
- AggregateTransform3() overloads create AggregateTransform<TInput, TAccumulate, TOutput> workers, where the input, accumulation and output types can all be different.
These workers are fully blocking, i.e. they will only output the row(s) after they have received all incoming rows. They will buffer (and therefore consume memory for) only a single accumulation (without grouping), or multiple accumulations corresponding to the number of unique groupings.
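The buffering behavior described above (one accumulation per unique grouping, output only after the last input row) can be sketched in plain C# without the library. The row shape, the `accumulations` dictionary and the case-insensitive comparer are illustrative assumptions, not the worker's actual implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical input rows: (Category, Amount).
var rows = new (string Category, int Amount)[]
{
    ("fruit", 3), ("veg", 5), ("Fruit", 4), ("veg", 1),
};

// One accumulation per unique grouping key. The comparer decides which rows
// share a group; here category names are compared case-insensitively.
var accumulations = new Dictionary<string, long>(StringComparer.OrdinalIgnoreCase);

foreach (var row in rows)
{
    // A group that has not been seen yet starts from 0, then accumulates.
    accumulations.TryGetValue(row.Category, out long sum);
    accumulations[row.Category] = checked(sum + row.Amount);
}

// Fully blocking: results are only produced after the last input row.
foreach (var group in accumulations)
    Console.WriteLine($"{group.Key}: {group.Value}");
```

Memory use grows with the number of dictionary entries, i.e. with the number of unique groupings rather than with the number of input rows.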
Note that by default on .NET Framework, the maximum array size is 2GB, which in a 64-bit application limits the number of unique groupings to a maximum of 47.9 million, and the number of distinct values per CountDistinct(String) column to a maximum of just over 89 million. You can remove these limits by enabling support for larger arrays, as described in <gcAllowVeryLargeObjects> Element.
.NET 6+ on the other hand supports >2GB arrays by default.
The Input port is linked to (if available) the upstream
output or error output port specified by the factory.
Get the factory from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>).
Namespace: actionETL
Assembly: actionETL.dll
Syntax
public static class AggregateTransformFactory
Methods
AggregateTransform1<TInputAccumulateOutput>(in DownstreamFactory<TInputAccumulateOutput>, String, RowAggregationFunction)
Initializes a new instance of the AggregateTransform<TInputAccumulateOutput> dataflow worker, which aggregates incoming rows, and outputs at most one row. The input rows, the accumulation, and the output row all have the same type.
The aggregation is one of the predefined RowAggregationFunctions available that operate on whole rows, such as First, Single etc. Use other overloads to aggregate individual columns.
The Input port is linked to (if available) the upstream
output or error output port specified by the factory.
Declaration
public static AggregateTransform<TInputAccumulateOutput> AggregateTransform1<TInputAccumulateOutput>(this in DownstreamFactory<TInputAccumulateOutput> downstreamFactory, string workerName, RowAggregationFunction rowAggregationFunction)
where TInputAccumulateOutput : class
Parameters
| Type | Name | Description |
|---|---|---|
| DownstreamFactory<TInputAccumulateOutput> | downstreamFactory | The downstream factory, which specifies the parent worker and (optionally) the upstream port to link the "first" input port of this dataflow worker to. Get it from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>). |
| String | workerName | Name of the worker.
Set to a prefix plus a trailing
While less useful, set to
The name cannot otherwise contain |
| RowAggregationFunction | rowAggregationFunction | |
Returns
| Type | Description |
|---|---|
| AggregateTransform<TInputAccumulateOutput> | The newly created and (optionally) linked worker. |
Type Parameters
| Name | Description |
|---|---|
| TInputAccumulateOutput | The type of each |
Exceptions
| Type | Condition |
|---|---|
| ArgumentException | |
| ArgumentNullException | |
| InvalidOperationException | |
AggregateTransform1<TInputAccumulateOutput>(in DownstreamFactory<TInputAccumulateOutput>, String, RowAggregationFunction, Action<IGroupByCommand>)
Initializes a new instance of the AggregateTransform<TInputAccumulateOutput> dataflow worker, which groups incoming rows using an IGroupByCommand, aggregates each group, and outputs at most one row per unique grouping. The input rows, the accumulation, and the output row all have the same type.
The aggregation is one of the predefined RowAggregationFunctions available that operate on whole rows, such as First, Single etc. Use other overloads to aggregate individual columns.
The Input port is linked to (if available) the upstream
output or error output port specified by the factory.
Declaration
public static AggregateTransform<TInputAccumulateOutput> AggregateTransform1<TInputAccumulateOutput>(this in DownstreamFactory<TInputAccumulateOutput> downstreamFactory, string workerName, RowAggregationFunction rowAggregationFunction, Action<IGroupByCommand> groupByCommandAction)
where TInputAccumulateOutput : class
Parameters
| Type | Name | Description |
|---|---|---|
| DownstreamFactory<TInputAccumulateOutput> | downstreamFactory | The downstream factory, which specifies the parent worker and (optionally) the upstream port to link the "first" input port of this dataflow worker to. Get it from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>). |
| String | workerName | Name of the worker.
Set to a prefix plus a trailing
While less useful, set to
The name cannot otherwise contain |
| RowAggregationFunction | rowAggregationFunction | |
| Action<IGroupByCommand> | groupByCommandAction | Commands to specify grouping columns in the incoming rows. E.g. to group by
Column name matching is ordinal case insensitive, but a case sensitive match takes precedence over a case insensitive match. Set to |
Returns
| Type | Description |
|---|---|
| AggregateTransform<TInputAccumulateOutput> | The newly created and (optionally) linked worker. |
Type Parameters
| Name | Description |
|---|---|
| TInputAccumulateOutput | The type of each |
Exceptions
| Type | Condition |
|---|---|
| ArgumentException | |
| ArgumentNullException | |
| InvalidOperationException | |
AggregateTransform1<TInputAccumulateOutput>(in DownstreamFactory<TInputAccumulateOutput>, String, RowAggregationFunction, IEqualityComparer<TInputAccumulateOutput>)
Initializes a new instance of the AggregateTransform<TInputAccumulateOutput> dataflow worker, which groups incoming rows using an IEqualityComparer<T>, aggregates each group, and outputs at most one row per unique grouping. The input rows, the accumulation, and the output row all have the same type.
The aggregation is one of the predefined RowAggregationFunctions available that operate on whole rows, such as First, Single etc. Use other overloads to aggregate individual columns.
The Input port is linked to (if available) the upstream
output or error output port specified by the factory.
Declaration
public static AggregateTransform<TInputAccumulateOutput> AggregateTransform1<TInputAccumulateOutput>(this in DownstreamFactory<TInputAccumulateOutput> downstreamFactory, string workerName, RowAggregationFunction rowAggregationFunction, IEqualityComparer<TInputAccumulateOutput> groupByEqualityComparer)
where TInputAccumulateOutput : class
Parameters
| Type | Name | Description |
|---|---|---|
| DownstreamFactory<TInputAccumulateOutput> | downstreamFactory | The downstream factory, which specifies the parent worker and (optionally) the upstream port to link the "first" input port of this dataflow worker to. Get it from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>). |
| String | workerName | Name of the worker.
Set to a prefix plus a trailing
While less useful, set to
The name cannot otherwise contain |
| RowAggregationFunction | rowAggregationFunction | |
| IEqualityComparer<TInputAccumulateOutput> | groupByEqualityComparer | An instance that compares incoming rows. Rows that compare equal will be in the same grouping. Typically Create<T>(Action<IGroupByCommand>) is used to create the comparer, but Create<T, TKey>(Func<T, TKey>) or a custom one can also be used. Set to |
Returns
| Type | Description |
|---|---|
| AggregateTransform<TInputAccumulateOutput> | The newly created and (optionally) linked worker. |
Type Parameters
| Name | Description |
|---|---|
| TInputAccumulateOutput | The type of each |
Exceptions
| Type | Condition |
|---|---|
| ArgumentException | |
| ArgumentNullException | |
| InvalidOperationException | |
AggregateTransform1<TInputOutput>(in DownstreamFactory<TInputOutput>, String, Action<IAggregationCommand>)
Initializes a new instance of the AggregateTransform<TInputAccumulateOutput> dataflow worker, which aggregates incoming rows, and outputs exactly one row with one or more columns populated by aggregation functions (Average, Count, CountDistinct, CountRows, First, Last, Max, Min and Sum, see IAggregationCommand for details). Also see AggregateTransform1<TInputOutput>(in DownstreamFactory<TInputOutput>, String, Action<IAggregationCommand>, Action<IGroupByCopyCommand>) which includes grouping, as well as the examples in Dataflow Aggregations.
The input and output rows have the same type.
The Input port is linked to (if available) the upstream
output or error output port specified by the factory.
Declaration
public static AggregateTransform<TInputOutput> AggregateTransform1<TInputOutput>(this in DownstreamFactory<TInputOutput> downstreamFactory, string workerName, Action<IAggregationCommand> aggregationCommandAction)
where TInputOutput : class, new()
Parameters
| Type | Name | Description |
|---|---|---|
| DownstreamFactory<TInputOutput> | downstreamFactory | The downstream factory, which specifies the parent worker and (optionally) the upstream port to link the "first" input port of this dataflow worker to. Get it from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>). |
| String | workerName | Name of the worker.
Set to a prefix plus a trailing
While less useful, set to
The name cannot otherwise contain |
| Action<IAggregationCommand> | aggregationCommandAction | Commands for specifying predefined aggregation functions (
See IAggregationCommand for details. Column name matching is ordinal case insensitive, but a case sensitive match takes precedence over a case insensitive match. |
Returns
| Type | Description |
|---|---|
| AggregateTransform<TInputOutput> | The newly created and (optionally) linked worker. |
Type Parameters
| Name | Description |
|---|---|
| TInputOutput | The type of the input and output rows. Must be a concrete class type with a public parameterless constructor. |
Remarks
Note: Ensure that accumulation column data types can hold all the intermediate as well as the final accumulated values, without overflows occurring.
Also consider whether to use non-nullable (e.g. int) or nullable (e.g. int?) .NET types, or the corresponding database provider type if available (e.g. SqlInt32).
The aggregation functions follow the SQL-92 standard:
- Aggregation without grouping will always produce one output row, including for an empty input set
- On an empty input set, Count aggregations return 0, whereas the other aggregations return null
- With grouping, an empty input set will not produce any output rows
- Count aggregations support nullable input column types. Non-Count aggregations support nullable input column value types if both input and output columns are nullable.
- null input column values are not included in aggregations (CountRows however doesn't look at input column values at all)
- If the aggregation result type (usually the same as the input column type) can be implicitly cast to the output column type, then they can have different types, e.g. long to double
- Sum and Average additions are checked for overflow, Count additions are not
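The null and empty-set rules listed above can be mimicked in plain C#. This is a sketch of the semantics only: `CountValues`, `CountRows` and `SumValues` are hypothetical local helpers written for this example, not library members.

```csharp
using System;
using System.Linq;

// Count of non-null column values; 0 for an empty input set.
static int CountValues(int?[] column) => column.Count(v => v.HasValue);

// Row count that never inspects the column values.
static int CountRows(int?[] column) => column.Length;

// Sum that skips nulls, returns null when there is nothing to add, and
// throws OverflowException instead of wrapping around.
static long? SumValues(int?[] column)
{
    long? sum = null;
    foreach (int? value in column)
    {
        if (value.HasValue)
            sum = checked((sum ?? 0) + value.Value);
    }
    return sum;
}

int?[] empty = Array.Empty<int?>();
int?[] withNulls = { 10, null, 32 };

Console.WriteLine(CountValues(empty));         // 0
Console.WriteLine(SumValues(empty).HasValue);  // False
Console.WriteLine(CountValues(withNulls));     // 2
Console.WriteLine(CountRows(withNulls));       // 3
Console.WriteLine(SumValues(withNulls));       // 42
```

Note that the output type of `SumValues` is nullable and wider than the input column type, which matches the advice on nullable columns and implicit casts.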
Exceptions
| Type | Condition |
|---|---|
| ArgumentException | |
| ArgumentException | |
| ArgumentNullException | |
| InvalidOperationException | |
| ArgumentException | |
| ArgumentNullException | inputColumnName |
AggregateTransform1<TInputOutput>(in DownstreamFactory<TInputOutput>, String, Action<IAggregationCommand>, Action<IGroupByCopyCommand>)
Initializes a new instance of the AggregateTransform<TInputAccumulateOutput> dataflow worker, which aggregates columns and groups incoming rows, and outputs exactly one row per unique grouping (if there are any input rows) with zero or more columns populated by aggregation functions (Average, Count, CountDistinct, CountRows, First, Last, Max, Min and Sum, see IAggregationCommand for details). Also see AggregateTransform1<TInputOutput>(in DownstreamFactory<TInputOutput>, String, Action<IAggregationCommand>) which excludes grouping, as well as the examples in Dataflow Aggregations.
The input and output rows have the same type.
The Input port is linked to (if available) the upstream
output or error output port specified by the factory.
Declaration
public static AggregateTransform<TInputOutput> AggregateTransform1<TInputOutput>(this in DownstreamFactory<TInputOutput> downstreamFactory, string workerName, Action<IAggregationCommand> aggregationCommandAction, Action<IGroupByCopyCommand> groupByCopyCommandAction)
where TInputOutput : class, new()
Parameters
| Type | Name | Description |
|---|---|---|
| DownstreamFactory<TInputOutput> | downstreamFactory | The downstream factory, which specifies the parent worker and (optionally) the upstream port to link the "first" input port of this dataflow worker to. Get it from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>). |
| String | workerName | Name of the worker.
Set to a prefix plus a trailing
While less useful, set to
The name cannot otherwise contain |
| Action<IAggregationCommand> | aggregationCommandAction | Commands for specifying predefined aggregation functions (
See IAggregationCommand for details. Column name matching is ordinal case insensitive, but a case sensitive match takes precedence over a case insensitive match. |
| Action<IGroupByCopyCommand> | groupByCopyCommandAction | Commands to specify grouping columns in the incoming rows. E.g. to group by
Column name matching is ordinal case insensitive, but a case sensitive match takes precedence over a case insensitive match. Set to |
Returns
| Type | Description |
|---|---|
| AggregateTransform<TInputOutput> | The newly created and (optionally) linked worker. |
Type Parameters
| Name | Description |
|---|---|
| TInputOutput | The type of the input and output rows. Must be a concrete class type with a public parameterless constructor. |
Remarks
Note: Ensure that accumulation column data types can hold all the intermediate as well as the final accumulated values, without overflows occurring.
Also consider whether to use non-nullable (e.g. int) or nullable (e.g. int?) .NET types, or the corresponding database provider type if available (e.g. SqlInt32).
The aggregation functions follow the SQL-92 standard:
- Aggregation without grouping will always produce one output row, including for an empty input set
- On an empty input set, Count aggregations return 0, whereas the other aggregations return null
- With grouping, an empty input set will not produce any output rows
- Count aggregations support nullable input column types. Non-Count aggregations support nullable input column value types if both input and output columns are nullable.
- null input column values are not included in aggregations (CountRows however doesn't look at input column values at all)
- If the aggregation result type (usually the same as the input column type) can be implicitly cast to the output column type, then they can have different types, e.g. long to double
- Sum and Average additions are checked for overflow, Count additions are not
Exceptions
| Type | Condition |
|---|---|
| ArgumentException | |
| ArgumentException | |
| ArgumentNullException | |
| InvalidOperationException | |
| ArgumentException | |
| ArgumentNullException | inputColumnName |
AggregateTransform1<TInputAccumulateOutput>(in DownstreamFactory<TInputAccumulateOutput>, String, Action<TransformAggregation<TInputAccumulateOutput>>, Action<TransformAggregation<TInputAccumulateOutput>>, Func<TransformAggregation<TInputAccumulateOutput>, TInputAccumulateOutput>, IEqualityComparer<TInputAccumulateOutput>)
Initializes a new instance of the AggregateTransform<TInputAccumulateOutput> dataflow worker, which uses custom callbacks to aggregate and optionally group incoming rows, and outputs at most one row for all input rows or per unique grouping. Use this overload when the predefined aggregation functions in other overloads are not sufficient. The input rows, the accumulation, and the output row all have the same type. The output row(s), if any, default to the final value of the Accumulation.
Note that there are no default seed or accumulation values - to produce output rows, the seedAction or accumulationAction must set Accumulation, or outputFunc must return a row.
The Input port is linked to (if available) the upstream
output or error output port specified by the factory.
Note: Ensure that accumulation column data types can hold all the intermediate as well as the final accumulated values, without overflows occurring.
Furthermore, consider how to treat any null values. In SQL-92 for instance, COUNT (column name) and COUNT (DISTINCT column name) functions ignore nulls, but COUNT (*) includes rows with null values. Consider whether to use non-nullable (e.g. int) or nullable (e.g. int?) .NET types, or the corresponding database provider type if available (e.g. SqlInt32).
Note that all dataflow workers must adhere to the Row Ownership rules.
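The seed, accumulation and output callback order described for this overload can be sketched in plain C#. The `Aggregate` helper below is a hypothetical stand-in written for this example, not the library's implementation: it passes values through return values instead of the Accumulation, NextRow and Status members, does no grouping, and uses `int` rows for brevity even though the overload requires class row types.

```csharp
using System;
using System.Collections.Generic;

// Stand-in driver: seed on the first row, accumulate on every row, then
// produce the output once, even when there were no rows.
static TOutput Aggregate<TInput, TAccumulate, TOutput>(
    IEnumerable<TInput> rows,
    Func<TInput, TAccumulate> seed,
    Func<TAccumulate, TInput, TAccumulate> accumulate,
    Func<TAccumulate, long, TOutput> output)
{
    TAccumulate accumulation = default!;
    long count = 0;
    foreach (TInput row in rows)
    {
        if (count == 0)
            accumulation = seed(row);
        accumulation = accumulate(accumulation, row);
        count++;
    }
    return output(accumulation, count);
}

// Custom aggregation not covered by the predefined functions: value range.
int[] readings = { 7, 3, 12, 5 };

int? range = Aggregate<int, (int Min, int Max), int?>(
    readings,
    seed: first => (first, first),
    accumulate: (acc, next) => (Math.Min(acc.Min, next), Math.Max(acc.Max, next)),
    output: (acc, count) => count == 0 ? (int?)null : acc.Max - acc.Min);

int? emptyRange = Aggregate<int, (int Min, int Max), int?>(
    Array.Empty<int>(),
    seed: first => (first, first),
    accumulate: (acc, next) => (Math.Min(acc.Min, next), Math.Max(acc.Max, next)),
    output: (acc, count) => count == 0 ? (int?)null : acc.Max - acc.Min);

Console.WriteLine(range);                // 9
Console.WriteLine(emptyRange.HasValue);  // False
```

Because the output callback also runs for an empty input, it has to decide what an aggregation over no rows means; here it returns null.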
Declaration
public static AggregateTransform<TInputAccumulateOutput> AggregateTransform1<TInputAccumulateOutput>(this in DownstreamFactory<TInputAccumulateOutput> downstreamFactory, string workerName, Action<TransformAggregation<TInputAccumulateOutput>> seedAction, Action<TransformAggregation<TInputAccumulateOutput>> accumulationAction, Func<TransformAggregation<TInputAccumulateOutput>, TInputAccumulateOutput> outputFunc, IEqualityComparer<TInputAccumulateOutput> groupByEqualityComparer)
where TInputAccumulateOutput : class
Parameters
| Type | Name | Description |
|---|---|---|
| DownstreamFactory<TInputAccumulateOutput> | downstreamFactory | The downstream factory, which specifies the parent worker and (optionally) the upstream port to link the "first" input port of this dataflow worker to. Get it from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>). |
| String | workerName | Name of the worker.
Set to a prefix plus a trailing
While less useful, set to
The name cannot otherwise contain |
| Action<TransformAggregation<TInputAccumulateOutput>> | seedAction | The seed action, which will be invoked once for the first (if any) incoming row,
per grouping (if any). Can be It takes the TransformAggregation<TInputAccumulate> aggregation instance as a parameter, with the first row assigned to NextRow, which this action can use to populate Accumulation.
Set Status
to a completed status to discard any additional input rows - |
| Action<TransformAggregation<TInputAccumulateOutput>> | accumulationAction | The accumulation action, which will be invoked once for each incoming row, if there are any.
Can be It takes the TransformAggregation<TInputAccumulate> aggregation instance as a parameter, with the next incoming row assigned to NextRow, which this action uses to update the Accumulation.
Set Status to a completed status
to discard any additional input rows - |
| Func<TransformAggregation<TInputAccumulateOutput>, TInputAccumulateOutput> | outputFunc | The output function, which must return the output row, or Without grouping, it will be invoked once, after all incoming rows have been processed, including if there are no incoming rows. With grouping, it will be invoked once for each grouping, after all incoming rows have been processed. It will not be invoked if there are no incoming rows.
It takes the TransformAggregation<TInputAccumulate>
aggregation instance as a parameter.
Count
holds the number of rows accumulated, and
NextRow
holds the last incoming row (if any).
Accumulation
holds the final accumulation, which is normally used to create and return the output row. Note that
Accumulation and
NextRow
will be Set Status to a failure status to fail the worker. |
| IEqualityComparer<TInputAccumulateOutput> | groupByEqualityComparer | An instance that compares incoming rows. Rows that compare equal will be in the same grouping. Typically Create<T>(Action<IGroupByCommand>) is used to create the comparer, but Create<T, TKey>(Func<T, TKey>) or a custom one can also be used. Set to |
Returns
| Type | Description |
|---|---|
| AggregateTransform<TInputAccumulateOutput> | The newly created and (optionally) linked worker. |
Type Parameters
| Name | Description |
|---|---|
| TInputAccumulateOutput | The type of each |
Exceptions
| Type | Condition |
|---|---|
| ArgumentException | One of |
| ArgumentException | |
| ArgumentNullException | |
| InvalidOperationException | |
AggregateTransform2<TInput, TOutput>(in DownstreamFactory<TInput>, String, Action<IAggregationCommand>)
Initializes a new instance of the AggregateTransform<TInput, TAccumulateOutput> dataflow worker, which aggregates incoming rows, and outputs exactly one row with one or more columns populated by aggregation functions (Average, Count, CountDistinct, CountRows, First, Last, Max, Min and Sum, see IAggregationCommand for details). Also see AggregateTransform2<TInput, TOutput>(in DownstreamFactory<TInput>, String, Action<IAggregationCommand>, Action<IGroupByCopyCommand>) which includes grouping, as well as the examples in Dataflow Aggregations.
The input and output rows can have different types.
The Input port is linked to (if available) the upstream
output or error output port specified by the factory.
Note that all the type parameters must be specified with this overload.
Declaration
public static AggregateTransform<TInput, TOutput> AggregateTransform2<TInput, TOutput>(this in DownstreamFactory<TInput> downstreamFactory, string workerName, Action<IAggregationCommand> aggregationCommandAction)
where TInput : class where TOutput : class, new()
Parameters
| Type | Name | Description |
|---|---|---|
| DownstreamFactory<TInput> | downstreamFactory | The downstream factory, which specifies the parent worker and (optionally) the upstream port to link the "first" input port of this dataflow worker to. Get it from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>). |
| String | workerName | Name of the worker.
Set to a prefix plus a trailing
While less useful, set to
The name cannot otherwise contain |
| Action<IAggregationCommand> | aggregationCommandAction | Commands for specifying predefined aggregation functions (
See IAggregationCommand for details. Column name matching is ordinal case insensitive, but a case sensitive match takes precedence over a case insensitive match. |
Returns
| Type | Description |
|---|---|
| AggregateTransform<TInput, TOutput> | The newly created and (optionally) linked worker. |
Type Parameters
| Name | Description |
|---|---|
| TInput | The type of the incoming rows. |
| TOutput | The type of the output row. Must be a concrete class type with a public parameterless constructor. |
Remarks
Note: Ensure that accumulation column data types can hold all the intermediate as well as the final accumulated values, without overflows occurring.
Also consider whether to use non-nullable (e.g. int) or nullable (e.g. int?) .NET types, or the corresponding database provider type if available (e.g. SqlInt32).
The aggregation functions follow the SQL-92 standard:
- Aggregation without grouping will always produce one output row, including for an empty input set
- On an empty input set, Count aggregations return 0, whereas the other aggregations return null
- With grouping, an empty input set will not produce any output rows
- Count aggregations support nullable input column types. Non-Count aggregations support nullable input column value types if both input and output columns are nullable.
- null input column values are not included in aggregations (CountRows however doesn't look at input column values at all)
- If the aggregation result type (usually the same as the input column type) can be implicitly cast to the output column type, then they can have different types, e.g. long to double
- Sum and Average additions are checked for overflow, Count additions are not
Exceptions
| Type | Condition |
|---|---|
| ArgumentException | |
| ArgumentNullException | |
| InvalidOperationException | |
AggregateTransform2<TInput, TOutput>(in DownstreamFactory<TInput>, String, Action<IAggregationCommand>, Action<IGroupByCopyCommand>)
Initializes a new instance of the AggregateTransform<TInput, TAccumulateOutput> dataflow worker, which aggregates columns and groups incoming rows, and outputs exactly one row per unique grouping (if there are any input rows) with one or more columns populated by aggregation functions (Average, Count, CountDistinct, CountRows, First, Last, Max, Min and Sum, see IAggregationCommand for details). Also see AggregateTransform2<TInput, TOutput>(in DownstreamFactory<TInput>, String, Action<IAggregationCommand>) which excludes grouping, as well as the examples in Dataflow Aggregations.
The input and output rows can have different types.
The Input port is linked to (if available) the upstream
output or error output port specified by the factory.
Note that all the type parameters must be specified with this overload.
Declaration
public static AggregateTransform<TInput, TOutput> AggregateTransform2<TInput, TOutput>(this in DownstreamFactory<TInput> downstreamFactory, string workerName, Action<IAggregationCommand> aggregationCommandAction, Action<IGroupByCopyCommand> groupByCopyCommandAction)
where TInput : class where TOutput : class, new()
Parameters
| Type | Name | Description |
|---|---|---|
| DownstreamFactory<TInput> | downstreamFactory | The downstream factory, which specifies the parent worker and (optionally) the upstream port to link the "first" input port of this dataflow worker to. Get it from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>). |
| String | workerName | Name of the worker.
Set to a prefix plus a trailing
While less useful, set to
The name cannot otherwise contain |
| Action<IAggregationCommand> | aggregationCommandAction | Commands for specifying predefined aggregation functions (
See IAggregationCommand for details. Column name matching is ordinal case insensitive, but a case sensitive match takes precedence over a case insensitive match. |
| Action<IGroupByCopyCommand> | groupByCopyCommandAction | Commands to specify grouping columns in the incoming rows. E.g. to group by
Column name matching is ordinal case insensitive, but a case sensitive match takes precedence over a case insensitive match. Set to |
Returns
| Type | Description |
|---|---|
| AggregateTransform<TInput, TOutput> | The newly created and (optionally) linked worker. |
Type Parameters
| Name | Description |
|---|---|
| TInput | The type of the incoming rows. |
| TOutput | The type of the output rows. Must be a concrete class type with a public parameterless constructor. |
Remarks
Note: Ensure that accumulation column data types can hold all the intermediate as well as the final accumulated values, without overflows occurring.
Also consider whether to use non-nullable (e.g. int) or nullable (e.g. int?) .NET types, or the corresponding database provider type if available (e.g. SqlInt32).
The aggregation functions follow the SQL-92 standard:
- Aggregation without grouping will always produce one output row, including for an empty input set
- On an empty input set, Count aggregations return 0, whereas the other aggregations return null
- With grouping, an empty input set will not produce any output rows
- Count aggregations support nullable input column types. Non-Count aggregations support nullable input column value types if both input and output columns are nullable.
- null input column values are not included in aggregations (CountRows however doesn't look at input column values at all)
- If the aggregation result type (usually the same as the input column type) can be implicitly cast to the output column type, then they can have different types, e.g. long to double
- Sum and Average additions are checked for overflow, Count additions are not
Exceptions
| Type | Condition |
|---|---|
| ArgumentException | |
| ArgumentNullException | |
| InvalidOperationException | |
AggregateTransform2<TInput, TAccumulateOutput>(in DownstreamFactory<TInput>, String, Action<TransformAggregation<TInput, TAccumulateOutput>>, Action<TransformAggregation<TInput, TAccumulateOutput>>, Func<TransformAggregation<TInput, TAccumulateOutput>, TAccumulateOutput>, IEqualityComparer<TInput>)
Initializes a new instance of the AggregateTransform<TInput, TAccumulateOutput> dataflow worker, which uses custom callbacks to aggregate and optionally group incoming rows, and outputs at most one row for all input rows or per unique grouping. Use this overload when the predefined aggregation functions in other overloads are not sufficient. The accumulation and the output rows (if any) have the same type, but input rows can have a different type.
Note that there are no default seed or accumulation values - to produce output rows, the seedAction or accumulationAction must set Accumulation, or outputFunc must return a row.
The Input port is linked to (if available) the upstream
output or error output port specified by the factory.
Note that all the type parameters must be specified with this overload.
Note that all dataflow workers must adhere to the Row Ownership rules.
Declaration
public static AggregateTransform<TInput, TAccumulateOutput> AggregateTransform2<TInput, TAccumulateOutput>(this in DownstreamFactory<TInput> downstreamFactory, string workerName, Action<TransformAggregation<TInput, TAccumulateOutput>> seedAction, Action<TransformAggregation<TInput, TAccumulateOutput>> accumulationAction, Func<TransformAggregation<TInput, TAccumulateOutput>, TAccumulateOutput> outputFunc, IEqualityComparer<TInput> groupByEqualityComparer)
where TInput : class where TAccumulateOutput : class
Parameters
| Type | Name | Description |
|---|---|---|
| DownstreamFactory<TInput> | downstreamFactory | The downstream factory, which specifies the parent worker and (optionally) the upstream port to link the "first" input port of this dataflow worker to. Get it from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>). |
| String | workerName | Name of the worker.
Set to a prefix plus a trailing
While less useful, set to
The name cannot otherwise contain |
| Action<TransformAggregation<TInput, TAccumulateOutput>> | seedAction | The seed action, which will be invoked once for the first (if any) incoming row,
per grouping (if any). Can be It takes the TransformAggregation<TInput, TAccumulate> aggregation instance as a parameter, with the first row assigned to NextRow, which this action can use to populate Accumulation.
Set Status
to a completed status to discard any additional input rows - |
| Action<TransformAggregation<TInput, TAccumulateOutput>> | accumulationAction | The accumulation action, which will be invoked once for each incoming row, if there are any.
Can be It takes the TransformAggregation<TInput, TAccumulate> aggregation instance as a parameter, with the next incoming row assigned to NextRow, which this action uses to update the Accumulation.
Set Status to a completed status
to discard any additional input rows - |
| Func<TransformAggregation<TInput, TAccumulateOutput>, TAccumulateOutput> | outputFunc | The output function, which must return the output row, or Without grouping, it will be invoked once, after all incoming rows have been processed, including if there are no incoming rows. With grouping, it will be invoked once for each grouping, after all incoming rows have been processed. It will not be invoked if there are no incoming rows.
It takes the TransformAggregation<TInput, TAccumulate>
aggregation instance as a parameter.
Count
holds the number of rows accumulated, and
NextRow
holds the last incoming row (if any).
Accumulation
holds the final accumulation, which is normally used to create and return the output row. Note that
Accumulation and
NextRow
will be Set Status to a failure status to fail the worker. |
| IEqualityComparer<TInput> | groupByEqualityComparer | An instance that compares incoming rows. Rows that compare equal will be in the same grouping. Typically Create<T>(Action<IGroupByCommand>) is used to create the comparer, but Create<T, TKey>(Func<T, TKey>) or a custom one can also be used. Set to |
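As an illustration of the custom comparer option for groupByEqualityComparer, the following is a minimal sketch of a key-based row comparer built only on the standard `IEqualityComparer<T>` interface. `KeyEqualityComparer`, `Order` and `GroupingComparerSketch` are hypothetical names introduced here, not library types, and the sketch assumes rows are never null.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: two rows are equal when their extracted keys are equal.
public sealed class KeyEqualityComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> _keySelector;
    private readonly IEqualityComparer<TKey> _keyComparer;

    public KeyEqualityComparer(Func<T, TKey> keySelector, IEqualityComparer<TKey> keyComparer = null)
    {
        _keySelector = keySelector ?? throw new ArgumentNullException(nameof(keySelector));
        _keyComparer = keyComparer ?? EqualityComparer<TKey>.Default;
    }

    public bool Equals(T x, T y) => _keyComparer.Equals(_keySelector(x), _keySelector(y));

    // Rows that compare equal must also produce equal hash codes.
    public int GetHashCode(T row)
    {
        var key = _keySelector(row);
        return key == null ? 0 : _keyComparer.GetHashCode(key);
    }
}

public sealed class Order { public string Country; public int Quantity; }

public static class GroupingComparerSketch
{
    // Number of unique groupings the comparer would produce for these rows.
    public static int CountGroups(IEnumerable<Order> orders, IEqualityComparer<Order> comparer) =>
        new HashSet<Order>(orders, comparer).Count;

    public static void Main()
    {
        // Group on Country, ignoring case.
        var comparer = new KeyEqualityComparer<Order, string>(
            order => order.Country, StringComparer.OrdinalIgnoreCase);

        var orders = new[]
        {
            new Order { Country = "SE", Quantity = 2 },
            new Order { Country = "se", Quantity = 3 },
            new Order { Country = "US", Quantity = 1 },
        };
        Console.WriteLine(CountGroups(orders, comparer)); // 2
    }
}
```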
Returns
| Type | Description |
|---|---|
| AggregateTransform<TInput, TAccumulateOutput> | The newly created and (optionally) linked worker. |
Type Parameters
| Name | Description |
|---|---|
| TInput | The type of the incoming rows. |
| TAccumulateOutput | The type of the accumulation seed and the |
Remarks
Note: Ensure that accumulation column data types can hold all the intermediate as well as the final accumulated values, without overflows occurring.
Furthermore, consider how to treat any null values. In SQL-92 for instance,
COUNT (column name) and COUNT (DISTINCT column name) functions ignore nulls,
but COUNT (*) includes rows with null values. Consider whether to use non-nullable
(e.g. int) or nullable (e.g. int?) .NET types, or the corresponding database
provider type if available (e.g. SqlInt32).
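The SQL-92 counting rules mentioned above can be mirrored with nullable .NET types. The following is a small sketch in plain C#; the class and method names are hypothetical and it does not use the library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of SQL-style counting over a nullable column.
public static class NullCountingSketch
{
    // Counts every row, like SQL COUNT(*).
    public static int CountRows(IReadOnlyCollection<int?> column) => column.Count;

    // Counts only non-null values, like SQL COUNT(column name).
    public static int CountValues(IEnumerable<int?> column)
    {
        int count = 0;
        foreach (var value in column)
            if (value.HasValue) count++;
        return count;
    }

    // Counts distinct non-null values, like SQL COUNT(DISTINCT column name).
    public static int CountDistinctValues(IEnumerable<int?> column)
    {
        var seen = new HashSet<int>();
        foreach (var value in column)
            if (value.HasValue) seen.Add(value.Value);
        return seen.Count;
    }

    public static void Main()
    {
        var column = new int?[] { 5, null, 5, 7 };
        Console.WriteLine(CountRows(column));           // 4
        Console.WriteLine(CountValues(column));         // 3
        Console.WriteLine(CountDistinctValues(column)); // 2
    }
}
```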
Exceptions
| Type | Condition |
|---|---|
| ArgumentException | One of |
| ArgumentException | |
| ArgumentNullException | |
| InvalidOperationException | |
AggregateTransform3<TInput, TAccumulate, TOutput>(in DownstreamFactory<TInput>, String, Action<TransformAggregation<TInput, TAccumulate>>, Action<TransformAggregation<TInput, TAccumulate>>, Func<TransformAggregation<TInput, TAccumulate>, TOutput>, IEqualityComparer<TInput>)
Initializes a new instance of the AggregateTransform<TInput, TAccumulate, TOutput> dataflow worker, which uses custom callbacks to aggregate and optionally group incoming rows, and output at most one row for all input rows or per unique grouping. Use this overload when the pre-defined aggregation functions in other overloads are not sufficient. The input rows, the accumulation, and the output row (if any) can all have different types.
Note that there are no default seed or accumulation values, which can instead be set with
seedAction and accumulationAction.
The Input port is linked to (if available) the upstream
output or error output port specified by the factory.
Note that the type parameters must normally be specified with this overload.
Note that all dataflow workers must adhere to the Row Ownership rules.
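To illustrate what three distinct types buy you, here is a sketch using the standard LINQ `Enumerable.Aggregate` overload, which likewise takes separate source, accumulation and result types. It is an analogy in plain C#, not the worker itself; `Reading`, `SensorAverage` and `ThreeTypeAggregationSketch` are hypothetical names, and the value tuple accumulation assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class Reading { public string Sensor; public double Value; }         // input row type
public sealed class SensorAverage { public string Sensor; public double Average; } // output row type

public static class ThreeTypeAggregationSketch
{
    public static List<SensorAverage> AveragePerSensor(IEnumerable<Reading> readings)
    {
        return readings
            .GroupBy(reading => reading.Sensor) // groups are yielded in first-seen key order
            .Select(group => group.Aggregate(
                // Seed: a value-type accumulation holding a running sum and row count.
                (Sum: 0.0, Count: 0L),
                // Accumulate: fold each incoming row into the accumulation.
                (acc, row) => (Sum: acc.Sum + row.Value, Count: acc.Count + 1),
                // Output: convert the final accumulation into the output row.
                acc => new SensorAverage { Sensor = group.Key, Average = acc.Sum / acc.Count }))
            .ToList();
    }

    public static void Main()
    {
        var readings = new[]
        {
            new Reading { Sensor = "a", Value = 1.0 },
            new Reading { Sensor = "b", Value = 4.0 },
            new Reading { Sensor = "a", Value = 3.0 },
        };
        foreach (var average in AveragePerSensor(readings))
            Console.WriteLine($"{average.Sensor}: {average.Average}"); // a: 2, then b: 4
    }
}
```

Keeping the sum and count in a small value type avoids allocating an intermediate row per grouping, while the output row is only created once the final accumulation is known.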
Declaration
public static AggregateTransform<TInput, TAccumulate, TOutput> AggregateTransform3<TInput, TAccumulate, TOutput>(this in DownstreamFactory<TInput> downstreamFactory, string workerName, Action<TransformAggregation<TInput, TAccumulate>> seedAction, Action<TransformAggregation<TInput, TAccumulate>> accumulationAction, Func<TransformAggregation<TInput, TAccumulate>, TOutput> outputFunc, IEqualityComparer<TInput> groupByEqualityComparer)
where TInput : class where TOutput : class
Parameters
| Type | Name | Description |
|---|---|---|
| DownstreamFactory<TInput> | downstreamFactory | The downstream factory, which specifies the parent worker and (optionally) the upstream port to link the "first" input port of this dataflow worker to. Get it from Link when the upstream port is known ahead of time (which is usually the case). Otherwise get it from GetDownstreamFactory<TInput>(), and link the transform or target explicitly using LinkTo(InputPort<TOutput>) or LinkFrom(OutputPortBase<TInput>). |
| String | workerName | Name of the worker.
Set to a prefix plus a trailing
While less useful, set to
The name cannot otherwise contain |
| Action<TransformAggregation<TInput, TAccumulate>> | seedAction | The seed action, which will be invoked once for the first (if any) incoming row,
per grouping (if any). Can be It takes the TransformAggregation<TInput, TAccumulate> aggregation instance as a parameter, with the first row assigned to NextRow, which this action can use to populate Accumulation.
Set Status
to a completed status to discard any additional input rows - |
| Action<TransformAggregation<TInput, TAccumulate>> | accumulationAction | The accumulation action, which will be invoked once for each incoming row, if there are any.
Can be It takes the TransformAggregation<TInput, TAccumulate> aggregation instance as a parameter, with the next incoming row assigned to NextRow, which this action uses to update the Accumulation.
Set Status to a completed status
to discard any additional input rows - |
| Func<TransformAggregation<TInput, TAccumulate>, TOutput> | outputFunc | The output function, which must return the output row, or Without grouping, it will be invoked once, after all incoming rows have been processed, including if there are no incoming rows. With grouping, it will be invoked once for each grouping, after all incoming rows have been processed. It will not be invoked if there are no incoming rows.
It takes the TransformAggregation<TInput, TAccumulate>
aggregation instance as a parameter.
Count
holds the number of rows accumulated, and
NextRow
holds the last incoming row (if any).
Accumulation
holds the final accumulation, which is normally used to create and return the output row. Note that
Accumulation and
NextRow
will be Set Status to a failure status to fail the worker. |
| IEqualityComparer<TInput> | groupByEqualityComparer | An instance that compares incoming rows. Rows that compare equal will be in the same grouping. Typically Create<T>(Action<IGroupByCommand>) is used to create the comparer, but Create<T, TKey>(Func<T, TKey>) or a custom one can also be used. Set to |
Returns
| Type | Description |
|---|---|
| AggregateTransform<TInput, TAccumulate, TOutput> | The newly created and (optionally) linked worker. |
Type Parameters
| Name | Description |
|---|---|
| TInput | The type of the input rows. |
| TAccumulate | The type of the Accumulation. Can be a value type or a reference type. |
| TOutput | The type of the output row. |
Remarks
Note: Ensure that accumulation column data types can hold all the intermediate as well as the final accumulated values, without overflows occurring.
Furthermore, consider how to treat any null values. In SQL-92 for instance,
COUNT (column name) and COUNT (DISTINCT column name) functions ignore nulls,
but COUNT (*) includes rows with null values. Consider whether to use non-nullable
(e.g. int) or nullable (e.g. int?) .NET types, or the corresponding database
provider type if available (e.g. SqlInt32).
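The note about accumulation data types can be made concrete with a small sketch in plain C#: the same int inputs overflow an int accumulation but fit in a long one. The class and method names are hypothetical, and `checked` is used here so that an overflow throws instead of silently wrapping around.

```csharp
using System;

// Hypothetical illustration of choosing a wide enough accumulation type.
public static class AccumulatorWidthSketch
{
    // Sums into an int accumulation; throws OverflowException when the total does not fit.
    public static int SumAsInt32(int[] values)
    {
        int sum = 0;
        foreach (var value in values)
            sum = checked(sum + value);
        return sum;
    }

    // Sums the same int inputs into a wider long accumulation.
    public static long SumAsInt64(int[] values)
    {
        long sum = 0;
        foreach (var value in values)
            sum = checked(sum + value);
        return sum;
    }

    public static void Main()
    {
        var values = new[] { int.MaxValue, 1 };
        Console.WriteLine(SumAsInt64(values)); // 2147483648
        try
        {
            SumAsInt32(values);
        }
        catch (OverflowException)
        {
            Console.WriteLine("Int32 accumulation overflowed");
        }
    }
}
```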
Exceptions
| Type | Condition |
|---|---|
| ArgumentException | One of |
| InvalidOperationException | Either use an output row type that is assignable from the accumulation type, or use an output function. |
| ArgumentException | |
| ArgumentNullException | |
| InvalidOperationException | |