    Dataflow Aggregations

    This article describes the predefined aggregation functions and grouping facilities available with the AggregateTransform workers. For a complete sample, see AggregateCsvCreateInsertTable. Custom dataflow aggregations are described separately in Custom Dataflow Aggregations.

    Note
    • By default, the .NET Framework limits the maximum array size to 2 GB. In a 64-bit application, this caps the number of unique groupings at 47.9 million, and the number of distinct values that CountDistinct can track per column at just over 89 million. You can remove these limits by enabling support for larger arrays, as described in <gcAllowVeryLargeObjects> Element.
      • .NET 6+ has a similar setting, COMPlus_gcAllowVeryLargeObjects, but it already defaults to supporting arrays larger than 2 GB.
    • Column name matching is ordinal case insensitive, but a case sensitive match takes precedence over a case insensitive match.
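The matching rule in the last bullet can be illustrated with plain .NET string comparisons. The following is a minimal sketch, not actionETL code: `ColumnNameMatching` and `Resolve` are hypothetical names introduced only for illustration, and returning the first case insensitive candidate when several exist is an assumption, since this article does not say how such ambiguity is handled.

```csharp
using System;
using System.Collections.Generic;

public static class ColumnNameMatching
{
    // Hypothetical helper (not part of actionETL): returns the column name that
    // 'requested' resolves to, or null when no column matches.
    public static string Resolve(IEnumerable<string> columnNames, string requested)
    {
        string caseInsensitiveMatch = null;
        foreach (var name in columnNames)
        {
            // An ordinal, case sensitive match takes precedence and wins immediately.
            if (string.Equals(name, requested, StringComparison.Ordinal))
                return name;

            // Otherwise remember the first ordinal case insensitive match as a fallback.
            // Assumption: the first candidate is used if there are several.
            if (caseInsensitiveMatch == null
                && string.Equals(name, requested, StringComparison.OrdinalIgnoreCase))
                caseInsensitiveMatch = name;
        }
        return caseInsensitiveMatch;
    }

    public static void Main()
    {
        var columns = new[] { "metric", "Metric", "Category" };
        Console.WriteLine(Resolve(columns, "Metric"));   // exact match: Metric
        Console.WriteLine(Resolve(columns, "METRIC"));   // fallback match: metric
        Console.WriteLine(Resolve(columns, "category")); // fallback match: Category
        Console.WriteLine(Resolve(columns, "Missing") ?? "(no match)");
    }
}
```

In practice, giving columns names that differ by more than letter case avoids relying on the precedence rule at all.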

    Predefined Column Aggregations

    The following predefined column aggregations, described in IAggregationCommand, are available: Average, Count, CountDistinct, CountRows, First, Last, Max, Min and Sum.

    Use AggregateTransform<TInputAccumulateOutput>, with or without grouping, when the input and output rows have the same type.

    Use AggregateTransform<TInput, TAccumulateOutput>, with or without grouping, when the input and output row types differ. Here's an example that both groups and aggregates rows:

    // using actionETL;
    // using System;
    // using System.Collections.Generic;
    
    private sealed class In
    {
        public int Category;
        public int Metric;
    }
    
    private readonly In[] _sampleData = new In[]
        {
              new In { Category = 1, Metric = 2 }
            , new In { Category = 1, Metric = 2 }
            , new In { Category = 1, Metric = 5 }
            , new In { Category = 2, Metric = 10 }
        };
    
    private sealed class AccOut
    {
        public int Category;
    
        public double Average;
        public long Count;
        public long CountDistinct;
        public long CountRows;
        public int First;
        public int Last;
        public int Max;
        public int Min;
        public double Sum;
    }
    
    public WorkerSystem RunExample()
    {
        var workerSystem = new WorkerSystem()
            .Root(ws =>
            {
                // The chained expression ends with CollectionTarget(), so the variable holds the target
                var target = new EnumerableSource<In>(ws, "Source", _sampleData)
    
                .Output.Link.AggregateTransform2<In, AccOut>(
                      "Transform"
                    , ac => ac
                        .Average(nameof(In.Metric), nameof(AccOut.Average))
                        .Count(nameof(In.Metric), nameof(AccOut.Count))
                        .CountDistinct(nameof(In.Metric), nameof(AccOut.CountDistinct))
                        .CountRows(nameof(AccOut.CountRows))
                        .First(nameof(In.Metric), nameof(AccOut.First))
                        .Last(nameof(In.Metric), nameof(AccOut.Last))
                        .Max(nameof(In.Metric), nameof(AccOut.Max))
                        .Min(nameof(In.Metric), nameof(AccOut.Min))
                        .Sum(nameof(In.Metric), nameof(AccOut.Sum))
                    , g => g.Name(nameof(In.Category)) // Group by column [Category]
                    )
    
                // Receives two rows:
                // { Category=1, Average=3.0, Count=3, CountDistinct=2, CountRows=3
                //    , First=2, Last=5, Max=5, Min=2, Sum=9 }
                // { Category=2, Average=10.0, Count=1, CountDistinct=1, CountRows=1
                //    , First=10, Last=10, Max=10, Min=10, Sum=10 }
                .Output.Link.CollectionTarget();
            });
        workerSystem.Start().ThrowOnFailure();
        return workerSystem;
    }
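
To make the expected output easier to check by hand, the same grouping and aggregations can be reproduced with plain LINQ. This is only a sketch for comparison, not how AggregateTransform is implemented: `LinqAggregationSketch` and `Aggregate` are hypothetical names, and it assumes Count and CountRows coincide for this data because every row has a Metric value (see IAggregationCommand for how they differ in general).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqAggregationSketch
{
    public sealed class In
    {
        public int Category;
        public int Metric;
    }

    public sealed class AccOut
    {
        public int Category;
        public double Average;
        public long Count;
        public long CountDistinct;
        public long CountRows;
        public int First;
        public int Last;
        public int Max;
        public int Min;
        public double Sum;
    }

    // Plain LINQ stand-in for the grouped aggregation; GroupBy keeps the input
    // order both for the groups and for the rows inside each group.
    public static List<AccOut> Aggregate(IEnumerable<In> rows) =>
        rows.GroupBy(row => row.Category)
            .Select(g => new AccOut
            {
                Category = g.Key,
                Average = g.Average(r => r.Metric),
                Count = g.LongCount(),      // assumption: no missing Metric values
                CountDistinct = g.Select(r => r.Metric).Distinct().LongCount(),
                CountRows = g.LongCount(),
                First = g.First().Metric,
                Last = g.Last().Metric,
                Max = g.Max(r => r.Metric),
                Min = g.Min(r => r.Metric),
                Sum = g.Sum(r => r.Metric),
            })
            .ToList();

    public static void Main()
    {
        var sampleData = new[]
        {
            new In { Category = 1, Metric = 2 },
            new In { Category = 1, Metric = 2 },
            new In { Category = 1, Metric = 5 },
            new In { Category = 2, Metric = 10 },
        };

        foreach (var r in Aggregate(sampleData))
            Console.WriteLine(
                $"Category={r.Category} Average={r.Average} CountDistinct={r.CountDistinct} Sum={r.Sum}");
    }
}
```

For the sample data, the values produced match the two rows listed in the comments of the actionETL example above.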
    

    Predefined Row Aggregations

    Several predefined row aggregations are available, such as First, Last, and Single; see RowAggregationFunction for details. Create these with IGroupByCommand grouping, with IEqualityComparer grouping, or without grouping.

    Here's an example that selects the last input row in each group:

    // using actionETL;
    // using System;
    // using System.Collections.Generic;
    
    private sealed class Row
    {
        public int Category;
        public int Metric;
    }
    
    private readonly Row[] _sampleData = new Row[]
        {
              new Row { Category = 1, Metric = 2 }
            , new Row { Category = 1, Metric = 2 }
            , new Row { Category = 1, Metric = 5 }
            , new Row { Category = 2, Metric = 10 }
        };
    
    public WorkerSystem RunExample()
    {
        var workerSystem = new WorkerSystem()
            .Root(ws =>
            {
                // The chained expression ends with CollectionTarget(), so the variable holds the target
                var target = new EnumerableSource<Row>(ws, "Source", _sampleData)
    
                .Output.Link.AggregateTransform1(
                      "Transform"
                    , RowAggregationFunction.Last
                    , g => g.Name(nameof(Row.Category)) // Group by column [Category]
                    )
    
                // Receives two rows:
                // { Category = 1, Metric = 5 }
                // { Category = 2, Metric = 10 }
                .Output.Link.CollectionTarget();
            });
        workerSystem.Start().ThrowOnFailure();
        return workerSystem;
    }
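
The example above groups by column name. To give a feel for the IEqualityComparer style of grouping mentioned earlier, here is a plain LINQ sketch that selects the last row in each group using a comparer. It does not use actionETL at all: `LastRowPerGroupSketch`, `CategoryComparer`, and `LastPerGroup` are hypothetical names, and how a comparer is actually passed to the transform is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LastRowPerGroupSketch
{
    public sealed class Row
    {
        public int Category;
        public int Metric;
    }

    // Two rows belong to the same group when their Category values are equal.
    public sealed class CategoryComparer : IEqualityComparer<Row>
    {
        public bool Equals(Row x, Row y) =>
            ReferenceEquals(x, y)
            || (x != null && y != null && x.Category == y.Category);

        public int GetHashCode(Row row) => row.Category.GetHashCode();
    }

    // Groups whole rows with the comparer, then keeps the last row of each group.
    public static List<Row> LastPerGroup(IEnumerable<Row> rows) =>
        rows.GroupBy(row => row, new CategoryComparer())
            .Select(g => g.Last())
            .ToList();

    public static void Main()
    {
        var sampleData = new[]
        {
            new Row { Category = 1, Metric = 2 },
            new Row { Category = 1, Metric = 2 },
            new Row { Category = 1, Metric = 5 },
            new Row { Category = 2, Metric = 10 },
        };

        foreach (var row in LastPerGroup(sampleData))
            Console.WriteLine($"Category = {row.Category}, Metric = {row.Metric}");
    }
}
```

A comparer is mainly useful when group membership depends on more than simple column equality, for example a custom or culture-specific comparison of key values.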
    

    See Also

    • Dataflow
      • Dataflow Rows
      • Dataflow Columns
      • Dataflow Aggregations
        • Custom Dataflow Aggregations
        • AggregateCsvCreateInsertTable
    Copyright © 2023 Envobi Ltd