    Class DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>

    A dataflow worker that first loads an IDictionary<TKey,TValue> from the DictionaryInput port rows, then performs a lookup in the dictionary for each Input row, optionally modifying the input rows, before sending them to the appropriate output port: FoundOutput, NotFoundOutput, or ErrorOutput. All ports except DictionaryInput have the same row type.

    Note: Use the factory methods in DictionaryLookupSplitTransformFactory to create instances of this class.

    To customize the key lookup, e.g. to make it case-insensitive, either add code to the selectRowKeyFunc callback that converts the row data to the case of the lookup reference keys, or create and set the underlying Dictionary as a case-insensitive one (see e.g. Dictionary<TKey,TValue>(IEqualityComparer<TKey>)).
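    As a rough illustration of the two approaches, the sketch below uses only standard .NET collections and does not call the actionETL API. The normalizeKey delegate is a hypothetical stand-in for logic that would live in selectRowKeyFunc, and the country-code data is made up.

```csharp
using System;
using System.Collections.Generic;

// Option 1: a dictionary whose key comparer ignores case (standard .NET constructor).
var lookup = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase)
{
    ["SE"] = 46,
    ["NO"] = 47,
};
bool foundLower = lookup.TryGetValue("se", out int code);

// Option 2: normalize the row key before looking it up in an ordinary dictionary.
// normalizeKey is hypothetical; it stands in for code placed in selectRowKeyFunc.
var plain = new Dictionary<string, int> { ["SE"] = 46, ["NO"] = 47 };
Func<string, string> normalizeKey = key => key.Trim().ToUpperInvariant();
bool foundNormalized = plain.TryGetValue(normalizeKey(" no "), out int code2);

Console.WriteLine($"{foundLower} {code} {foundNormalized} {code2}");
```

    Option 1 keeps the row data untouched and centralizes the rule in the dictionary, while option 2 also allows other normalization such as trimming.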

    The dictionary is populated from DictionaryInput before any Input rows are processed, so by default it is a fully cached lookup. This is both the most common configuration and the easiest to set up.

    It is, however, also possible to implement a partially cached lookup by loading only commonly used dictionary items in bulk via DictionaryInput, and then adding missing dictionary items on the fly in the notFoundKeyFunc callback. This avoids loading dictionary items that will never be used, which can be advantageous when it is impractical to retrieve all keys and lookup values ahead of time. In this scenario, create and pass the dictionary to the worker, and use this original reference directly in the notFoundKeyFunc callback.
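    The partially cached pattern can be sketched with plain .NET types. This is not the worker's actual callback signature: tryResolve is a hypothetical stand-in for the body of a notFoundKeyFunc callback, fetchFromSource is a hypothetical slow lookup (e.g. a database call), and the currency data is invented.

```csharp
using System;
using System.Collections.Generic;

// Bulk-loaded part of the cache (what DictionaryInput would normally supply).
var cache = new Dictionary<string, decimal> { ["USD"] = 1.00m, ["EUR"] = 1.08m };

// Hypothetical slow source; returns null when the key is unknown.
var slowSource = new Dictionary<string, decimal> { ["GBP"] = 1.27m };
int slowCalls = 0;
Func<string, decimal?> fetchFromSource = key =>
{
    slowCalls++;
    return slowSource.TryGetValue(key, out var v) ? v : (decimal?)null;
};

// Stand-in for logic inside notFoundKeyFunc: try to fill the cache through the
// original dictionary reference, and report whether the key can now be treated as found.
Func<string, bool> tryResolve = key =>
{
    if (cache.ContainsKey(key)) return true;
    decimal? fetched = fetchFromSource(key);
    if (fetched is null) return false;
    cache[key] = fetched.Value;
    return true;
};

bool gbpFirst = tryResolve("GBP");   // fetched from the slow source, then cached
bool gbpSecond = tryResolve("GBP");  // served from the cache
bool unknown = tryResolve("XXX");    // not available anywhere

Console.WriteLine($"{gbpFirst} {gbpSecond} {unknown} slowCalls={slowCalls}");
```

    In a real worker, a successful resolution would correspond to returning Found and a failed one to returning NotFound.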

    Note that multiple rows can often match the same lookup key and value. To avoid issues where modifying one row inadvertently also changes another row, best practice is to make the lookup value only consist of value types and/or immutable types. If the lookup value is, or contains, a mutable reference type, the user must ensure that either there are no lookup value references that are shared and modified across rows, or that the lookup value is cloned, so that each row gets its own unique instance.
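    The sharing hazard can be shown with standard .NET types alone. The sketch below is illustrative only: List<string> stands in for any mutable lookup value, and the "rows" are simply local variables.

```csharp
using System;
using System.Collections.Generic;

// The lookup value is a mutable reference type, which is risky to share.
var lookup = new Dictionary<int, List<string>> { [1] = new List<string> { "base" } };

// Two rows matching the same key receive the same list instance.
List<string> rowA = lookup[1];
List<string> rowB = lookup[1];
rowA.Add("changed by row A");
int sharedCountSeenByB = rowB.Count;   // row B sees row A's change

// Cloning the value gives each row its own instance.
var lookup2 = new Dictionary<int, List<string>> { [1] = new List<string> { "base" } };
List<string> cloneA = new List<string>(lookup2[1]);
List<string> cloneB = new List<string>(lookup2[1]);
cloneA.Add("changed by row A");
int cloneCountSeenByB = cloneB.Count;  // row B is unaffected

Console.WriteLine($"shared: {sharedCountSeenByB}, cloned: {cloneCountSeenByB}");
```

    Note that new List<string>(...) is a shallow copy; it is sufficient here only because string is immutable.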

    Note that the DictionaryInput rows can be of any type. Consider using the MutableKeyValue<TKey, TValue> helper class when you only need a mutable key and value, or KeyValuePair<TKey,TValue> when an immutable pair is sufficient.

    Also see DictionaryLookupSplitTransform<TInputOutputError, TKey, TValue>, which doesn't have the DictionaryInput port, and DictionaryLookupTransform<TInputOutputError, TDictionaryInput, TKey, TValue>, which sends both rows where the key is found, and rows where the key is not found, to the same output port.

    Also see Dataflow Lookups.

    Inheritance
    Object
    WorkerParent
    WorkerBase
    WorkerBase<DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>>
    DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>
    Implements
    IDisposeOnFinished
    Inherited Members
    WorkerBase<DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>>.AddCompletedCallback(Func<DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>, OutcomeStatus, Task<OutcomeStatus>>)
    WorkerBase<DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>>.AddRanCallback(Func<DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>, OutcomeStatus, WorkerParentChildrenState, Task<OutcomeStatus>>)
    WorkerBase<DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>>.AddStartingCallback(Func<DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>, Task<ProgressStatus>>)
    WorkerBase.AddCompletedCallback(Func<WorkerBase, OutcomeStatus, Task<OutcomeStatus>>)
    WorkerBase.AddRanCallback(Func<WorkerBase, OutcomeStatus, WorkerParentChildrenState, Task<OutcomeStatus>>)
    WorkerBase.AddStartingCallback(Func<WorkerBase, Task<ProgressStatus>>)
    WorkerBase.DefaultIsStartable()
    WorkerBase.ErroredPortErrorsWorkerProtected
    WorkerBase.ErrorOutputs
    WorkerBase.EscalateError
    WorkerBase.Inputs
    WorkerBase.IsStartable
    WorkerBase.Outputs
    WorkerBase.Parent
    WorkerBase.SucceededSequence<TLastWorker>(WorkerBase, WorkerBase, WorkerBase, WorkerBase, WorkerBase, TLastWorker)
    WorkerBase.SucceededSequence<TLastWorker>(WorkerBase, WorkerBase, WorkerBase, WorkerBase, TLastWorker)
    WorkerBase.SucceededSequence<TLastWorker>(WorkerBase, WorkerBase, WorkerBase, TLastWorker)
    WorkerBase.SucceededSequence<TLastWorker>(WorkerBase, WorkerBase, TLastWorker)
    WorkerBase.SucceededSequence<TLastWorker>(WorkerBase, TLastWorker)
    WorkerBase.SucceededSequence<TLastWorker>(TLastWorker)
    WorkerParent.AddChildCompletedCallback(Action<WorkerBase>)
    WorkerParent.AddStartingChildrenCallback(Func<WorkerParent, Task<ProgressStatus>>)
    WorkerParent.BytesPerRowBuffer
    WorkerParent.Children
    WorkerParent.DisposeOnFinished<TDisposable>(TDisposable)
    WorkerParent.GetDownstreamFactory<TInput>()
    WorkerParent.HasChildren
    WorkerParent.IsCanceled
    WorkerParent.IsCompleted
    WorkerParent.IsCreated
    WorkerParent.IsError
    WorkerParent.IsFailed
    WorkerParent.IsFatal
    WorkerParent.IsRunning
    WorkerParent.IsSucceeded
    WorkerParent.KeepChildrenLevels
    WorkerParent.Locator
    WorkerParent.LogFactory
    WorkerParent.Logger
    WorkerParent.MaxRunningChildren
    WorkerParent.Name
    WorkerParent.RemoveChildren()
    WorkerParent.RescheduleChildren()
    WorkerParent.RunChildrenAsync(Boolean)
    WorkerParent.RunChildrenAsync()
    WorkerParent.Status
    WorkerParent.Item[String]
    WorkerParent.ToString()
    WorkerParent.WorkerSystem
    WorkerParent.DebugCommands
    WorkerParent.AggregateErrorOutputRows
    WorkerParent.AggregateOutputRows
    WorkerParent.AggregateWorkersCompleted
    WorkerParent.InstantCompleted
    WorkerParent.InstantCreated
    WorkerParent.InstantStarted
    WorkerParent.RunningDuration
    Namespace: actionETL
    Assembly: actionETL.dll
    Syntax
    public class DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue> : WorkerBase<DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>>, IDisposeOnFinished where TInputOutputError : class where TDictionaryInput : class
    Type Parameters
    Name Description
    TInputOutputError

    The type of all input and output port rows, except for DictionaryInput.

    TDictionaryInput

    The type of DictionaryInput port rows.

    TKey

    The type of the lookup key.

    TValue

    The type of the lookup value.

    Remarks

    The concrete dictionary used by the transform is often a Dictionary<TKey,TValue> (either created by default by the worker, or supplied by the user). Note that it can be shared by other workers if it's not modified after the other workers start to run.

    If the Dictionary<TKey,TValue> is only used by a single worker at a time, that worker can modify its dictionary while it runs. This can be used to e.g. build up the dictionary on the fly: In the notFoundKeyFunc callback, attempt to add a new dictionary entry for the missing key, and if successful, invoke the foundKeyFunc or foundKeyAction callback on the row and return Found; otherwise, return NotFound.

    The dictionary can of course be any IDictionary<TKey,TValue> implementation. E.g.:

    • Set initial capacity (to reduce re-allocations) and/or a custom equality comparer (e.g. case-insensitive) using an appropriate Dictionary<TKey,TValue> constructor
    • Use ConcurrentDictionary<TKey,TValue> to allow multiple threads (i.e. workers) to use and modify the dictionary simultaneously. This can save memory by sharing a single mutable dictionary copy; note however that it can be an order of magnitude slower than a non-concurrent dictionary.
    • Use a caching dictionary that discards seldom-used items, thereby limiting its memory use
    • Create an IReadOnlyDictionary<TKey,TValue> wrapper around an out-of-process web lookup service
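    To illustrate the ConcurrentDictionary<TKey,TValue> option from the list above, here is a minimal sketch using standard .NET only. Parallel.For stands in for several workers running at once; no actionETL types are involved, and the key range and values are arbitrary.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// One mutable dictionary shared by several concurrent "workers".
var shared = new ConcurrentDictionary<int, string>();

Parallel.For(0, 4, worker =>
{
    for (int key = 0; key < 1000; key++)
    {
        // GetOrAdd is safe to call from multiple threads; exactly one value per key
        // is stored, although the value factory may run more than once under contention.
        shared.GetOrAdd(key, k => "value" + k);
    }
});

Console.WriteLine(shared.Count);
```

    Because the value factory can run more than once for the same key, it should be cheap and free of side effects.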

    Properties

    Dictionary

    Gets or sets the dictionary to populate with DictionaryInput rows, and to look up incoming rows against. If null when the worker runs, a Dictionary<TKey,TValue> is created automatically.

    The property can only be accessed when the worker is not running. To populate the dictionary on the fly, create and pass the dictionary to the worker, and use this original reference directly in the notFoundKeyFunc callback.

    If the number of dictionary items is known to be large, consider creating the dictionary explicitly, with a large initial capacity to reduce the number of reallocations of the dictionary storage.
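    Pre-sizing uses the standard Dictionary<TKey,TValue>(int capacity) constructor. The following is a minimal sketch; the item count is an arbitrary assumption, and assigning the result to the worker's Dictionary property is not shown.

```csharp
using System;
using System.Collections.Generic;

// Assumed number of dictionary items, known (or estimated) ahead of time.
const int expectedItems = 1_000_000;

// Allocating the storage up front avoids repeated resizing while loading.
var big = new Dictionary<int, int>(capacity: expectedItems);
for (int i = 0; i < expectedItems; i++)
{
    big.Add(i, i * 2);
}

Console.WriteLine(big.Count);
```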

    Note that the dictionary can optionally be set to null after the worker has completed, to allow it to be garbage collected before the worker itself is garbage collected, which in rare circumstances can be useful with a large collection.

    Declaration
    public IDictionary<TKey, TValue> Dictionary { get; set; }
    Property Value
    Type Description
    IDictionary<TKey, TValue>

    The dictionary.

    Exceptions
    Type Condition
    InvalidOperationException

    Cannot set the dictionary while the worker is running.

    DictionaryAddKeyTreatment

    Gets or sets how to handle any duplicate keys when adding items to a dictionary.

    The default is FatalOnDuplicate.

    Declaration
    public DictionaryAddKeyTreatment DictionaryAddKeyTreatment { get; set; }
    Property Value
    Type Description
    DictionaryAddKeyTreatment
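    For background, the standard .NET Dictionary<TKey,TValue> offers three distinct behaviors on a duplicate key, sketched below. How the DictionaryAddKeyTreatment values map onto these is not stated on this page (only FatalOnDuplicate is named), so treat this as context on the underlying collection rather than a description of the enum.

```csharp
using System;
using System.Collections.Generic;

var d = new Dictionary<string, int> { ["a"] = 1 };

// Add throws ArgumentException on a duplicate key.
bool addThrew = false;
try { d.Add("a", 2); }
catch (ArgumentException) { addThrew = true; }

// TryAdd keeps the first value and returns false.
bool tryAdded = d.TryAdd("a", 3);
int afterTryAdd = d["a"];

// The indexer overwrites, so the last value wins.
d["a"] = 4;
int afterIndexer = d["a"];

Console.WriteLine($"{addThrew} {tryAdded} {afterTryAdd} {afterIndexer}");
```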

    DictionaryInput

    Gets the dictionary input port for receiving rows from an upstream worker to populate the dictionary.

    Declaration
    public InputPort<TDictionaryInput> DictionaryInput { get; }
    Property Value
    Type Description
    InputPort<TDictionaryInput>

    ErrorOutput

    Gets the error output port for sending error rows to logging and an optional downstream worker.

    It will receive the first row that throws an exception, and rejected rows.

    Declaration
    public ErrorOutputPort<TInputOutputError> ErrorOutput { get; }
    Property Value
    Type Description
    ErrorOutputPort<TInputOutputError>

    FoundOutput

    Gets the output port for sending rows where the key is found to the downstream worker.

    Declaration
    public OutputPort<TInputOutputError> FoundOutput { get; }
    Property Value
    Type Description
    OutputPort<TInputOutputError>

    Input

    Gets the input port for receiving rows from an upstream worker.

    Declaration
    public InputPort<TInputOutputError> Input { get; }
    Property Value
    Type Description
    InputPort<TInputOutputError>

    NotFoundOutput

    Gets the output port for sending rows where the key is not found to the downstream worker.

    Declaration
    public OutputPort<TInputOutputError> NotFoundOutput { get; }
    Property Value
    Type Description
    OutputPort<TInputOutputError>

    Methods

    RunAsync()

    This method can be overridden to add custom functionality to the derived worker that runs before and after the row processing. In this case, the base class base.RunAsync() must be called for the worker to function correctly.

    Typically, this worker is used without overriding this method.

    Declaration
    protected override async Task<OutcomeStatus> RunAsync()
    Returns
    Type Description
    Task<OutcomeStatus>

    A Task describing the success or failure of the worker. An async implementation would e.g. return OutcomeStatus.Succeeded on success, while a synchronous implementation would return OutcomeStatus.SucceededTask.

    Overrides
    WorkerParent.RunAsync()

    Implements

    IDisposeOnFinished

    See Also

    DictionaryLookupSplitTransformFactory
    MutableKeyValue<TKey, TValue>
    KeyValuePair<TKey,TValue>
    DictionaryLookupSplitTransform<TInputOutputError, TKey, TValue>
    DictionaryLookupRowTreatment
    DictionaryLookupTransform<TInputOutputError, TDictionaryInput, TKey, TValue>
    DictionaryLookupTransform<TInputOutputError, TKey, TValue>
    DictionaryTarget<TInput, TKey, TValue>
    Copyright © 2023 Envobi Ltd