Class DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>

A dataflow worker that first loads an IDictionary<TKey,TValue> from the DictionaryInput port rows, then performs a lookup in the dictionary for each Input row, optionally modifying the input rows, before sending them to the appropriate output port: FoundOutput, NotFoundOutput, or ErrorOutput. All ports except DictionaryInput have the same row type.

Note: Use the factory methods in DictionaryLookupSplitTransformFactory to create instances of this class.

To customize the key lookup, e.g. to make it case-insensitive, either normalize the row data in the selectRowKeyFunc callback so that it matches the case of the lookup reference keys, or create and set the underlying dictionary as a case-insensitive one (see e.g. Dictionary<TKey,TValue>(IEqualityComparer<TKey>)).
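The two approaches can be sketched with plain .NET collections. This is only an illustration under assumptions: the sample keys and the NormalizeKey helper are hypothetical, and the worker itself is left out, since it is created via the factory methods.

```csharp
using System;
using System.Collections.Generic;

// Option 1: a dictionary that ignores case when comparing keys.
var caseInsensitive = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase)
{
    ["SE"] = 46,
    ["NO"] = 47,
};
bool foundWithComparer = caseInsensitive.TryGetValue("se", out int codeWithComparer);

// Option 2: a default (case-sensitive) dictionary, with the row key normalized
// before the lookup, as a key selector callback could do.
// NormalizeKey is a hypothetical helper, not part of any library.
var caseSensitive = new Dictionary<string, int> { ["SE"] = 46, ["NO"] = 47 };
string NormalizeKey(string rawKey) => rawKey.Trim().ToUpperInvariant();
bool foundWithNormalizedKey =
    caseSensitive.TryGetValue(NormalizeKey(" se "), out int codeWithNormalizedKey);

Console.WriteLine($"{foundWithComparer} {codeWithComparer}");
Console.WriteLine($"{foundWithNormalizedKey} {codeWithNormalizedKey}");
```

Setting the comparer on the dictionary keeps the callback trivial, while normalizing in the callback also allows other clean-up such as trimming whitespace.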

The dictionary is populated from DictionaryInput before any Input rows are processed, and is therefore by default a fully cached lookup, which is both the most common configuration and the easiest to set up.

It is however also possible to implement a partially cached lookup by only loading commonly used dictionary items in bulk via DictionaryInput, and then adding missing dictionary items on the fly in the notFoundKeyFunc callback. This avoids loading dictionary items that will never be used, which can be advantageous when it is impractical to retrieve all keys and lookup values ahead of time. In this scenario, create and pass the dictionary to the worker, and use this original reference directly in the notFoundKeyFunc callback.
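The logic that such a callback could run is sketched below with a plain Dictionary<TKey,TValue>. The names TryFetchFromSlowSource and TryAddMissing are hypothetical, the slow source is simulated, and the wiring into the notFoundKeyFunc callback is deliberately omitted because its signature is not shown here.

```csharp
using System;
using System.Collections.Generic;

// Items loaded in bulk ahead of time (the commonly used keys).
var lookup = new Dictionary<int, string> { [1] = "Widget", [2] = "Gadget" };

// Hypothetical stand-in for an expensive source such as a database query.
// Only key 3 exists in this simulated source.
bool TryFetchFromSlowSource(int key, out string value)
{
    value = key == 3 ? "Gizmo" : "";
    return key == 3;
}

// What a not-found callback could do: try to add the missing entry, and
// report whether the key is now available in the dictionary.
bool TryAddMissing(int key)
{
    if (!TryFetchFromSlowSource(key, out string value))
    {
        return false;          // Still not found
    }
    lookup[key] = value;       // Cache the entry for later rows
    return true;               // Now found
}

bool addedKnownKey = TryAddMissing(3);
bool addedUnknownKey = TryAddMissing(99);
Console.WriteLine($"{addedKnownKey} {addedUnknownKey} {lookup.Count}");
```

Because the callback works on the same dictionary instance that was passed to the worker, every entry added this way is available to subsequent rows without another round trip.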

Note that multiple rows can often match the same lookup key and value. To avoid issues where modifying one row inadvertently also changes another row, best practice is to make the lookup value only consist of value types and/or immutable types. If the lookup value is, or contains, a mutable reference type, the user must ensure that either there are no lookup value references that are shared and modified across rows, or that the lookup value is cloned, so that each row gets its own unique instance.
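A minimal sketch of the sharing problem and the cloning remedy, using a List<string> as an example of a mutable lookup value. The variable names stand in for row fields and are illustrative only.

```csharp
using System;
using System.Collections.Generic;

var lookup = new Dictionary<string, List<string>>
{
    ["A"] = new List<string> { "red" },
};

// Risky: two rows with the same key receive the same list instance.
List<string> row1Shared = lookup["A"];
List<string> row2Shared = lookup["A"];
row1Shared.Add("blue");   // Also changes row2Shared and the dictionary entry

// Safer: each row gets its own copy of the lookup value.
List<string> row1Own = new List<string>(lookup["A"]);
List<string> row2Own = new List<string>(lookup["A"]);
row1Own.Add("green");     // Only row1Own changes

Console.WriteLine($"{row2Shared.Count} {row1Own.Count} {row2Own.Count}");
```

The copy constructor used here makes a shallow copy, which is sufficient when the list elements are themselves immutable, as strings are.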

Note that the DictionaryInput rows can be of any type. Consider using the MutableKeyValue<TKey, TValue> helper class when you only need a mutable key and value, or KeyValuePair<TKey,TValue> when an immutable pair is sufficient.

Also see DictionaryLookupSplitTransform<TInputOutputError, TKey, TValue>, which doesn't have the DictionaryInput port, and DictionaryLookupTransform<TInputOutputError, TDictionaryInput, TKey, TValue>, which sends both rows where the key is found, and rows where the key is not found, to the same output port.

Also see Dataflow Lookups.

Inheritance

Object

WorkerParent

WorkerBase

WorkerBase<DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>>

DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>

Implements

IDisposeOnFinished

Inherited Members

WorkerBase<DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>>.AddCompletedCallback(Func<DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>, OutcomeStatus, Task<OutcomeStatus>>)

WorkerBase<DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>>.AddRanCallback(Func<DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>, OutcomeStatus, WorkerParentChildrenState, Task<OutcomeStatus>>)

WorkerBase<DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>>.AddStartingCallback(Func<DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>, Task<ProgressStatus>>)

WorkerBase.AddCompletedCallback(Func<WorkerBase, OutcomeStatus, Task<OutcomeStatus>>)

WorkerBase.AddRanCallback(Func<WorkerBase, OutcomeStatus, WorkerParentChildrenState, Task<OutcomeStatus>>)

WorkerBase.AddStartingCallback(Func<WorkerBase, Task<ProgressStatus>>)

WorkerBase.DefaultIsStartable()

WorkerBase.ErroredPortErrorsWorkerProtected

WorkerBase.ErrorOutputs

WorkerBase.EscalateError

WorkerBase.Inputs

WorkerBase.IsStartable

WorkerBase.Outputs

WorkerBase.Parent

WorkerBase.SucceededSequence<TLastWorker>(WorkerBase, WorkerBase, WorkerBase, WorkerBase, WorkerBase, TLastWorker)

WorkerBase.SucceededSequence<TLastWorker>(WorkerBase, WorkerBase, WorkerBase, WorkerBase, TLastWorker)

WorkerBase.SucceededSequence<TLastWorker>(WorkerBase, WorkerBase, WorkerBase, TLastWorker)

WorkerBase.SucceededSequence<TLastWorker>(WorkerBase, WorkerBase, TLastWorker)

WorkerBase.SucceededSequence<TLastWorker>(WorkerBase, TLastWorker)

WorkerBase.SucceededSequence<TLastWorker>(TLastWorker)

WorkerParent.AddChildCompletedCallback(Action<WorkerBase>)

WorkerParent.AddStartingChildrenCallback(Func<WorkerParent, Task<ProgressStatus>>)

WorkerParent.BytesPerRowBuffer

WorkerParent.Children

WorkerParent.DisposeOnFinished<TDisposable>(TDisposable)

WorkerParent.GetDownstreamFactory<TInput>()

WorkerParent.HasChildren

WorkerParent.IsCanceled

WorkerParent.IsCompleted

WorkerParent.IsCreated

WorkerParent.IsError

WorkerParent.IsFailed

WorkerParent.IsFatal

WorkerParent.IsRunning

WorkerParent.IsSucceeded

WorkerParent.KeepChildrenLevels

WorkerParent.Locator

WorkerParent.LogFactory

WorkerParent.Logger

WorkerParent.MaxRunningChildren

WorkerParent.Name

WorkerParent.RemoveChildren()

WorkerParent.RescheduleChildren()

WorkerParent.RunChildrenAsync(Boolean)

WorkerParent.RunChildrenAsync()

WorkerParent.Status

WorkerParent.Item[String]

WorkerParent.ToString()

WorkerParent.WorkerSystem

WorkerParent.DebugCommands

WorkerParent.AggregateErrorOutputRows

WorkerParent.AggregateOutputRows

WorkerParent.AggregateWorkersCompleted

WorkerParent.InstantCompleted

WorkerParent.InstantCreated

WorkerParent.InstantStarted

WorkerParent.RunningDuration

Namespace: actionETL

Assembly: actionETL.dll

Syntax

public class DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue> : WorkerBase<DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>>, IDisposeOnFinished where TInputOutputError : class where TDictionaryInput : class

Type Parameters

Name	Description
TInputOutputError	The type of all input and output port rows, except for `DictionaryInput`.
TDictionaryInput	The type of `DictionaryInput` port rows.
TKey	The type of the lookup key.
TValue	The type of the lookup value.

Remarks

The concrete dictionary used by the transform is often a Dictionary<TKey,TValue> (either created by default by the worker, or supplied by the user). Note that it can be shared by other workers if it's not modified after the other workers start to run.

If the Dictionary<TKey,TValue> is only used by a single worker at a time, that worker can modify its dictionary while it runs. This can be used to e.g. build up the dictionary on the fly: In the notFoundKeyFunc callback, attempt to add a new dictionary entry for the missing key, and if successful, invoke the foundKeyFunc or foundKeyAction callback on the row and return Found; otherwise, return NotFound.

The dictionary can be any IDictionary<TKey,TValue> implementation, for example:

Set initial capacity (to reduce re-allocations) and/or a custom equality comparer (e.g. case-insensitive) using an appropriate Dictionary<TKey,TValue> constructor.
Use ConcurrentDictionary<TKey,TValue> to allow multiple threads (i.e. workers) to use and modify the dictionary simultaneously. This can save memory by sharing a single mutable dictionary copy; note however that it can be an order of magnitude slower than a non-concurrent dictionary.
Use a caching dictionary that discards seldom used items, thereby limiting its memory use.
Create an IDictionary<TKey,TValue> wrapper around an out-of-process web lookup service.
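As an illustration of the ConcurrentDictionary<TKey,TValue> option, the sketch below lets two simulated workers add overlapping keys to one shared instance at the same time. Parallel.For stands in for real workers here, and the key range and values are arbitrary assumptions.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// ConcurrentDictionary implements IDictionary<TKey,TValue>, so it can be
// used wherever that interface is expected.
var concurrent = new ConcurrentDictionary<int, string>();
IDictionary<int, string> shared = concurrent;

// Two simulated workers add the same 1000 keys concurrently.
Parallel.For(0, 2, worker =>
{
    for (int key = 0; key < 1000; key++)
    {
        // GetOrAdd either returns the existing value or adds a new one.
        concurrent.GetOrAdd(key, k => "value" + k);
    }
});

Console.WriteLine(shared.Count);
```

GetOrAdd may invoke the value factory more than once for the same key under contention, so the factory should be free of side effects.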

Properties

Dictionary

Gets or sets the dictionary to populate with DictionaryInput rows, and to look up incoming rows against. If null when the worker runs, a Dictionary<TKey,TValue> will be automatically created.

The property can only be accessed when the worker is not running. To populate the dictionary on the fly, create and pass the dictionary to the worker, and use this original reference directly in the notFoundKeyFunc callback.

If the number of dictionary items is known to be large, consider creating the dictionary explicitly, with a large initial capacity to reduce the number of reallocations of the dictionary storage.
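A minimal sketch of pre-sizing the dictionary, assuming an expected item count of 100,000 chosen purely for illustration. The resulting instance could then be assigned to this property before the worker runs.

```csharp
using System;
using System.Collections.Generic;

// Assumed number of dictionary items, for illustration only.
const int expectedItems = 100_000;

// Pre-sized dictionary: the internal storage is allocated once up front
// instead of being grown repeatedly while items are added.
var presized = new Dictionary<int, string>(expectedItems);

// Capacity and a custom comparer can be combined in one constructor call.
var presizedIgnoreCase =
    new Dictionary<string, int>(expectedItems, StringComparer.OrdinalIgnoreCase);

for (int i = 0; i < expectedItems; i++)
{
    presized.Add(i, "item");
}
presizedIgnoreCase["Key"] = 1;

Console.WriteLine($"{presized.Count} {presizedIgnoreCase.ContainsKey("KEY")}");
```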

Note that the dictionary can optionally be set to null after the worker has completed, to allow it to be garbage collected before the worker itself is garbage collected, which in rare circumstances can be useful with a large collection.

Declaration

public IDictionary<TKey, TValue> Dictionary { get; set; }

Property Value

Type	Description
IDictionary<TKey, TValue>	The dictionary.

Exceptions

Type	Condition
InvalidOperationException	Cannot set the dictionary while the worker is running.

DictionaryAddKeyTreatment

Gets or sets how to handle any duplicate keys when adding items to a dictionary.

The default is FatalOnDuplicate.

Declaration

public DictionaryAddKeyTreatment DictionaryAddKeyTreatment { get; set; }

Property Value

Type	Description
DictionaryAddKeyTreatment

DictionaryInput

Gets the dictionary input port for receiving rows from an upstream worker to populate the dictionary.

Declaration

public InputPort<TDictionaryInput> DictionaryInput { get; }

Property Value

Type	Description
InputPort<TDictionaryInput>

ErrorOutput

Gets the error output port for sending error rows to logging and an optional downstream worker.

It will receive the first row whose processing throws an exception, as well as any rejected rows.

Declaration

public ErrorOutputPort<TInputOutputError> ErrorOutput { get; }

Property Value

Type	Description
ErrorOutputPort<TInputOutputError>

FoundOutput

Gets the output port for sending rows where the key is found to the downstream worker.

Declaration

public OutputPort<TInputOutputError> FoundOutput { get; }

Property Value

Type	Description
OutputPort<TInputOutputError>

Input

Gets the input port for receiving rows from an upstream worker.

Declaration

public InputPort<TInputOutputError> Input { get; }

Property Value

Type	Description
InputPort<TInputOutputError>

NotFoundOutput

Gets the output port for sending rows where the key is not found to the downstream worker.

Declaration

public OutputPort<TInputOutputError> NotFoundOutput { get; }

Property Value

Type	Description
OutputPort<TInputOutputError>

Methods

RunAsync()

This method can be overridden to add custom functionality to the derived worker that runs before and after the row processing. In this case, the base class base.RunAsync() must be called for the worker to function correctly.

Typically, this worker is used without overriding this method.

Declaration

protected override async Task<OutcomeStatus> RunAsync()

Returns

Type	Description
Task<OutcomeStatus>	A `Task` describing the success or failure of the worker. An `async` implementation would e.g. return `OutcomeStatus.Succeeded` on success, while a synchronous implementation would return `OutcomeStatus.SucceededTask`.

Overrides

WorkerParent.RunAsync()
