Class DictionaryLookupTransform<TInputOutputError, TKey, TValue>
A dataflow worker that performs a lookup in an IReadOnlyDictionary<TKey,TValue> collection for each Input row, optionally modifying the input rows, before sending them to the output or error port. All ports have the same row type.
Note: Use the factory methods in DictionaryLookupTransformFactory to create instances of this class.
To customize the key lookup, e.g. to make it case-insensitive, either add code to the selectRowKeyFunc callback that converts the row key to match the case of the lookup reference keys, or create the underlying Dictionary with a case-insensitive equality comparer (see e.g. Dictionary<TKey,TValue>(IEqualityComparer<TKey>)).
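The two approaches can be sketched with plain .NET collections, independent of the worker itself. This is a minimal illustration only: the country data and variable names are made up, and the key normalization in the second option merely stands in for what a selectRowKeyFunc callback might do.

```csharp
using System;
using System.Collections.Generic;

// Option 1: the dictionary itself is case-insensitive,
// so row keys need no normalization before the lookup.
var countries = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["SE"] = "Sweden",
    ["DE"] = "Germany",
};
bool foundWithComparer = countries.TryGetValue("se", out var name1);

// Option 2: a default (case-sensitive) dictionary with upper-case keys;
// the row key is normalized before the lookup, as a key-selection
// callback might do (illustrative stand-in, not the actionETL API).
var upperCaseCountries = new Dictionary<string, string> { ["SE"] = "Sweden" };
string rowKey = "se";
bool foundWithNormalizedKey =
    upperCaseCountries.TryGetValue(rowKey.ToUpperInvariant(), out var name2);

Console.WriteLine($"{foundWithComparer} {name1}, {foundWithNormalizedKey} {name2}");
```

Setting the comparer on the dictionary keeps the normalization in one place, while normalizing in the callback leaves the dictionary usable by consumers that expect exact-case keys.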
By supplying a pre-populated dictionary, the worker by default performs a fully cached lookup, which is both the most common configuration, and the easiest to configure.
It is however also possible to implement a partially cached lookup by starting with an empty or partially pre-populated dictionary, and then adding missing dictionary items on the fly in the notFoundKeyFunc callback. This avoids loading dictionary items that will never be used, which can be advantageous when it is impractical to retrieve all keys and lookup values ahead of time.
Note that multiple rows can often match the same lookup key and value. To avoid issues where modifying one row inadvertently also changes another row, best practice is to make the lookup value only consist of value types and/or immutable types. If the lookup value is, or contains, a mutable reference type, the user must ensure that either there are no lookup value references that are shared and modified across rows, or that the lookup value is cloned, so that each row gets its own unique instance.
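The sharing issue, and the two mitigations mentioned above, can be illustrated with plain .NET collections. This is a sketch under assumptions: the "rows" here are just local variables holding the lookup value, and all names are illustrative.

```csharp
using System;
using System.Collections.Generic;

// A mutable reference type as lookup value: every matching row
// receives a reference to the same instance.
var mutableLookup = new Dictionary<int, List<string>>
{
    [1] = new List<string> { "base" },
};
List<string> rowA = mutableLookup[1];
List<string> rowB = mutableLookup[1];
rowA.Add("changed by row A");            // also visible through rowB
bool sharedLeak = rowB.Count == 2;

// Mitigation 1: clone the lookup value so each row gets its own instance.
var rowC = new List<string>(mutableLookup[1]);
rowC.Add("changed by row C");            // the dictionary value is unaffected
bool cloneIsolated = mutableLookup[1].Count == 2;

// Mitigation 2: use a value type (here a value tuple holding an int and
// an immutable string), which is copied on assignment.
var valueLookup = new Dictionary<int, (int Rank, string Label)>
{
    [1] = (10, "gold"),
};
var rowD = valueLookup[1];
rowD.Rank = 99;                          // changes only the row's own copy
bool valueTypeIsolated = valueLookup[1].Rank == 10;

Console.WriteLine($"{sharedLeak} {cloneIsolated} {valueTypeIsolated}");
```

Note that the list copy above is shallow, which is sufficient here only because the elements are immutable strings.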
Also see DictionaryLookupTransform<TInputOutputError, TDictionaryInput, TKey, TValue>, which loads the dictionary from a second input port, and DictionaryLookupSplitTransform<TInputOutputError, TKey, TValue>, which has a separate output port for unmatched rows.
Also see Dataflow Lookups.
Inheritance
Implements
Inherited Members
Namespace: actionETL
Assembly: actionETL.dll
Syntax
public class DictionaryLookupTransform<TInputOutputError, TKey, TValue> : WorkerBase<DictionaryLookupTransform<TInputOutputError, TKey, TValue>>, IDisposeOnFinished where TInputOutputError : class
Type Parameters
Name | Description |
---|---|
TInputOutputError | The type of all input and output port rows. |
TKey | The type of the lookup key. |
TValue | The type of the lookup value. |
Remarks
The concrete dictionary used by the transform is often a Dictionary<TKey,TValue>, which must be provided either to the worker constructor or via the Dictionary property. Note that it can be shared with other workers, as long as it is not modified after those workers start to run.
If the Dictionary<TKey,TValue> is only used by a single worker at a time, the worker can modify its dictionary while it runs. This can be used to e.g. build up the dictionary on the fly: In the notFoundKeyFunc callback, attempt to add a new dictionary entry for the missing key, and if successful, invoke the foundKeyFunc or foundKeyAction callback on the row and return Found; otherwise, return NotFound.
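The add-on-miss logic described above can be sketched without the worker, using only .NET collections. Everything here is an assumption made for illustration: TryLoadFromSource is a hypothetical stand-in for a slow reference data source, TryLookup stands in for the combined lookup and not-found handling, and a bool result replaces the Found and NotFound return values.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical slow source of reference data (e.g. a database query).
var source = new Dictionary<string, decimal> { ["EUR"] = 1.08m, ["GBP"] = 1.27m };
int sourceCalls = 0;
bool TryLoadFromSource(string key, out decimal value)
{
    sourceCalls++;
    return source.TryGetValue(key, out value);
}

// The cache starts empty and is modified by this single consumer only.
var cache = new Dictionary<string, decimal>();

// Returns true where the callback would return Found, false for NotFound.
bool TryLookup(string key, out decimal value)
{
    if (cache.TryGetValue(key, out value))
        return true;                     // already cached
    if (TryLoadFromSource(key, out value))
    {
        cache.Add(key, value);           // add the missing entry on the fly
        return true;                     // treat the row as found
    }
    return false;                        // key is missing from the source too
}

bool first = TryLookup("EUR", out var rate1);   // loaded from source, then cached
bool second = TryLookup("EUR", out var rate2);  // served from the cache
bool missing = TryLookup("XXX", out _);         // not found anywhere

Console.WriteLine($"{first} {second} {missing}, source calls: {sourceCalls}");
```

Keys that fail to load are not cached in this sketch, so each repeated miss queries the source again; whether to cache negative results depends on the data.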
The dictionary can of course be any IReadOnlyDictionary<TKey,TValue> implementation. E.g.:
- Set initial capacity (to reduce re-allocations) and/or a custom equality comparer (e.g. case insensitive) using an appropriate Dictionary<TKey,TValue> constructor
- Use ConcurrentDictionary<TKey,TValue> to allow multiple threads (i.e. workers) to use and modify the dictionary simultaneously. This can save memory by sharing a single mutable dictionary copy; note however that it can be an order of magnitude slower than a non-concurrent dictionary.
- Use a caching dictionary that discards seldom used items, thereby limiting its memory use
- Create an IReadOnlyDictionary<TKey,TValue> wrapper around an out of process web lookup service
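The first two options in the list above might look as follows with standard .NET types. This is a sketch with made-up keys and sizes; Parallel.For merely simulates several workers using the shared dictionary at once.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Pre-sized, case-insensitive dictionary for a single consumer;
// the initial capacity reduces re-allocations while loading.
var presized = new Dictionary<string, int>(100_000, StringComparer.OrdinalIgnoreCase);
presized["alpha"] = 1;
IReadOnlyDictionary<string, int> presizedView = presized;

// Shared mutable dictionary: ConcurrentDictionary also implements
// IReadOnlyDictionary<TKey,TValue>, so it can be exposed through that type.
var shared = new ConcurrentDictionary<string, int>(StringComparer.OrdinalIgnoreCase);
IReadOnlyDictionary<string, int> sharedView = shared;

// Simulate several consumers adding missing items concurrently.
Parallel.For(0, 1000, i => shared.GetOrAdd("key" + (i % 10), key => key.Length));

Console.WriteLine($"{presizedView["ALPHA"]} {sharedView.Count} {sharedView["KEY3"]}");
```

Note that GetOrAdd may invoke the value factory more than once for the same key under contention, so the factory should be free of side effects.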
Properties
Dictionary
Gets or sets the dictionary to look up incoming rows against. The property can only be accessed when the worker is not running.
A dictionary must be provided either here, or via the worker constructor.
The callback methods (normally notFoundKeyFunc) can optionally be used to populate the dictionary on the fly.
Note: To modify the dictionary in the callback methods, use the original dictionary instance rather than this property, since the property has a read-only type, and accessing it also adds synchronization overhead.
Note that the dictionary can optionally be set to null after the worker has completed, to allow it to be garbage collected before the worker itself is garbage collected, which in rare circumstances can be useful with a large collection.
Note: This property is thread-safe.
Declaration
public IReadOnlyDictionary<TKey, TValue> Dictionary { get; set; }
Property Value
Type | Description |
---|---|
IReadOnlyDictionary<TKey, TValue> | The dictionary. |
Exceptions
Type | Condition |
---|---|
InvalidOperationException | Cannot set the dictionary while the worker is running. |
ErrorOutput
Gets the error output port for sending error rows to logging and an optional downstream worker.
It will receive the first row whose processing throws an exception, rows where the lookup key was not found, and rejected rows.
Declaration
public ErrorOutputPort<TInputOutputError> ErrorOutput { get; }
Property Value
Type | Description |
---|---|
ErrorOutputPort<TInputOutputError> |
Input
Gets the input port for receiving rows from an upstream worker.
Declaration
public InputPort<TInputOutputError> Input { get; }
Property Value
Type | Description |
---|---|
InputPort<TInputOutputError> |
Output
Gets the output port for sending rows where the key is found, and (optionally) rows where it is not found, to the downstream worker.
Declaration
public OutputPort<TInputOutputError> Output { get; }
Property Value
Type | Description |
---|---|
OutputPort<TInputOutputError> |
Methods
RunAsync()
This method can be overridden to add custom functionality to the derived worker that runs before and after the row processing. In this case, base.RunAsync() must be called for the worker to function correctly.
Typically, this worker is used without overriding this method.
Declaration
protected override Task<OutcomeStatus> RunAsync()
Returns
Type | Description |
---|---|
Task<OutcomeStatus> | A |