Class DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>
A dataflow worker that first loads an IDictionary<TKey,TValue> from the DictionaryInput port rows, then performs a lookup in the dictionary for each Input row, optionally modifying the input rows, before sending them to the appropriate output port: FoundOutput, NotFoundOutput, or ErrorOutput.
All ports except DictionaryInput have the same row type.
Note: Use the factory methods in DictionaryLookupSplitTransformFactory to create instances of this class.
To customize the key lookup, e.g. to make a case-insensitive lookup, either add code to the selectRowKeyFunc callback to process the row data to match the case of the lookup reference keys, or create and set the underlying Dictionary as a case-insensitive one (see e.g. Dictionary<TKey,TValue>(IEqualityComparer<TKey>)).
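The two approaches can be sketched with the .NET base class library alone. This is a minimal illustration, not actionETL code: the class and method names are hypothetical, the key type is assumed to be string, and option 1 assumes the reference keys were stored in upper case.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
public static class CaseInsensitiveLookupSketch
{
    // Option 1: normalize the row's key before it is looked up.
    // Assumes the reference keys in the dictionary are upper case.
    public static string NormalizeKey(string rowKey) =>
        rowKey.ToUpperInvariant();

    // Option 2: let the dictionary itself ignore case when comparing keys.
    public static Dictionary<string, int> CreateCaseInsensitiveDictionary() =>
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
}
```

With option 2, the row keys need no preprocessing: a dictionary created this way treats "Widget" and "WIDGET" as the same key.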
The dictionary is populated from DictionaryInput ahead of processing Input rows, and is therefore by default a fully cached lookup, which is both the most common configuration and the easiest to configure.
It is, however, also possible to implement a partially cached lookup by loading only commonly used dictionary items in bulk via DictionaryInput, and then adding missing dictionary items on the fly in the notFoundKeyFunc callback. This avoids loading dictionary items that will never be used, which can be advantageous when it is impractical to retrieve all keys and lookup values ahead of time.
In this scenario, create and pass the dictionary to the worker, and use this original reference directly in the notFoundKeyFunc callback.
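The core of a partially cached lookup is "check the cache, and on a miss fetch and remember the item". The sketch below shows only that logic, using plain .NET types. It does not use the actual notFoundKeyFunc signature; the helper name TryLookup and the fetchFromSource delegate (standing in for a slow source such as a database or web service) are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
public static class PartialCacheSketch
{
    // Looks up 'key' in the cache. On a miss, asks the (hypothetical) slow
    // source for the value and, if it exists, adds it to the same dictionary
    // instance so that later rows with this key are served from the cache.
    public static bool TryLookup<TKey, TValue>(
        IDictionary<TKey, TValue> cache,
        TKey key,
        Func<TKey, (bool Found, TValue Value)> fetchFromSource,
        out TValue value)
    {
        if (cache.TryGetValue(key, out value))
            return true;                    // Served from the cache.

        var (found, fetched) = fetchFromSource(key);
        if (!found)
        {
            value = default;
            return false;                   // Key is unknown to the source too.
        }

        cache[key] = fetched;               // Remember for subsequent rows.
        value = fetched;
        return true;
    }
}
```

Because the helper mutates the dictionary it is given, it only works as intended when it receives the same dictionary instance that was passed to the worker, which is why the original reference should be used.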
Note that multiple rows can often match the same lookup key and value. To avoid issues where modifying one row inadvertently also changes another row, best practice is to make the lookup value only consist of value types and/or immutable types. If the lookup value is, or contains, a mutable reference type, the user must ensure that either there are no lookup value references that are shared and modified across rows, or that the lookup value is cloned, so that each row gets its own unique instance.
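As a minimal sketch of the cloning option, assume a hypothetical mutable lookup value type Address with a Clone method (neither is part of the library). Each lookup then hands out its own copy instead of the shared instance:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable lookup value, used only for this illustration.
public sealed class Address
{
    public string City { get; set; }

    // Returns an independent copy of this instance.
    public Address Clone() => new Address { City = City };
}

// Hypothetical helper class, for illustration only.
public static class LookupValueCloningSketch
{
    // Returns a fresh copy of the cached value (or null if the key is
    // missing), so the caller can modify the result without affecting
    // the dictionary or any other row that matched the same key.
    public static Address GetOwnCopy(IDictionary<int, Address> lookup, int key) =>
        lookup.TryGetValue(key, out var shared) ? shared.Clone() : null;
}
```

Cloning costs an allocation per matched row, so where possible an immutable lookup value remains the simpler choice.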
Note that while the DictionaryInput rows can be of any type, consider using the MutableKeyValue<TKey, TValue> helper class when you only need a mutable key and value, or KeyValuePair<TKey,TValue> when an immutable pair is sufficient.
Also see DictionaryLookupSplitTransform<TInputOutputError, TKey, TValue>, which doesn't have the DictionaryInput port, and DictionaryLookupTransform<TInputOutputError, TDictionaryInput, TKey, TValue>, which sends both rows where the key is found, and rows where the key is not found, to the same output port.
Also see Dataflow Lookups.
Namespace: actionETL
Assembly: actionETL.dll
Syntax
public class DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>
    : WorkerBase<DictionaryLookupSplitTransform<TInputOutputError, TDictionaryInput, TKey, TValue>>, IDisposeOnFinished
    where TInputOutputError : class
    where TDictionaryInput : class
Type Parameters
Name | Description |
---|---|
TInputOutputError | The type of all input and output port rows, except for DictionaryInput rows. |
TDictionaryInput | The type of DictionaryInput rows. |
TKey | The type of the lookup key. |
TValue | The type of the lookup value. |
Remarks
The concrete dictionary used by the transform is often a Dictionary<TKey,TValue> (either created by default by the worker, or supplied by the user). Note that it can be shared by other workers if it's not modified after the other workers start to run.
If the Dictionary<TKey,TValue> is only used by a single worker at a time, that worker can modify its dictionary while it runs. This can be used, for example, to build up the dictionary on the fly: in the notFoundKeyFunc callback, attempt to add a new dictionary entry for the missing key, and if successful, invoke the foundKeyFunc or foundKeyAction callback on the row and return Found; otherwise, return NotFound.
The dictionary can of course be any IDictionary<TKey,TValue> implementation. E.g.:
- Set initial capacity (to reduce re-allocations) and/or a custom equality comparer (e.g. case-insensitive) using an appropriate Dictionary<TKey,TValue> constructor
- Use ConcurrentDictionary<TKey,TValue> to allow multiple threads (i.e. workers) to use and modify the dictionary simultaneously. This can save memory by sharing a single mutable dictionary copy; note however that it can be an order of magnitude slower than a non-concurrent dictionary.
- Use a caching dictionary that discards seldom-used items, thereby limiting its memory use
- Create an IReadOnlyDictionary<TKey,TValue> wrapper around an out-of-process web lookup service
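As an illustration of the ConcurrentDictionary<TKey,TValue> option, the following sketch uses only .NET base class library calls. The class and method names are hypothetical, and how the shared instance is handed to each worker is not shown.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
public static class SharedDictionarySketch
{
    // ConcurrentDictionary implements IDictionary<TKey, TValue>, so the
    // returned instance can be used wherever that interface is expected.
    public static ConcurrentDictionary<string, int> CreateShared() =>
        new ConcurrentDictionary<string, int>(StringComparer.OrdinalIgnoreCase);

    // Returns the existing value for 'key', or adds the one produced by
    // 'valueFactory'. Safe to call from several threads at once; under
    // contention the factory may run more than once, but only one result
    // is stored.
    public static int GetOrAdd(
        ConcurrentDictionary<string, int> shared,
        string key,
        Func<string, int> valueFactory) =>
        shared.GetOrAdd(key, valueFactory);
}
```

The thread safety comes at a cost in per-operation overhead, so a plain Dictionary<TKey,TValue> is usually preferable when the dictionary is not modified while shared.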
Properties
Dictionary
Gets or sets the dictionary to populate with DictionaryInput rows, and to look up incoming rows against. If null when the worker runs, a Dictionary<TKey,TValue> will be automatically created.
The property can only be accessed when the worker is not running.
To populate the dictionary on the fly, create and pass the dictionary to the worker, and use this original reference directly in the notFoundKeyFunc callback.
If the number of dictionary items is known to be large, consider creating the dictionary explicitly, with a large initial capacity to reduce the number of reallocations of the dictionary storage.
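A pre-sized dictionary can be created with the standard Dictionary<TKey,TValue>(int) constructor. The sketch below is illustrative only: the helper name, the key and value types, and the expected item count are assumptions.

```csharp
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
public static class PresizedDictionarySketch
{
    // Allocates storage for 'expectedItemCount' items up front, so the
    // dictionary does not need to grow repeatedly while it is being filled.
    public static Dictionary<long, string> Create(int expectedItemCount) =>
        new Dictionary<long, string>(expectedItemCount);
}
```

The result, e.g. from `PresizedDictionarySketch.Create(5_000_000)`, would then be assigned to the Dictionary property before the worker runs.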
Note that the dictionary can optionally be set to null after the worker has completed, to allow it to be garbage collected before the worker itself is garbage collected, which in rare circumstances can be useful with a large collection.
Declaration
public IDictionary<TKey, TValue> Dictionary { get; set; }
Property Value
Type | Description |
---|---|
IDictionary<TKey, TValue> | The dictionary. |
Exceptions
Type | Condition |
---|---|
InvalidOperationException | Cannot set the dictionary while the worker is running. |
DictionaryAddKeyTreatment
Gets or sets how to handle any duplicate keys when adding items to a dictionary.
The default is FatalOnDuplicate.
Declaration
public DictionaryAddKeyTreatment DictionaryAddKeyTreatment { get; set; }
Property Value
Type | Description |
---|---|
DictionaryAddKeyTreatment |
DictionaryInput
Gets the dictionary input port for receiving rows from an upstream worker to populate the dictionary.
Declaration
public InputPort<TDictionaryInput> DictionaryInput { get; }
Property Value
Type | Description |
---|---|
InputPort<TDictionaryInput> |
ErrorOutput
Gets the error output port for sending error rows to logging and an optional downstream worker.
It will receive the first row that throws an exception, and rejected rows.
Declaration
public ErrorOutputPort<TInputOutputError> ErrorOutput { get; }
Property Value
Type | Description |
---|---|
ErrorOutputPort<TInputOutputError> |
FoundOutput
Gets the output port for sending rows where the key is found to the downstream worker.
Declaration
public OutputPort<TInputOutputError> FoundOutput { get; }
Property Value
Type | Description |
---|---|
OutputPort<TInputOutputError> |
Input
Gets the input port for receiving rows from an upstream worker.
Declaration
public InputPort<TInputOutputError> Input { get; }
Property Value
Type | Description |
---|---|
InputPort<TInputOutputError> |
NotFoundOutput
Gets the output port for sending rows where the key is not found to the downstream worker.
Declaration
public OutputPort<TInputOutputError> NotFoundOutput { get; }
Property Value
Type | Description |
---|---|
OutputPort<TInputOutputError> |
Methods
RunAsync()
This method can be overridden to add custom functionality to the derived worker that runs before and after the row processing. In this case, the base class base.RunAsync() must be called for the worker to function correctly.
Typically, this worker is used without overriding this method.
Declaration
protected override async Task<OutcomeStatus> RunAsync()
Returns
Type | Description |
---|---|
Task<OutcomeStatus> | A |