Worker Error Handling
Workers can fail for many reasons, e.g. missing files or failed database connections. This article describes how the worker system handles and propagates different types of errors.
Note
Dataflow workers follow the same rules, but also have dataflow-specific error handling capabilities; see Dataflow for details.
Error Scenarios
Successful Outcome
Many error conditions are predictable, e.g. trying to delete a nonexistent file or drop a nonexistent table. Workers should, as far as possible, check for these types of error conditions, and either handle the error (as described in this section) or fail with a specific error message that makes it easy for the library user to understand what happened (as described below).
For example, if DeleteFileWorker finds that the file is not present and ignoreMissingFile is set to true, the worker will complete with a Succeeded status (and won't log any message about the missing file).
Canceled Outcome
Although not an error in itself, if the library user cancels the worker system by calling WorkerSystemBase.TryCancel(...), the root worker, as well as any workers that were canceled instead of being run, will have the Canceled status.
Error Outcome
If an error condition is meant to fail the worker, where possible, the worker should detect the error condition, log a suitable error message, and propagate the error.
For example, for a DeleteFileWorker with ignoreMissingFile set to false (or not provided):
- If the specified file is not present, the worker will log a LogCategory.TargetDoesntExist message (without any associated exception, since the message is sufficiently specific)
- Furthermore, any IOException (e.g. due to permission issues) will be caught, and a LogCategory.FileOperationFailed message will be logged (with the underlying exception, since this will assist troubleshooting)
In both cases the worker completes with an Error status, which other workers can query with IsError.
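The decision logic described above for a missing file and for a failed delete can be sketched in plain C#. This is a minimal illustration only, not the DeleteFileWorker implementation: the names DeleteStatus, LogKind, DeleteFileSketch and TryDelete are hypothetical, and the sketch returns the log category instead of writing to a real log.

```csharp
using System;
using System.IO;

// Hypothetical stand-ins for the library's status and log category types.
public enum DeleteStatus { Succeeded, Error }
public enum LogKind { None, TargetDoesntExist, FileOperationFailed }

public static class DeleteFileSketch
{
    // Returns the resulting status and which log category (if any) would be logged.
    public static (DeleteStatus Status, LogKind Log) TryDelete(
        string fileName, bool ignoreMissingFile = false)
    {
        if (!File.Exists(fileName))
        {
            if (ignoreMissingFile)
            {
                // Predictable condition handled by the worker: succeed silently.
                return (DeleteStatus.Succeeded, LogKind.None);
            }
            // Specific message, no exception attached.
            return (DeleteStatus.Error, LogKind.TargetDoesntExist);
        }

        try
        {
            File.Delete(fileName);
            return (DeleteStatus.Succeeded, LogKind.None);
        }
        catch (IOException)
        {
            // A real worker would log the underlying exception here.
            return (DeleteStatus.Error, LogKind.FileOperationFailed);
        }
    }
}
```

Calling DeleteFileSketch.TryDelete on a path that does not exist gives an Error status with the TargetDoesntExist category, while passing ignoreMissingFile: true for the same path gives Succeeded with nothing logged.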
Fatal Outcome
Some errors are sufficiently serious or unknown to the failing worker that the correctness
of the worker system is in doubt. In these cases, the worker should complete with a Fatal
status.
This happens:
- Automatically when a worker throws an exception without catching it, e.g. DeleteFileWorker throws an ArgumentNullException if passed a null fileName
- Explicitly when a worker returns a Fatal status
A Fatal port or worker status will always give that port or worker, all of its ancestor workers, and the worker system a Fatal status.
Note
The documentation and some parts of the API use Failed or Failure for things that can have either an Error or a Fatal status.
Outcome Escalation
Default Escalation
By default, the completion status of children that run will escalate (a.k.a. propagate) to their parent, and the parent will get the worst status from its children and its own processing. 'Worst' is the most severe status present, in this order of increasing severity: Succeeded < Canceled < Error < Fatal. E.g.:
Children | Parent Itself | Parent Outcome |
---|---|---|
All Succeeded | Succeeded | Succeeded |
One Canceled | Succeeded | Canceled |
One Error | Succeeded | Error |
One Fatal | Succeeded | Fatal |
All Succeeded | Error | Error |
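The default rule in the table amounts to taking the most severe status among the parent's own status and those of its children. The following is a minimal sketch of that rule in plain C#; the Status enum and the DefaultEscalation class are hypothetical illustrations and not part of the library API.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the outcome status, declared from least to most severe.
public enum Status { Succeeded = 0, Canceled = 1, Error = 2, Fatal = 3 }

public static class DefaultEscalation
{
    // The parent outcome is the worst of the parent's own status and all child statuses.
    public static Status ParentOutcome(Status parentItself, params Status[] children)
    {
        return children.Aggregate(
            parentItself,
            (worst, child) => child > worst ? child : worst);
    }
}
```

For instance, DefaultEscalation.ParentOutcome(Status.Succeeded, Status.Succeeded, Status.Canceled) evaluates to Status.Canceled, matching the second row of the table.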
Override Escalation
By setting EscalateError on a child worker to false, Canceled and Error will not be escalated to its parent. This can be useful, e.g., when taking corrective action on non-serious errors.
In the following example, if the primary CopyFileWorker fails to find the file:
- The primary will get an Error status, but will not fail the WorkerSystem parent due to EscalateError being false
- The secondary will run, and will escalate any failure to the WorkerSystem parent
using actionETL;

new WorkerSystem("Copy File With Fall-back")
    .Root(root =>
    {
        var primary = new CopyFileWorker(root, "Primary"
            , @"Src/OverrideEscalation/PrimaryInputFile.csv"
            , @"Src/OverrideEscalation/InputFile.csv"
            , overwrite: true
            )
            { EscalateError = false }; // primary failing will not fail parent worker

        var secondary = new CopyFileWorker(root, "Secondary"
            , () => primary.IsError // Only run if primary fails
            , @"Src/OverrideEscalation/SecondaryInputFile.csv"
            , @"Src/OverrideEscalation/InputFile.csv"
            , overwrite: true
            );
    })
    .Start()
    .ThrowOnFailure();
Important
A Fatal status will always be escalated up to the worker system root, irrespective of the EscalateError property, and the whole worker system will therefore complete with the Fatal status.
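The interaction between EscalateError and a Fatal status can be sketched as a small model in plain C#. This is an illustration under stated assumptions, not library code: the Outcome enum and the Child and OverrideEscalation classes are hypothetical, and a suppressed child status is modeled as contributing Succeeded to its parent.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the outcome status, declared from least to most severe.
public enum Outcome { Succeeded = 0, Canceled = 1, Error = 2, Fatal = 3 }

// Hypothetical child worker model: its own status plus its escalation setting.
public sealed class Child
{
    public Child(Outcome status, bool escalateError = true)
    {
        Status = status;
        EscalateError = escalateError;
    }

    public Outcome Status { get; }
    public bool EscalateError { get; }
}

public static class OverrideEscalation
{
    // Fatal always escalates; Canceled and Error escalate only when EscalateError is true.
    public static Outcome Contribution(Child child)
    {
        return (child.Status == Outcome.Fatal || child.EscalateError)
            ? child.Status
            : Outcome.Succeeded;
    }

    // The parent outcome is the worst of its own status and each child's contribution.
    public static Outcome ParentOutcome(Outcome parentItself, IEnumerable<Child> children)
    {
        var worst = parentItself;
        foreach (var child in children)
        {
            var contribution = Contribution(child);
            if (contribution > worst) worst = contribution;
        }
        return worst;
    }
}
```

In this model, a child with an Error status and escalateError: false leaves a Succeeded parent as Succeeded, whereas a child with a Fatal status makes the parent Fatal regardless of the flag.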
Retry On Error
To avoid duplicating code, and to retry many times, another alternative is to implement retry-on-error using an iterating worker. Here we use a WhileActionWorker<T> to iterate, and an AddCompletedCallback(Func<WorkerBase, OutcomeStatus, Task<OutcomeStatus>>) callback on the worker to retry on error, up to four times, with increasingly longer delays:
// using actionETL;
// using System.Threading.Tasks;

var sos = new WorkerSystem()
    .Root(ws =>
    {
        int milliSeconds = 50;
        const int maxMilliSeconds = 400;

        _ = new WhileActionWorker<int>(ws, "Retry"
            // Keep iterating until the callback sets milliSeconds to int.MaxValue
            , waw => milliSeconds <= maxMilliSeconds
            , waw =>
            {
                // Only the last iteration is allowed to escalate an error
                var escalateErrorAndLastIteration = milliSeconds >= maxMilliSeconds;
                new FileExistsWorker(waw, "Check File", "MissingFile.txt")
                {
                    EscalateError = escalateErrorAndLastIteration
                }
                .AddCompletedCallback(async (few, os) =>
                {
                    if (os.IsSucceeded || escalateErrorAndLastIteration)
                    {
                        milliSeconds = int.MaxValue; // Stop iterating
                    }
                    else
                    {
                        // Wait before the next attempt, then double the delay
                        await Task.Delay(milliSeconds).ConfigureAwait(false);
                        milliSeconds *= 2;
                    }
                    return os;
                });
            });
    })
    .Start();
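The retry schedule itself (attempt, wait, double the delay, stop once the delay reaches the maximum) can be seen more easily in isolation. The sketch below is plain C# with no dependency on the worker system; RetrySketch and RetryAsync are hypothetical names, and the default delay values simply mirror the ones used above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class RetrySketch
{
    // Runs 'attempt' until it succeeds or the delay has reached 'maxMilliseconds'.
    // Returns whether an attempt succeeded, plus the delays awaited between attempts.
    public static async Task<(bool Succeeded, List<int> Delays)> RetryAsync(
        Func<Task<bool>> attempt, int initialMilliseconds = 50, int maxMilliseconds = 400)
    {
        if (initialMilliseconds <= 0)
            throw new ArgumentOutOfRangeException(nameof(initialMilliseconds));

        var delays = new List<int>();
        for (int milliseconds = initialMilliseconds; ; milliseconds *= 2)
        {
            // Once the delay has reached the maximum, this is the final attempt.
            bool lastIteration = milliseconds >= maxMilliseconds;

            if (await attempt().ConfigureAwait(false))
                return (true, delays);
            if (lastIteration)
                return (false, delays);

            await Task.Delay(milliseconds).ConfigureAwait(false);
            delays.Add(milliseconds);
        }
    }
}
```

With the default values and an attempt that always fails, this makes four attempts separated by delays of 50, 100 and 200 milliseconds, and then reports failure.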