SQL Database Access
The Adb infrastructure (in actionETL.Adb and its child namespaces) is used in ETL applications for accessing databases via SQL, both on-premises, and in the cloud. This includes both relational SQL databases, NoSQL databases, and many other querying systems, accessible via dedicated providers or via the Adb ODBC Provider.
Adb consists of wrappers around the standard .NET database routines residing in System.Data and similar namespaces, adding ETL functionality such as:
- Get which database tables and columns are available
- Support for provider-specific data types, which are very useful for avoiding conversion issues
- Enhanced automated mapping between .NET CLR types and database types
- Encapsulating database connections, to support both short-term connections that are opened and closed as needed inside a worker, and connections that are kept open until explicitly closed by the library user, e.g. for using a single transaction or temporary table across multiple workers
- Execute predefined table commands, e.g. delete all rows, truncate, drop, exists
- Provide additional information about the database, which assists with writing
provider-independent code, so that a worker (such as
AdbInsertTarget
) can support all Adb database providers.- Note: This aspect is mainly used when developing your own custom Adb workers
Note
The wrappers that have a corresponding System.Data
class (e.g. AdbCommand
that
wraps a DbCommand
) behaves as their underlying .NET classes do, except where otherwise
noted
Database Support
For Adb workers to access a database, actionETL requires that:
- An underlying .NET database provider for the database is available, and
- actionETL has implemented an Adb wrapper for that database provider
An actionETL application can use different Adb providers for different workers, including for the same data source.
There are targeted Adb wrappers for specific databases (MariaDB, MySql, PostgreSQL, SQLite, SQL Server), as well as universal ODBC wrappers compatible with most ODBC data sources. Please see Adb Supported Databases for full details.
Data Type Support
actionETL supports both regular .NET data types (int
, string
etc.), as well as
(when using a dedicated provider e.g. "Microsoft.Data.SqlClient")
many database specific data types, e.g. System.Data.SqlTypes.SqlDateTime
.
The latter are very useful for avoiding conversion issues, for further details see:
Adb Workers
The type names of all Adb workers (which are listed below) are prefixed with "Adb".
Non-Dataflow Adb Workers
Worker | Description |
---|---|
AdbExecuteNonQueryWorker | Execute a query on an SQL database (e.g. INSERT, UPDATE, DELETE and DML statements) without returning any result set. RecordsAffected is returned.Examples: Adb Commands and Parameters, Adb Keep Open Connection, Slowly Changing Dimension Example. |
AdbExecuteScalarWorker<TResult> | Execute a query on an SQL database and return a single result value. Examples: See AdbExecuteNonQueryWorker above. |
AdbTableNonQueryWorker | Execute a query on a table-like object in an SQL database: DELETE all rows (with RecordsAffected ), IF EXISTS, DROP, TRUNCATE. Requires the Adb provider to support IAdbTableCommandService. Examples: SQL Database Access, Samples. |
AdbTransactionActionWorker | Creates an AdbKeepOpenConnection and a local AdbTransaction, runs a user supplied callback inside the transaction, and commits/rolls back and disposes the transaction and connection. Example: Adb Keep Open Connection. |
Adb Dataflow Sources
Worker | Description |
---|---|
AdbDataReaderSource<TOutput> | Execute an SQL query on a database, and pass result set rows to the downstream worker. Examples: AdbConnectionString Example, Dataflow Lookups, Slowly Changing Dimension Example. |
Adb Dataflow Targets
Factory Class | Description and Worker Class |
---|---|
AdbExecuteNonQueryTargetFactory | Execute an SQL query for each incoming row. Worker: AdbExecuteNonQueryTarget<TInputError>. |
AdbInsertTargetFactory | Inserts incoming rows into an SQL table or view. Worker: AdbInsertTarget<TInputError>. Examples: SQL Database Access, column mapping example 1 and example 2, Batch Insert details SQL Database Access, Process Incoming Files Example, AggregateCsvCreateInsertTable Sample |
AdbMySqlConnectorBulkInsertTargetFactory | High performance bulk insert of incoming records into a MySQL compatible table or view using MySqlConnector.MySqlBulkCopy. Worker: AdbMySqlConnectorBulkInsertTarget<TInput>. Examples: See the similar AdbInsertTarget in SQL Database Access and column mapping examples, see the similar AdbSqlClientBulkInsertTarget in Slowly Changing Dimension Example. |
AdbSqlClientBulkInsertTargetFactory | High performance bulk insert of incoming records into a SQL Server table or non-partitioned view using SqlBulkCopy. Worker: AdbSqlClientBulkInsertTarget<TInput>. Examples: See the similar AdbInsertTarget in SQL Database Access and column mapping examples, Slowly Changing Dimension Example. |
Classes and Interfaces
These are the main Adb infrastructure types:
Name | Description |
---|---|
AdbProvider | Wraps an ADO.NET database provider and supplies related services. Creates AdbConnectionString , AdbConnectionBuilder , AdbKeepOpenConnectionBuilder , and AdbCommandBuilder instances. |
AdbConnectionString | A read-only connection string for a particular provider. Creates AdbConnectionBuilder , AdbKeepOpenConnectionBuilder , and AdbCommandBuilder instances. |
AdbConnectionBuilder | Builder for a single AdbConnection , used by a single worker. Also creates AdbCommandBuilder instances. |
AdbKeepOpenConnectionBuilder | Builder for a single AdbKeepOpenConnection connection that can be kept open across multiple workers. Also creates AdbCommandBuilder instances. |
IAdbConnectionBuilder | Interface for a connection builder, use this for method parameters etc. (rather than AdbConnectionBuilder and AdbKeepOpenConnectionBuilder classes). |
AdbConnection | A single connection for a single worker. Can be opened, closed, and disposed. Similar to a DbConnection with extra information and functionality. Creates AdbCommand instances. |
AdbKeepOpenConnection | A single connection that can be kept open across multiple workers, closed, and disposed. Similar to a DbConnection with extra information and functionality. Creates AdbCommand instances. |
IAdbConnection | Interface for a connection, implemented by AdbConnection and AdbKeepOpenConnection classes. |
AdbCommandBuilder | Builder for a single AdbCommand database command and its parameters. |
AdbCommand | A database command, similar to DbCommand, with extra information and functionality. Executed against the database. |
AdbParameterCollection | Collection of AdbParameter , similar to DbParameterCollection. Used by AdbCommandBuilder and AdbCommand . Main way to create, add, and inspect parameters. |
AdbParameter | Database parameter for passing data to and from a database, similar to DbParameter. Includes methods for setting parameter database type, which affects how values are converted. |
AdbTransaction | A transaction instance, similar to a DbTransaction with extra information and functionality. See Adb Keep Open Connection. |
AdbDataSourceInformation | Data source information: quoting character, parameter markers etc. |
AdbTableInformation | Optional service that checks if table exists, returns table column information, manipulates table identifiers. |
AdbTableCommand | Optional service that executes table commands (truncate, drop, delete all rows). |
Builders and Factories
Adb workers that require a single database connection take as a parameter one of the following, from which the worker gets a single actual connection:
IAdbConnectionBuilder
connection builder (fromAdbConnectionBuilder
orAdbKeepOpenConnectionBuilder
)AdbCommandBuilder
command builder (with an embedded connection builder)
Workers that require many or an unknown number of database connections instead take
an AdbConnectionString
connection builder factory as a parameter,
from which the worker can create many connection builders, and from them, multiple
actual connections.
In particular, these builders and factory specifies which database provider and connection string to use when the actual connection is later created.
This separation allows workers to be agnostic about the type of the connection -
Adb workers execute create/open/close/dispose commands on all actual
Adb connections, and only the library user knows or cares what type
(AdbConnection
or AdbKeepOpenConnection
) the connection is.
A minor second benefit is that the most common cases do not require the library user
to dispose the connection builder or factory, only the more rarely used
AdbKeepOpenConnectionBuilder
requires explicit disposal.
Note
An ETL worker can of course take more than one Adb parameter, e.g. an
AdbConnectionString
for creating multiple connections to one particular
database, as well as an IAdbConnectionBuilder
for accessing a different
database.
AdbConnectionBuilder
Example
This ETL example loads a CSV file into a database table, which includes passing connection details (using a connection builder) to a database worker:
using actionETL;
using actionETL.Adb;
using actionETL.Adb.SqlClientExternal;
using actionETL.FileHelper;
using FileHelpers;
public static partial class CreateWorkerViaCallback
{
[DelimitedRecord(",")]
private sealed class Category
{
public int CategoryId;
public string CategoryName;
}
public static SystemOutcomeStatus RunExample()
{
var provider = AdbSqlClientProvider.Get();
return new WorkerSystem()
.Root(root =>
{
new FileHelperFileSource<Category>(root, "Read CSV"
, @"Src/CreateWorkerViaCallback/Category.csv")
.Output.Link.AdbInsertTarget("Insert"
, provider.CreateConnectionBuilder(root.Config["SqlServer"])
, "dbo.Category");
})
.Start();
}
}
/* The example assumes the actionetl.aconfig.json file contains:
{
"configurations": [
// Notice escaped backslash:
{ "SqlServer": "Trusted_Connection=true;Encrypt=true;TrustServerCertificate=true;server=(localdb)\\MSSQLLocalDB;database=MyDatabase" }
]
}
*/
/* The example assumes the following table already exists:
CREATE TABLE Category
(
CategoryId int,
CategoryName varchar(50)
)
*/
Key points:
- The
Category
class is used for:- The CSV column specification
- The schema and data rows passed between the two dataflow workers
- The schema for inserting into the database table "Category"
- By default,
AdbInsertTarget
maps dataflow columns with database columns by name. Custom mappings can also be specified.
- By default,
- An
AdbConnectionBuilder
is created from the connection string named "SqlServer" in the app.config (or web.config) file. It is passed to the worker, which uses it to create a database connection.
Providers
Any Adb database work starts with getting an AdbProvider instance (e.g. a SQL Server instance or MySQL or MariaDB instance), and use it to create any required Adb connection strings, Adb connection builders, and Adb command builders.
Note that there can be multiple 'Adb provider' instances for a single type of data source, each with different settings. For instance, use Get(Boolean, Int32, Int32, Int64) to get a SQL Server provider instance that uses ANSI double quotes for quoting identifiers and sets a custom size for batch inserts.
Provider Services
An Adb provider instance consists of a set of services, and the user can easily create a new, modified provider instance using AdbProvider methods, by supplying a new or modified service, which will change the behavior of the workers that use them.
Some services (IAdbInsertStatementService, IAdbTableCommandService and IAdbTableInformationService) are optional, and any workers that rely on a service that a particular provider does not supply are unsupported. Dedicated providers such as AdbSqlClientProvider and AdbMySqlClientProvider will typically provide all services, while universal providers such as AdbOdbcProvider will typically lack at least IAdbTableCommandService. See the documentation for the individual services for details.
Batch Insert
By default, the AdbInsertTarget<TInputError> worker uses batch insert (multiple rows per insert statement) and multiple statements per periodic transaction to improve throughput while reducing database round-trips as well as client and database load.
Batch and transaction sizes can be set, and explicit transactions can be disabled altogether. In decreasing precedence:
- If the AdbInsertTarget.ErrorOutput port is linked to a downstream worker, the creation of transactions is automatically disabled.
- The below settings will be automatically limited to the insert service MaxRowsPerBatch and MaxValuesPerBatch.
- For individual workers, via SetRowLimits(Int32, Int64).
- For all or individual workers, via the AConfig Configuration Facility using the InsertStatementValuesPerBatch and InsertStatementValuesPerTransaction string keys.
For all AdbInsertTarget<TInputError> workers using a specific Adb provider instance, use the appropriate provider-specific overload. To e.g. get single row batches without explicit transactions:
var provider = AdbSqlClientProvider.Get(1, 1, -1);
You can also achieve the same by creating a new provider instance with default settings, and the replace the insert service with With(String, IAdbInsertStatementService).
Note
- Some of the above settings use 'rows per batch and transaction', and some use 'values (or parameters) per batch and transaction'; the latter is more useful across different row widths. To get the number of values, multiply the row count with the number of columns you are inserting in each row.
- An insert error will reject the whole multi-row batch or transaction, and the
error message won't explicitly specify which row caused the error.
Only the failed batch insert will be sent to the
ErrorOutput
port, even if transactions are used and some or all previous batches are rolled back. AdbInsertTarget
participates in any pre-existing transaction on the connection, which will then include all rows the worker inserts.- Using both a transaction and linking the
ErrorOutput
port to a downstream worker is incompatible, and would fail the worker.
- Using both a transaction and linking the
- Bulk Insert (supported by the Adb MySqlConnector Provider and the Adb SqlClient Provider) is significantly faster than batch insert for multi-row inserts.
Bulk Insert
Bulk insert also improves throughput while reducing database round-trips as well as client and database load, typically even more than batch insert.
Bulk insert however requires a database specific facility. This is currently supported for SQL Server (AdbSqlClientBulkInsertTargetFactory) and MySQL compatible databases with the MySqlConnector provider (AdbMySqlConnectorBulkInsertTargetFactory).
Note
With bulk insert an insert error will reject the whole batch, and the error message won't explicitly specify which row caused the error.
See e.g. the Slowly Changing Dimension Example which uses bulk insert workers:
var insertScd = lookup.FoundOutput.Link.AdbSqlClientBulkInsertTarget(
"INSERT SCD Rows"
, rcc => rcc.AutoName(type2Trs.ColumnCount)
, _adbConnectionString.CreateConnectionBuilder()
, SqlBulkCopyOptions.Default, "dbo.Update_Dim" + _tableName);
Connection Strings
You choose where to store and retrieve your connection strings from. actionETL does however have direct support for retrieving them from:
- The AConfig Configuration Facility, as in the above example, which uses the worker system Config property and is the primary and often simplest option
- The .NET Framework Run-time Settings. Note that this is only supported on .NET Framework, and the 'providerName' setting is not used or checked:
<connectionStrings>
<add name="SqlServer"
providerName="Microsoft.Data.SqlClient"
connectionString="Trusted_Connection=true;Encrypt=true;TrustServerCertificate=true;server=(local);Database=actionETLDocumentationDatabase" />
</connectionStrings>
Retrieving from app.config / web.config uses one of the *FromAppConfig()
methods (also see ConnectionStringSettingsCollectionExtensions):
var connectionBuilder = provider.CreateConnectionBuilderFromAppConfig("SqlServer")
http://www.connectionstrings.com/ is a good external resource for how to set connection strings for different databases and versions.
Thread Safety
A connection builder can only create a single actual connection.
A particular actual connection (i.e. an AdbConnection
or
AdbKeepOpenConnection
instance) must never
be used by more than one thread at any one time. This means that:
- An
AdbConnectionBuilder
instance must be passed to no more than a single Adb worker - An
AdbKeepOpenConnectionBuilder
instance can be passed to multiple workers, but the library user must ensure that no more than one of those workers run at the same time:- Create these workers serially (i.e using normal, sequential code)
- Start constraints, worker grouping, and (less commonly) fully blocking dataflow workers must ensure they don't run in parallel
AdbConnectionString
Example
This example truncates the database target table before copying all rows from a source
table to the target table, so we use an AdbConnectionString to create
multiple connection builders. Instead of hard coding TRUNCATE TABLE
, we use
AdbTableNonQueryWorker, which can also handle DROP
, IF EXISTS
,
and DELETE
all rows.
using actionETL;
using actionETL.Adb;
using actionETL.Adb.SqlClientExternal;
public static partial class AdbSynchronizeTable
{
private sealed class Category
{
public int CategoryId;
public string CategoryName;
}
private const string _sourceTableName = "dbo.CategorySource";
private const string _targetTableName = "dbo.Category";
private static void RunExample()
{
new WorkerSystem()
.Root(root =>
{
var acs = AdbSqlClientProvider.Get()
.CreateConnectionString(root.Config["SqlServer"]);
var truncate = new AdbTableNonQueryWorker(root
, acs.CreateConnectionBuilder()
, AdbTableNonQueryOperation.TruncateTable, _sourceTableName);
_ = new AdbDataReaderSource<Category>(root, "Extract Table"
, () => truncate.IsSucceeded, acs.CreateConnectionBuilder()
, "SELECT CategoryId, CategoryName FROM " + _targetTableName
, cmca => cmca.AutoName())
.Output.Link.AdbInsertTarget("Insert Table"
, acs.CreateConnectionBuilder(), _targetTableName);
})
.Start()
.ThrowOnFailure();
}
}
You can obviously expand this simple example. At the other end of the scale, also see these two large examples: Process Incoming Files Example and Slowly Changing Dimension Example.
See Also
- Common Tasks
- Release Notes
- Getting Started
- Worker System
- Workers
- Dataflow
- SQL Database Access
- .NET Framework Data Providers (external)
- Data Formats
- Transfer Protocols