DB-CDC Data Flows
DB-CDC (Database–Change Data Capture) data flows use CDC to replicate tables across databases and/or cloud warehouses. This flow type runs on the Kafka engine and is designed for cases in which data is stored in multiple locations and any changes to the data in one store need to be duplicated in another store.
1. DB-CDC Flows
The DB-CDC flow type is streamlined for use in data migration and maintenance workflows, allowing users to quickly create data flows that replicate data across storage locations.
Most often, this flow type is used to replicate all data in one store into another store. During flow creation, Nexla also provides data source customization options that allow users to exclude some tables or to only include specific tables when capturing and replicating data into the destination location, as well as destination settings to configure table prefixes and/or suffixes, record lineage tracking, and column mapping.
DB-CDC flows are designed to move data from a single data source to a single destination and do not allow Nexset transformations.
For workflows that require transformations, moving data to multiple locations, and other Nexla features, the FlexFlow flow type is recommended.
DB-CDC Data Flow
Setting up a DB-CDC data flow in Nexla to maintain data synchrony between databases is easy. The following sections provide step-by-step instructions for creating a new DB-CDC flow, along with information about optional features and settings available for this flow type.
2. Important Prerequisites for DB-CDC Flows
In DB-CDC data flows, transaction logs from the data source are used to monitor changes to data in the source location. Changes reflected in these transaction logs are then transferred to the destination according to the configured data flow settings.
Therefore, the credentials used to create the data source in a DB-CDC flow must have permission to read transaction logs. These permissions should be granted by the database administrator for the data source location.
3. Create a New DB-CDC Flow & Data Source
After logging into Nexla, navigate to the Integrate section by selecting it from the platform menu on the left side of the screen.
Click at the top of the Integrate toolbar on the left.
Select DB-CDC from the list of flow types, and click to proceed to data source creation.
Learning About Flow Types
In the Select Flow Type screen, click on a data flow type to view more information about it in the panel on the right, including a brief introductory video.
Data flows of each type can also be created by clicking the button at the top of this panel.
3.1 Configure the Data Source
In the Connect screen, select the connector tile matching the data source type from the list.
DB-CDC data flows are only supported for some of the connectors available in Nexla, and only supported connectors are shown on this screen.
To create a data flow with a data source type not available in this screen, use the FlexFlow flow type or one of the other flow types listed on the Nexla Data Flow Types page.
In the Authenticate screen, select or create the credential that will be used to connect to the data source, and click . Detailed information about credential creation for specific sources can be found on the Connectors page.
Important: Credential Permissions
The credentials used to create the data source in a DB-CDC flow must have permission to read transaction logs from the source location. More information can be found in Section 2 above.
- Enter a name for the data source in the Source Name field.
- The subsections below provide information about additional settings available for data sources in DB-CDC flows. Follow the listed instructions to configure each setting for this data source, and then proceed to Section 3.2.
Data Selection
With DB-CDC flows in Nexla, users can define which data from the source location will be reflected into the destination. Source data in these flows can be configured to include the entire database associated with the data source credential or to include or exclude specified tables.
▷ Include all data in the source location:
Select the Automatic schema selection mode. When this mode is used, all data accessible to the credential used to create the data source will be reflected into the destination.
Example: All tables in the nexla_metrics_prod database will be included in the DB-CDC flow
▷ Exclude some tables in the source location:
Select the Select by name schema selection mode, and ensure that the Exclude tables option is selected.
Click the icon next to the database name to expand the list of tables, and check the box(es) next to the table(s) that should be excluded from this DB-CDC flow.
Example: Data in the connector_state and custom_flow_hourly tables will be excluded from the DB-CDC flow
▷ Include only some tables in the source location:
Select the Select by name schema selection mode, and select the Include tables option.
Click the icon next to the database name to expand the list of tables, and check the box(es) next to the table(s) that should be included in this DB-CDC flow.
Example: Only data & detected changes in the account_metrics_daily table will be included in the DB-CDC flow
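The three selection modes above can be sketched in a few lines of Python. This is only an illustration of the include/exclude logic, not Nexla's implementation: the function name `select_tables`, its parameters, and the mode strings are hypothetical, and the table names are taken from the examples above.

```python
def select_tables(available, mode="automatic", names=(), exclude=True):
    """Return the tables that would be replicated under each selection mode.

    Hypothetical helper for illustration only; not part of any Nexla API.
    """
    if mode == "automatic":
        # Automatic: every table visible to the credential is included.
        return list(available)
    if mode != "select_by_name":
        raise ValueError(f"unknown mode: {mode}")
    chosen = set(names)
    if exclude:
        # Exclude tables: keep everything except the checked tables.
        return [t for t in available if t not in chosen]
    # Include tables: keep only the checked tables.
    return [t for t in available if t in chosen]


tables = ["account_metrics_daily", "connector_state", "custom_flow_hourly"]

all_tables = select_tables(tables)
without_state = select_tables(
    tables, "select_by_name", ["connector_state", "custom_flow_hourly"]
)
only_metrics = select_tables(
    tables, "select_by_name", ["account_metrics_daily"], exclude=False
)
print(all_tables)
print(without_state)
print(only_metrics)
```

With this sample list, excluding `connector_state` and `custom_flow_hourly` and including only `account_metrics_daily` select the same single table, which shows that the two Select by name options are complementary ways of describing a subset.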
Existing Data
Some use cases require capturing pre-existing data in the source location and reflecting it into the destination, while others need to reflect only newly added data. Nexla can easily be configured for either workflow with a single setting.
▷ To ingest and reflect only data that is added to the source location after the DB-CDC flow is created:
Ensure that the Ingest existing data switch is in the off position (gray). This is the default setting, so typically, no change is needed.
▷ To ingest and reflect all data currently existing in the source location along with newly added data in this DB-CDC flow:
Activate the Ingest existing data setting by clicking on the slide switch. Once activated, the switch will turn blue.
Change Capture Settings
In a DB-CDC data flow, Nexla will always capture and reflect data that is newly created in the source location. However, users can choose whether or not data deletions in the source location will also be reflected into the destination.
▷ To reflect only newly created data into the destination, without syncing any data deletions, in this DB-CDC flow:
In the Capture change events settings, ensure that the Delete switch is in the off position (gray). This is the default setting, so typically, no change is needed.
▷ To reflect data deletions in addition to newly created data into the destination in this DB-CDC flow:
In the Capture change events settings, activate the Delete setting by clicking on the slide switch. Once activated, the switch will turn blue.
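The effect of the Delete switch can be pictured as a filter over captured change events. The following is a minimal sketch under stated assumptions: the event shape (`op`, `key`, `row`) and the helper names `filter_events` and `apply_events` are hypothetical and do not reflect Nexla's internal event format.

```python
def filter_events(events, capture_delete=False):
    """Always keep insert events; keep delete events only when enabled."""
    allowed = {"insert"}
    if capture_delete:
        allowed.add("delete")
    return [e for e in events if e["op"] in allowed]


def apply_events(destination, events):
    """Apply change events, in order, to a dict standing in for a table."""
    for e in events:
        if e["op"] == "insert":
            destination[e["key"]] = e["row"]
        elif e["op"] == "delete":
            destination.pop(e["key"], None)
    return destination


# Hypothetical change events as they might be read from a transaction log.
events = [
    {"op": "insert", "key": 1, "row": {"name": "a"}},
    {"op": "insert", "key": 2, "row": {"name": "b"}},
    {"op": "delete", "key": 1},
]

inserts_only = apply_events({}, filter_events(events))
with_deletes = apply_events({}, filter_events(events, capture_delete=True))
print(inserts_only)   # row 1 is still present in the destination
print(with_deletes)   # row 1 has been removed from the destination
```

With the switch off, the destination keeps rows that were later deleted in the source, so it behaves like an append-only copy. With the switch on, the destination mirrors the current state of the source.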
Scheduling
Scan scheduling options can be used to define the frequency at which the data source will be scanned for data changes in a DB-CDC flow. Any data changes identified during a scan will then be replicated into the configured destination.
By default, when a new DB-CDC data flow is created, Nexla is configured to scan the source for data changes once every day. To continue with this setting, no further selections are required. Proceed to Section 3.2.
To define how often Nexla should scan the data source for new data changes, select an option from the Replication Frequency pulldown menu.
- When an option such as Every N Days or Every N Hours is selected, a secondary pulldown menu will be populated. Select the appropriate value of N from this menu.
To specify the time at which Nexla should scan the source for new data changes, check the box next to Set Time. Enabling this option will populate new pulldown menus that can be used to define the scan time.
- The scan time menus populated when Set Time is enabled vary according to the selected Replication Frequency.
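As a rough mental model of how a replication frequency and an optional scan time combine, the sketch below computes the next scan from the previous one. The function `next_scan` and its parameters are hypothetical, and the assumption that a set time applies only to day-based frequencies is made for illustration; the actual options depend on the selected Replication Frequency.

```python
from datetime import datetime, time, timedelta


def next_scan(last_scan, every_n=1, unit="days", at=None):
    """Return the next scan time for an 'Every N hours/days' style schedule.

    Illustrative model only; `unit` must be "hours" or "days".
    """
    if unit not in ("hours", "days"):
        raise ValueError(f"unsupported unit: {unit}")
    candidate = last_scan + timedelta(**{unit: every_n})
    if at is not None and unit == "days":
        # Assumption: a set time pins day-based scans to that time of day.
        candidate = datetime.combine(candidate.date(), at)
    return candidate


last = datetime(2024, 1, 1, 10, 30)
every_six_hours = next_scan(last, every_n=6, unit="hours")
every_two_days_at_2am = next_scan(last, every_n=2, unit="days", at=time(2, 0))
print(every_six_hours)        # 2024-01-01 16:30:00
print(every_two_days_at_2am)  # 2024-01-03 02:00:00
```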
3.2 Save & Activate the Data Source
Once all required settings and any desired additional options are configured, click in the top right corner of the screen to save & activate the data source.
Once the data source is created, Nexla will automatically scan it for data according to the configured settings. Identified data will be organized into a Nexset, which is a logical data product that is immediately ready to be sent to a destination.
New DB-CDC Data Flow with Data Source & Detected Nexset
4. Create the Data Destination
DB-CDC flows are designed to move data from a single data source to a single destination and do not allow Nexset transformations. For workflows that require transformations, moving data to multiple locations, and other Nexla features, the FlexFlow flow type is recommended.
- Click the icon on the detected Nexset in the DB-CDC data flow to begin setting up the destination to which the data will be sent.
4.1 Configure the Destination
In the Connect screen, select the connector tile matching the data destination type from the list.
DB-CDC data flows are only supported for some of the connectors available in Nexla, and only supported connectors are shown on this screen.
To create a data flow that sends data to a destination type not available in this screen, use the FlexFlow flow type or one of the other flow types listed on the Nexla Data Flow Types page.
- In the Authenticate screen, select or create the credential that will be used to connect to the data destination, and click . Detailed information about credential creation for specific destinations can be found on the Connectors page.
- Enter a name for the destination in the Destination Name field.
- The subsections below provide information about additional settings available for data destinations in DB-CDC flows. Follow the listed instructions to configure each setting for this destination, and then proceed to Section 4.2.
Namespace
In a DB-CDC data flow, users can designate the destination namespace to which data will be sent.
By default, Nexla will send the data to the namespace with a title matching the destination name entered in Step 3 above; or, if no matching namespace title exists in the destination, Nexla will create a new namespace with the entered title. To use this default setting, no further action is required.
To specify the namespace to which data will be sent in this flow, enter the namespace title in the Destination Namespace field.
Destination Table
Nexla can be configured to include a prefix and/or suffix in the title of any tables created in the destination of a DB-CDC data flow.
▷ To include a prefix in the title of created tables:
Enter the prefix that will be included for each table in the Destination Table Prefix field.
▷ To include a suffix in the title of created tables:
Enter the suffix that will be included for each table in the Destination Table Suffix field.
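The resulting table titles can be pictured with a short sketch. It assumes that the prefix and suffix are concatenated directly onto the source table name with no separator added; that detail, the function name `destination_table_name`, and the sample prefix and suffix values are all assumptions for illustration.

```python
def destination_table_name(source_table, prefix="", suffix=""):
    """Build a destination table title from an optional prefix and suffix.

    Assumes plain concatenation; any separator must be part of the
    prefix or suffix itself.
    """
    return f"{prefix}{source_table}{suffix}"


plain = destination_table_name("account_metrics_daily")
decorated = destination_table_name(
    "account_metrics_daily", prefix="cdc_", suffix="_replica"
)
print(plain)      # account_metrics_daily
print(decorated)  # cdc_account_metrics_daily_replica
```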
Change Capture Settings
In a DB-CDC data flow, Nexla will always capture data that is newly created in the source location and replicate that data into the destination location. However, users can choose whether or not data deletions in the source location will also be reflected into the destination.
▷ To reflect only newly created data into the destination, without syncing any data deletions, in this DB-CDC flow:
In the Capture change events settings, ensure that the Delete switch is in the off position (gray). This is the default setting, so typically, no change is needed.
▷ To reflect data deletions in addition to newly created data into the destination in this DB-CDC flow:
In the Capture change events settings, activate the Delete setting by clicking on the slide switch. Once activated, the switch will turn blue.
Record Lineage Tracking
Nexla provides the option to trace the lineage of records sent to the destination in DB-CDC data flows. This lineage includes the origin of the record data, all applied transformations or changes, and all destinations to which the record data has been sent over time.
To enable record lineage tracking, click on the Track Lineage slide switch. Once activated, the switch will turn blue.
Enter the tracker name in the Track Lineage text field. Nexla will include an attribute with the name entered in this field. This attribute will be assigned a unique value for each record written to the destination.
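The tracker attribute described above can be illustrated as follows. The sketch assumes the unique value is a UUID string, which is an assumption made for this example rather than a documented format. The helper `add_tracker` and the tracker name `nexla_tracker` are likewise hypothetical.

```python
import uuid


def add_tracker(record, tracker_name):
    """Return a copy of the record with a unique tracker attribute added.

    The UUID format is an assumption made for this illustration.
    """
    tracked = dict(record)  # leave the original record untouched
    tracked[tracker_name] = str(uuid.uuid4())
    return tracked


source_record = {"account_id": 42, "metric": "logins"}
first = add_tracker(source_record, "nexla_tracker")
second = add_tracker(source_record, "nexla_tracker")
print(first)
print(first["nexla_tracker"] != second["nexla_tracker"])
```

Because the value is unique per written record, two copies of the same source row receive different tracker values, which is what allows each written record to be traced individually.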
Stream Selection
Nexla can be configured to replicate all data & changes detected in the data source into the destination location, or users can specify only selected data to be replicated into the destination.
▷ Replicate all detected data & changes into the destination:
Under the Streams tab at the bottom of the screen, ensure that the Automatic option is selected. This setting is selected by default, so typically, no change is needed.
Example: All tables in the nexla_metrics_prod database will be replicated into the DB-CDC flow destination
▷ Replicate only data & changes detected in selected locations into the destination:
Under the Streams tab at the bottom of the screen, select the Customize option, and deactivate the Sync All slide switch.
Click the icon next to the database name to expand the list of tables, and check the box(es) next to the table(s) that should be synced to the destination in this DB-CDC flow.
Example: Only data & detected changes in the account_metrics_daily and data_monitor_daily tables will be replicated into the DB-CDC flow destination
4.2 Save & Activate the Destination
Once all required settings and any desired additional options are configured, click in the top right corner of the screen to save the data destination.
Important: Data Movement
Data will not begin to flow into the destination until it is activated by following the instructions below.
- To activate the destination, click the icon on the destination, and select from the dropdown menu.