
Data Source Advanced Settings

The advanced settings available when setting up a data source in Nexla allow full customization of data ingestion to suit any use case. These settings can be used to configure file parsing options, ingest files according to modification date or path pattern(s), define how data should be organized into Nexsets, and more.

This article provides information about the advanced settings available for data sources in Nexla and instructions for configuring each setting.

Advanced Settings Are Optional

The advanced settings presented below are optional. Nexla's default settings are appropriate for many workflows, so these settings do not need to be configured for every data source.


Advanced settings are accessible in the Configure screen for data sources (step 3 of the source creation process or accessed by editing an existing data source). The menu on the left side of this screen can be used to navigate between settings categories.

Data Sources – Step 3. Configure

Configure.png

1. Scheduling File Checks

Nexla can be configured to check the configured data source location for new files at a specified frequency and/or time using the options available under the Scheduling category.

  • To specify the frequency at which Nexla will scan the data source for new files, select an option from the left-most pulldown menu next to Check for files:.
Frequency.png
  • To specify the time at which Nexla will start scanning the data source for new files, select the appropriate hour, minute, and AM/PM options using the pulldown menus on the right next to Check for files:.
Time2.png
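
To illustrate how a frequency and a start time combine into a schedule, the following is a minimal Python sketch. It is not Nexla's scheduler: the function name `next_checks` and the assumption that checks repeat at a fixed interval from the start time are hypothetical and used only for illustration.

```python
from datetime import datetime, timedelta


def next_checks(first_check, every, count):
    """Return `count` check times, starting at `first_check` and spaced `every` apart.

    Assumption for illustration only: checks repeat at a fixed interval
    counted from the configured start time.
    """
    return [first_check + i * every for i in range(count)]


# Example: start at 2:00 AM and check every 6 hours.
checks = next_checks(datetime(2024, 1, 1, 2, 0), timedelta(hours=6), 3)
for check in checks:
    print(check.isoformat())
```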

2. Data Format (File Parsing)

By default, Nexla automatically detects Nexsets from data sources through parsing based on file extensions. However, parsing and file processing can be customized to address specific scenarios, such as:

  • Files without an extension to indicate the type of parser that should be applied
  • Compressed .zip or .tar files
  • Text files, including those with fixed-width or custom column delimiters (ex: .dat, .txt, .csv, .asc)
  • CSV files without a header row
  • Structured files in which some lines should be skipped
  • Files for which the extension does not match the desired parser
  • Files that require customization of how the default parser treats the contained data

To force Nexla to parse all files from a source according to a specified format, select the corresponding option from the File Content Format pulldown menu under the Data Format section in the Configure screen.

File Content Format Settings

Additional settings are available for many of the options listed in the File Content Format menu. For more information about these settings, see Section 1 in Advanced Settings for File-Based Sources.

FileContentFormat.png
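
The difference between extension-based detection and a forced file format can be sketched in Python as follows. This is an illustrative sketch only, not Nexla's parsing logic: the names `parse_file`, `forced_format`, and `has_header`, and the generated `col_N` column names, are all hypothetical.

```python
import csv
import io
import json
from pathlib import PurePosixPath


def parse_csv(text, has_header=True):
    """Parse CSV text into a list of records (dicts)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    if has_header:
        header, body = rows[0], rows[1:]
    else:
        # No header row: generate placeholder column names (hypothetical scheme).
        header = [f"col_{i + 1}" for i in range(len(rows[0]))]
        body = rows
    return [dict(zip(header, row)) for row in body]


def parse_json_lines(text):
    """Parse newline-delimited JSON into a list of records."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


PARSERS = {"csv": parse_csv, "json": parse_json_lines}


def parse_file(name, text, forced_format=None, **options):
    """Pick a parser from the file extension unless `forced_format` is given."""
    fmt = forced_format or PurePosixPath(name).suffix.lstrip(".").lower()
    if fmt not in PARSERS:
        raise ValueError(f"No parser for {name!r}; specify a format explicitly")
    return PARSERS[fmt](text, **options)


# A headerless CSV file with no extension needs an explicit format.
records = parse_file(
    "export_20240101", "1,alice\n2,bob\n", forced_format="csv", has_header=False
)
print(records)
```

Without `forced_format`, the extensionless file in this sketch raises an error because there is no extension to detect a parser from, which mirrors the first scenario in the list above.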

3. Data Selection Options

When setting up a file-based data source, Nexla provides configuration options for specifying which data should be ingested from the source, allowing users to customize data ingestion to suit various use cases. Data can be selected for ingestion from file-based storage systems according to file modification dates, naming patterns, and/or subfolder paths.

The settings discussed in this section are located under the Data Selection category.

3.1 Ingest All Files in the Selected Location

To configure Nexla to ingest all files from the data source, regardless of when the files were added or modified, delete the pre-populated date and time from the Only read files modified after: field, and leave this field blank.

Blank_AllFiles.png

3.2 Ingest Files According to Modification Date

When Nexla should only ingest newer or recently modified files from the data source, the platform can be configured to selectively ingest files modified after a specified date and time.

  1. To specify the file modification date and time that will be used to select which files should be read from this source, click the Calendar.png icon in the Only read files modified after: field, and select the date from the dropdown calendar.
ModifiedAfter1.png
  2. In the field at the bottom of the calendar, enter the time (in 24-hour format) on the selected date that should be referenced when identifying new and/or modified files from the source.
Time.png
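
The effect of the Only read files modified after: field, including the blank-field case from Section 3.1, can be sketched in Python as follows. This is a simplified illustration rather than Nexla's implementation: the function `select_files` is hypothetical, and treating the cutoff as exclusive (strictly after) is an assumption.

```python
from datetime import datetime, timezone


def select_files(files, modified_after=None):
    """Return the names of files that would be read.

    `files` is a list of (name, modification_time) pairs.
    A `modified_after` of None stands for a blank field: read every file.
    Assumption: a file modified exactly at the cutoff is not read.
    """
    if modified_after is None:
        return [name for name, _ in files]
    return [name for name, mtime in files if mtime > modified_after]


files = [
    ("orders_jan.csv", datetime(2024, 1, 31, 23, 0, tzinfo=timezone.utc)),
    ("orders_feb.csv", datetime(2024, 2, 29, 23, 0, tzinfo=timezone.utc)),
]
cutoff = datetime(2024, 2, 1, 0, 0, tzinfo=timezone.utc)

recent = select_files(files, modified_after=cutoff)
everything = select_files(files)
print(recent)
print(everything)
```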

3.3 Ingest or Ignore Files According to Path Pattern(s)

Nexla can be configured to scan and/or ignore specific files or subfolders in the selected data source location based on path-naming patterns.

Specifying Data Paths

Path patterns for data to be scanned or ignored must be written in the Apache Ant path pattern syntax. For more information and example path patterns, see the Apache Ant Path documentation for directory-based tasks.

Entered patterns must also start from the root of the selected location accessible with the credentials used to create the data source.


  • To specify path patterns to be scanned and/or ignored, check the box next to Customize Paths to be Scanned/Ignored, and configure one or more of the settings discussed below.
CustomizePaths.png
  • To configure Nexla to scan only files or subfolders that match a specific path pattern, enter the path pattern in the Paths to Be Scanned field.

    • When a pattern is entered in this field, only matching files or subfolders inside the selected storage location will be scanned. For example, when **/ABC/* is entered, only files in the subfolder ABC will be ingested.
PathsScanned.png
  • If Nexla should not scan files or subfolders that match a specific path pattern, enter that pattern in the Paths NOT to Be Scanned field.

    • When a pattern is entered in this field, only matching files or subfolders inside the selected storage location will be ignored. For example, when **/ABC/* is entered, only files in the subfolder ABC will be ignored.
PathsIgnored.png
  • Enter the time zone referenced within the selected data source location in the Timezone for Path Format field.
PathsTimezone.png
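
To show how Ant-style path patterns such as **/ABC/* select files, the following Python sketch implements a simplified matcher and applies scan and ignore patterns to a list of paths. It is not Nexla's or Apache Ant's code: the functions `ant_match` and `select_paths` are hypothetical, only `*`, `?`, and `**` are handled, and giving ignore patterns precedence when a path matches both fields is an assumption.

```python
import re


def _segment_matches(pattern, name):
    """Match one path segment: '*' is any run of characters, '?' is one character."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, name) is not None


def _match(pats, parts):
    if not pats:
        return not parts
    head, rest = pats[0], pats[1:]
    if head == "**":
        # '**' may consume zero or more directory levels.
        return any(_match(rest, parts[i:]) for i in range(len(parts) + 1))
    return bool(parts) and _segment_matches(head, parts[0]) and _match(rest, parts[1:])


def ant_match(pattern, path):
    """Return True if `path` matches the Ant-style `pattern` (simplified)."""
    return _match(pattern.strip("/").split("/"), path.strip("/").split("/"))


def select_paths(paths, scan=None, ignore=None):
    """Keep paths that match a scan pattern (if any) and no ignore pattern."""
    selected = []
    for path in paths:
        if scan and not any(ant_match(p, path) for p in scan):
            continue
        if ignore and any(ant_match(p, path) for p in ignore):
            continue
        selected.append(path)
    return selected


paths = ["ABC/a.csv", "data/2024/ABC/b.csv", "ABC/sub/c.csv", "XYZ/d.csv"]
scanned = select_paths(paths, scan=["**/ABC/*"])
kept = select_paths(paths, ignore=["**/ABC/*"])
print(scanned)
print(kept)
```

In this sketch, **/ABC/* matches files directly inside any folder named ABC at any depth, but not files in folders nested below ABC, because a single * does not cross a path separator.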