Skip to main content

SQL Query-Based Transforms (Record Aggregation)

In the Nexset Designer, the Transform: Aggregate rule type allows users to create Nexsets using SQL query-based transforms. Transforms created with this rule type are applied across all records in a Nexset, generating output data based on complex aggregations across input records. These transforms can be created and applied directly within the Nexla platform, avoiding the need to first transfer data to a database or data warehouse before applying SQL transformations.

SQL query-based transforms can be applied to a Nexset to produce output data for reporting, analytics, notifications, and more.

Custom SQL Transforms

In addition to record-aggregation transforms, the Transform: Aggregate rule type can also be used to perform custom 1:1 record transforms with SQL queries.

To create an SQL query-based 1:1 record transform, follow the instructions in Section 1, and enter the desired SQL transform in Step 5.

1. Create an SQL Query-Based Transform

  1. Locate the Nexset to which the SQL query-based transform will be applied, and click on the Transform_Icon.png to open the Nexset in the Nexset Designer.

    Beginning Transforms & the Nexset Designer

    Beginning Transforms:
    Transforms can be initiated by clicking the Transform_Icon.png icon on a Nexset anywhere Nexsets are shown in Nexla. This includes the Nexsets screen and expanded flow views in the All Data Flows and All Projects screens in the Integrate section as well as the Discover section.

    Nexset Designer:
    To learn more about the Nexset Designer, including full feature descriptions and general use instructions, see Nexset Designer Overview.

  1. Click the Add_Rule_Group.png button in the Nexset Rules panel, and select Transform: Aggregate from the Rule Group menu.

Add Rule Group Button:


Rule Group Menu:

  1. After completing Step 2 above, a new Transform: Aggregate rule group will be added to the Nexset Rules panel. This rule group is pre-populated with a sample record-aggregation SQL query based on the data in the input Nexset.

  2. The aggregation window defines the span across which the transform will be appliedi.e., whether or not records and corresponding output will be grouped by ingestion cycles. Select the desired option from the Aggregation Window pulldown menu.

    • Default: Source Ingestion Cycle When this mode is selected, records and generated output are bucketed by each ingestion cycle. An individual set of output results is produced for the records ingested during each run of the source.

    • Topic Memory: Across Ingestion Cycles When this mode is selected, ingestion cycle boundaries are ignored, and the output results will span all records currently retained in topic memory.
      Important note: Nexla does not store records forever; therefore, aggregation results will not inherently span all records ever processed into the input Nexset.

  1. The SQL query can be edited directly within the code text field. It can be customized to aggregate records based on any attribute, return the desired output, etc. to fit the use case, or the SQL query can be replaced entirely. Example record-aggregation transform queries are shown in Section 2 below.

    Query Syntax

    The SQL query entered in the text field must be compliant with Apache Flink SQL syntax. For more information about Apache Flink SQL sytax, including supported statements and keywords, see Apache's SQL documentation.

  2. To confirm that the entered SQL query-based transform will produce the desired output, click Run_Nexset_Rules.png to generate a preview of the transform results in the Nexset Output panel on the right.

  3. Click Save.png to save the transform and continue working in the Nexset Designer, or click Save_Close.png to save the transform and close the Nexset Designer.

2. Example Query: Census Data

In this example, the input Nexset contains mock census data, including names, ages, and genders of respondents. The SQL query-based record-aggregation transform shown below aggregates the records ingested during each ingestion cycle according to values of the gender attribute.

  • SELECT gender Selects the gender attribute values
  • COUNT (gender) gender_count Counts the number of records containing each unique value of the gender attribute and assigns the totals to the gender_count attribute
  • GROUP BY gender Groups the results by values of the gender attribute, coordinating gender_count values with gender values

The output Nexset includes each unique gender value in the ingested records and the number of records containing that value.