SQL Query-Based Transforms (Record Aggregation)
In the Nexset Designer, the Transform: Aggregate rule type allows users to create Nexsets using SQL query-based transforms. Transforms created with this rule type are applied across all records in a Nexset, generating output data based on complex aggregations across input records. These transforms can be created and applied directly within the Nexla platform, avoiding the need to first transfer data to a database or data warehouse before applying SQL transformations.
SQL query-based transforms can be applied to a Nexset to produce output data for reporting, analytics, notifications, and more.
1. Create an SQL Query-Based Transform
Locate the Nexset to which the SQL query-based transform will be applied, and click on the to open the Nexset in the Nexset Designer.
Beginning Transforms & the Nexset DesignerBeginning Transforms:
Transforms can be initiated by clicking the icon on a Nexset anywhere Nexsets are shown in Nexla. This includes the Nexsets screen and expanded flow views in the All Data Flows and All Projects screens in the Integrate section as well as the Discover section.Nexset Designer:
To learn more about the Nexset Designer, including full feature descriptions and general use instructions, see Nexset Designer Overview.
- Click the button in the Nexset Rules panel, and select Transform: Aggregate from the Rule Group menu.
Add Rule Group Button:
Rule Group Menu:
After completing Step 2 above, a new Transform: Aggregate rule group will be added to the Nexset Rules panel. This rule group is pre-populated with a sample record-aggregation SQL query based on the data in the input Nexset.
The aggregation window defines the span across which the transform will be applied—i.e., whether or not records and corresponding output will be grouped by ingestion cycles. Select the desired option from the Aggregation Window pulldown menu.
Default: Source Ingestion Cycle – When this mode is selected, records and generated output are bucketed by each ingestion cycle. An individual set of output results is produced for the records ingested during each run of the source.
Topic Memory: Across Ingestion Cycles – When this mode is selected, ingestion cycle boundaries are ignored, and the output results will span all records currently retained in topic memory.
Important note: Nexla does not store records forever; therefore, aggregation results will not inherently span all records ever processed into the input Nexset.
The SQL query can be edited directly within the code text field. It can be customized to aggregate records based on any attribute, return the desired output, etc. to fit the use case, or the SQL query can be replaced entirely. Example record-aggregation transform queries are shown in Section 2 below.
Query SyntaxThe SQL query entered in the text field must be compliant with Apache Flink SQL syntax. For more information about Apache Flink SQL sytax, including supported statements and keywords, see Apache's SQL documentation.
To confirm that the entered SQL query-based transform will produce the desired output, click to generate a preview of the transform results in the Nexset Output panel on the right.
Click to save the transform and continue working in the Nexset Designer, or click to save the transform and close the Nexset Designer.
2. Example Query: Census Data
In this example, the input Nexset contains mock census data, including names, ages, and genders of respondents. The SQL query-based record-aggregation transform shown below aggregates the records ingested during each ingestion cycle according to values of the gender attribute.
SELECT gender
– Selects the gender attribute valuesCOUNT (gender) gender_count
– Counts the number of records containing each unique value of the gender attribute and assigns the totals to the gender_count attributeGROUP BY gender
– Groups the results by values of the gender attribute, coordinating gender_count values with gender values
The output Nexset includes each unique gender value in the ingested records and the number of records containing that value.