Custom Transforms Using Python/JavaScript/JSON
This article provides information about creating custom attribute and record transformations in Nexla using Python, JavaScript, and/or JSON code.
1. What Are Custom Code Transformations?
Nexla includes many pre-built transformation functions within the Nexset Designer that can be used to accomplish most data flows; however, some data flows require specific, customized data transformations. For these operations, users can create custom attribute and record transforms in Nexla using Python, JavaScript, and/or JSON code. Custom transforms can also be used for transformations that would otherwise require multiple pulldown-menu attribute selections or when operating on nested arrays of objects.
Custom attribute transforms are customized transformation functions that modify attributes in an input Nexset. Custom attribute transforms follow the function signature of transformAttribute(input, metadata, args) and receive the entire incoming record as input.
For data flows that require the application of a consistent set of transform rules regardless of the input Nexset schema, custom record transforms can be used to specify the entire output Nexset record. Custom record transforms follow the function signature of transform(input, metadata, args), receiving the entire incoming Nexset record as input and any associated metadata as metadata.
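As a rough illustration of the difference between the two signatures, the sketch below places an attribute transform next to a record transform. It rests on several assumptions that are not confirmed by this article: that transformAttribute returns the value of a single output attribute, that metadata can be read like a dictionary, and that args can be passed as an empty list. The attribute names (first_name, last_name, full_name) are hypothetical.

```python
# Sketch only: attribute names are hypothetical, and the return-value
# convention of transformAttribute is an assumption.

def transformAttribute(input, metadata, args):
    # Receives the whole record; assumed to return one attribute's value.
    first = input.get("first_name", "")
    last = input.get("last_name", "")
    return (first + " " + last).strip()


def transform(input, metadata, args):
    # Receives the whole record and returns the entire output record.
    return {
        "full_name": transformAttribute(input, metadata, args),
        "source": metadata.get("sourceType"),
    }


record = {"first_name": "Ada", "last_name": "Lovelace"}
print(transform(record, {"sourceType": "FTP"}, []))
```

The record transform decides every attribute of the output, whereas the attribute transform only computes one value from the same input record.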
Both custom attribute and record transforms can be constructed within individual Nexset transformations, or they can be created as reusable transforms that can be applied to multiple Nexsets and shared with other users in the organization.
For more information about creating, using, and sharing reusable transforms, see the Reusable Attribute Transforms and Reusable Record Transforms articles in the Help Center.
2. Create a Custom Transform in the Nexset Designer
To learn how to access the Nexset Designer, see the Nexset Designer Overview article in the Help Center.
Click the button in the Nexset Rules panel, and select from the menu that appears.
In the newly created Transform: Code rule group, select the desired coding language—Python, JavaScript, or JSON—from the pulldown menu. These options are found under the "Write Custom Code" category at the top of the list.
JavaScript code written in Nexla must be in vanilla JavaScript format (no ES6).
Enter the necessary code in the text field below the pulldown menu, adhering to the function signature.
3. Function Signatures
When writing custom transform code, users must follow the function signature corresponding to the selected language. These function signatures are displayed in the rule group text field when each language is selected.
The function signatures for each language are shown below. In these signatures, input contains the input Nexset record as a JSON object, and metadata contains Nexla metadata attributes about the input Nexset.
Python
def transform(input, metadata, args):
    return input
JavaScript
function transform(input, metadata, args) {
    return input;
}
JSON
[]
Any custom code and unlimited supporting functions can be entered in the text field, but the function signature cannot be changed or removed.
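A minimal sketch of how supporting functions might sit alongside the required signature is shown below. The helper clean_value and the choice to lowercase attribute names are hypothetical and serve only to show that the transform function itself keeps its name and parameters.

```python
# Hypothetical helper: trims surrounding whitespace from string values.
def clean_value(value):
    if isinstance(value, str):
        return value.strip()
    return value


# The signature is left exactly as shown in the rule group text field.
def transform(input, metadata, args):
    output = {}
    for key, value in input.items():
        # Lowercase each attribute name and clean its value.
        output[key.lower()] = clean_value(value)
    return output


print(transform({"Prod_Name": "  Google Chromecast "}, {}, None))
```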
4. Metadata Parameter
The metadata parameter contains Nexla metadata attributes about the input Nexset. These metadata attributes comprise information about how the Nexset was brought into Nexla and can be used as needed when specifying transform rules for the output Nexset.
The table below lists the available Nexla metadata attributes.
Nexla Metadata Attributes
Attribute Name | Description | Example |
---|---|---|
ingestTime | Nexset record ingestion time in Unix epoch milliseconds | 1538562212976 |
resourceId | ID of the resource at the root of the flow | 12302 |
resourceType | Resource type of the resource ID metadata | SOURCE |
runID | Unique flow execution run ID of the source | 1638205638431 |
sourceBucket | Top-level bucket/directory of the data source | daily-logs.example.com |
sourceOffset | Data source offset of the record, i.e., the line number of the file to which the record belongs | 1001 |
sourcePath | Path of the ingested resource from the top-level bucket/directory/API path/database/etc. | hourly_events/2017-07-28/1700.json |
sourceType | Data source type of the input Nexset | FTP |
tags | Optional captured key-value properties of the input Nexset source | {"from_email": "test@nexla.com"} |
trackerId | Unique record tracker containing lineage information about the record as it flows through Nexla | cjExMDA5OjlyNjY2OnRyYWRlc18wOV 8yNy5jc3Y6MjoxOjE1NjcwNtgyNdy5M DA7Mjl2NzE6MzU5OTc7ODgzNzoxOjl 5Mw |
transformTime | Nexla record transformation time in Unix Epoch milliseconds | 1498776641620 |
transformTimeISO8601 | Nexla record transformation time in ISO8601 format | 2017-06-29T22:50:41Z |
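The following sketch suggests one way metadata attributes could be copied into an output record. It assumes that metadata can be read like a Python dictionary keyed by the attribute names in the table; the output attribute names ingested_at_ms and source_file are hypothetical, and the sample metadata values are taken from the examples in the table.

```python
def transform(input, metadata, args):
    # Copy the record so the incoming object is not modified in place.
    output = dict(input)
    # Assumption: metadata supports dictionary-style access.
    output["ingested_at_ms"] = metadata.get("ingestTime")
    output["source_file"] = metadata.get("sourcePath")
    return output


# Sample metadata built from the example values listed in the table.
sample_metadata = {
    "ingestTime": 1538562212976,
    "sourcePath": "hourly_events/2017-07-28/1700.json",
}
result = transform({"prod_id": "472465"}, sample_metadata, None)
print(result)
```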
5. Using Nexla's Pre-Built Functions in Custom Transforms
Nexset rules available in the Nexset Designer can also easily be applied in custom-coded transforms created with Python or JavaScript. The table below lists the function signatures of Nexla's predefined transform functions that may be useful in custom transforms.
Only pre-built functions that are typically useful in custom transforms are shown in the table. For a complete list of Nexla's pre-built transform rules, see the List of Nexla's Pre-Built Transforms article in the Help Center.
To use any of the listed functions, replace <transform> with the function name, and add the relevant parameters as arguments (replacing param1...paramN) in the following code format:
nexla_fn.call('<transform>',param1...paramN)
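Because nexla_fn exists only inside Nexla, custom code cannot be run elsewhere without a substitute. The sketch below is a hypothetical local stand-in, not Nexla's implementation, that mimics the call format for two of the functions listed in this section (join and toStringWithLeadingZeroes). Their behavior here is inferred from the descriptions in the tables and may differ from the real functions.

```python
class LocalNexlaFn:
    """Hypothetical local stand-in for nexla_fn; NOT Nexla's implementation."""

    def __init__(self):
        # Map each function name to a plain-Python approximation.
        self._functions = {
            "join": lambda delimiter, items: delimiter.join(items),
            "toStringWithLeadingZeroes": lambda length, number: str(number).zfill(length),
        }

    def call(self, name, *params):
        # Same shape as nexla_fn.call('<transform>', param1...paramN).
        if name not in self._functions:
            raise KeyError("No local stand-in for " + name)
        return self._functions[name](*params)


nexla_fn = LocalNexlaFn()
print(nexla_fn.call("join", " and ", ["First", "Second"]))  # First and Second
print(nexla_fn.call("toStringWithLeadingZeroes", 5, 123))   # 00123
```

The function name is always the first argument, followed by the parameters in the order given in the tables.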
Security
Function | Description | Parameters | Example Call |
---|---|---|---|
stringHash | Tokenize the input value using the MD5 or SHA256 algorithm | 1. hash_method: Hash encryption method – either 'md5' or 'sha256' for the corresponding tokenization algorithm 2. stringToHash: Input string or attribute of which to return the encrypted value | nexla_fn.call("stringHash","sha256","test") |
integerHash | Encrypt an integer value | 1. hash_method: Hash encryption method – either 'md5' or 'sha256' for the corresponding tokenization algorithm 2. integerToHash: Input integer or attribute of which to return the encrypted value | nexla_fn.call("integerHash","sha256",123) |
Lookups
Function | Description | Parameters | Example Call |
---|---|---|---|
toMapValue2 | Identifies the lookup row of an input attribute based on matching with the primary key values of the specified lookup and returns the value of the lookup secondary key column | 1. lookupId: Nexla static or dynamic lookup from which to fetch entries 2. lookupOnKey: Input attribute or value for filtering lookup entries 3. lookupKeyToFetch: Secondary key column from the lookup from which the value of the matching row will be returned | nexla_fn.call("toMapValue2",1200,"1","code") |
getMap | Returns the entire object contained in a lookup table based on exact matching of an input with the primary key of the specified lookup | 1. lookupId: Nexla static or dynamic lookup from which to fetch entries 2. lookupOnKey: Input attribute or value for filtering lookup entries | nexla_fn.call("getMap",1200,"1") |
Date & Time
Function | Description | Parameters | Example Call |
---|---|---|---|
epochFromString | Converts a valid ISO8601-formatted string timestamp to the corresponding Unix Epoch time | 1. inputString: ISO8601-formatted input string to be converted to Unix epoch time | nexla_fn.call("epochFromString","2020-11-10T07:44:25Z") |
extractDate | Extracts specific date parts from a date-time string | 1. toFormat: Desired format of output that includes the extracted date-time parts of input string 2. inputString: String value that needs to be parsed (must be of a valid date-time format) | nexla_fn.call("extractDate","dd-MMM-yy","09/28/2017 11:43:00") |
convertTimeZone | Converts a valid date-time string from one time zone to another time zone | 1. fromZone: Time zone from which the date-time string will be converted 2. toZone: Time zone to which the date-time string will be converted 3. timeToConvert: Date-time string to be converted to a different time zone | nexla_fn.call("convertTimeZone","GMT","America/New_York","2020/11/09") |
timestampDifference | Returns the absolute difference of two timestamps in one of the supported ISO8601 units of time | 1. differenceUnit: Time unit for computing the difference (sss – milliseconds, s – seconds, m – minutes, h – hours, d – days, w – weeks) 2. subtractTs: First timestamp to be used for computing the difference between timestamps 3. fromTs: Second timestamp to be used for computing the difference between timestamps | nexla_fn.call("timestampDifference","m","2016-03-22T22:54:14Z","2016-03-22T22:44:14Z") |
iso8601FromEpoch | Converts a valid Unix Epoch time to the corresponding ISO8601-formatted string timestamp | 1. unixEpoch: Time in Unix Epoch format to be converted to ISO8601 format | nexla_fn.call("iso8601FromEpoch",1604994265) |
iso8601FromString | Converts a valid date-time string to the corresponding ISO8601-formatted string timestamp | 1. timeString: Time string to be converted to ISO8601 format (must be in a valid date-time format) | nexla_fn.call("iso8601FromString","25/02/2017") |
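To make the relationship between the example values above concrete, the sketch below reproduces the conversions with Python's standard library. It is not Nexla's implementation. It assumes epoch values in seconds, which matches the iso8601FromEpoch example call, although the article does not state the unit that epochFromString returns. The helper names are hypothetical.

```python
from datetime import datetime, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def epoch_from_iso8601(text):
    # Parse a UTC timestamp such as "2020-11-10T07:44:25Z" into epoch seconds.
    parsed = datetime.strptime(text, ISO_FORMAT).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def iso8601_from_epoch(seconds):
    # Format epoch seconds as an ISO8601 string in UTC.
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(ISO_FORMAT)


def minutes_between(first, second):
    # Absolute difference between two timestamps, in whole minutes.
    return abs(epoch_from_iso8601(first) - epoch_from_iso8601(second)) // 60


print(epoch_from_iso8601("2020-11-10T07:44:25Z"))  # 1604994265
print(iso8601_from_epoch(1604994265))              # 2020-11-10T07:44:25Z
print(minutes_between("2016-03-22T22:54:14Z", "2016-03-22T22:44:14Z"))  # 10
```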
Mathematical Operations
Function | Description | Parameters | Example Call |
---|---|---|---|
min | Returns the minimum value of elements in an array | 1. inputArray: Array attribute from which to extract the minimum value | nexla_fn.call("min",[2,4,6]) |
max | Returns the maximum value of elements in an array | 1. inputArray: Array attribute from which to extract the maximum value | nexla_fn.call("max",[2,4,6]) |
avg | Returns the average value of elements in an array | 1. inputArray: Array attribute of which to extract the average value | nexla_fn.call("avg",[2,4,6]) |
bucket | Segments input attribute values into equally sized bins | 1. numberOfBuckets: Size of the bucket or segment (the default size is 10) 2. input: Number or array of numbers to be segmented into bins | nexla_fn.call("bucket",5,[11,21,99]) |
Array Operations
Function | Description | Parameters | Example Call |
---|---|---|---|
join | Joins all elements of an array into a string | 1. delimiter: Character or string separator for joining array elements 2. arrayOfStrings: Input array to be converted to a string | nexla_fn.call("join"," and ",["First","Second"]) |
toStringWithLeadingZeroes | Converts an integer to a string of at least a minimum length | 1. lengthOfOutput: Minimum desired length of the output string 2. integerToConvert: Input number to be converted | nexla_fn.call("toStringWithLeadingZeroes",5,123) |
Location
Function | Description | Parameters | Example Call |
---|---|---|---|
GeoDetection | Extracts location information from a valid IP address string | 1. infoToExtract: Location information to extract ('city', 'continent', 'country', 'country_code', 'dma', 'lattitude', 'longitude', 'postal', 'region', 'region_code', or 'timezone') 2. fromIPAddress: Valid IP Address string to be parsed | nexla_fn.call("GeoDetection","country_code","73.223.52.74") |
User Agent
Function | Description | Parameters | Example Call |
---|---|---|---|
UserAgentDetection | Extracts device information from a valid User Agent string | 1. infoToExtract: Device information to extract ('browser', 'browser_version', 'device_make', 'device_type', or 'operating_system') 2. fromUserAgent: Valid User Agent string to be parsed | nexla_fn.call("UserAgentDetection", "device_make","Mozilla/5.0(iPhone; CPU iPhone OS 13_5_1 like Mac OS X) AppleWebKit/605.1.15(KHTML, like Gecko) Version/13.1.1 Mobile/15E148 Safari/604.1") |
Automotive
Function | Description | Parameters | Example Call |
---|---|---|---|
VINDetection | Extracts vehicle information from a valid VIN string | 1. infoToExtract: Vehicle information to extract ('country', 'manufacturer', or 'year') 2. fromVIN: Valid VIN string to be parsed | nexla_fn.call("VINDetection","year","1GNDM19X35B110457") |
Artificial Intelligence
Function | Description | Parameters | Example Call |
---|---|---|---|
oneHotEncoding | Performs one-hot encoding of an input attribute to produce an object | 1. lookupId: Nexla lookup based on which to perform encoding 2. attrToEncode: Attribute that, if present as a key in the lookup, will be encoded to 1 | nexla_fn.call("oneHotEncoding",1234,<attr>) |
String Operations
Function | Description | Parameters | Example Call |
---|---|---|---|
grok | Parses input unstructured log data into an object with structured fields using a selected or user-defined valid grok pattern | 1. grokPattern: Grok pattern to extract 2. inputString: Input string to be parsed (usually text streamed from a standard log file, e.g., server logs) | nexla_fn.call("grok","%{SYSLOGBASE}","Jan 1 06:25:43 mailserver14 postfix/cleanup[21403]: BEF25A72965: message-id=<20130101142543.5828399CCAF@mailserver14.example.com>") |
regexFind | Returns an array of all matches of the regular expression in the input string | 1. regexPatternBase64: Base64-encoded regular expression pattern against which to test the input string 2. inputString: Input string or text to test against the regular expression pattern | nexla_fn.call("regexFind","W2EtekEtWl1bMS05XQ==","abcd1234e5") |
regexMatch | Returns boolean true or false depending on whether or not the input string matches the provided regular expression | 1. regexPatternBase64: Base64-encoded regular expression pattern against which to test the input string 2. inputString: Input string or text to test against the regular expression pattern | nexla_fn.call("regexMatch","XihbYS16QS1aMC05X1wtXC5dKylAKFthLXpBLVowLTlfXC1cLl0rKVwuKFthLXpBLVpdezIsNX0pJA==","test@acme.com") |
toJsonString | Converts a JSON object/array to a JSON string | 1. objectToStringify: Input object/attribute to be flattened into a JSON string | nexla_fn.call("toJsonString",{"one":1,"two":2}) |
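Since regexFind and regexMatch expect the pattern in Base64 form, it can help to generate and sanity-check the encoded string before pasting it into a call. The sketch below does this with Python's standard library. The helper name encode_pattern is hypothetical, and the local re.findall check only approximates what Nexla's own regular expression engine would return.

```python
import base64
import re


def encode_pattern(pattern):
    # Base64-encode a regular expression so it can be passed to regexFind/regexMatch.
    return base64.b64encode(pattern.encode("utf-8")).decode("ascii")


pattern = r"[a-zA-Z][1-9]"
encoded = encode_pattern(pattern)
print(encoded)                            # W2EtekEtWl1bMS05XQ==
print(re.findall(pattern, "abcd1234e5"))  # ['d1', 'e5']
```

The encoded value matches the pattern string used in the regexFind example call above.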
6. Examples Using Pre-Built Nexla Functions
This section provides some examples of using Nexla's predefined functions in custom transforms. The custom code for each transform example is shown in both Python and JavaScript.
6.1 Transform an Attribute Using the Nexla Hashing Function
This is a simple example of defining a custom transform to hash an input attribute using Nexla's predefined hashing function.
Below is an input record from a Nexset:
{
  "prod_id": "472465",
  "prod_name": "Google Chromecast",
  "price_per_unit": "35",
  "quantity": "126"
}
The custom transform in this example should generate an output record based on the following rules:
Rule 1: Create a new attribute, "hash_prod_name", that is an MD5-hashed representation of the value of the input attribute "prod_name"
Rule 2: Pass through the "prod_id" and "prod_name" attributes without changes
Rule 3: Omit the "price_per_unit" and "quantity" attributes
A transform following these rules should produce the following output from the input shown for this example:
{
  "hash_prod_name": "4fdf0a2701a1105890efbb75e2c7d0b7",
  "prod_id": "472465",
  "prod_name": "Google Chromecast"
}
Typically, the Nexset Designer is recommended for creating this type of transform, as transforms constructed in this way are easy to maintain and understand. However, the following code snippets can also be used to obtain the same result:
Python
def transform(input, metadata, args):
    output = {}
    # Pass through the product ID and name unchanged
    output["prod_id"] = input.get("prod_id")
    output["prod_name"] = input.get("prod_name")
    # Hash the product name (not the ID) with MD5
    output["hash_prod_name"] = nexla_fn.call('stringHash','md5',input.get("prod_name"))
    return output
JavaScript
function transform(input, metadata, args) {
    var output = {};
    // Pass through the product ID and name unchanged
    output.prod_id = input.prod_id;
    output.prod_name = input.prod_name;
    // Hash the product name with MD5
    output.hash_prod_name = nexla_fn.call('stringHash','md5',input.prod_name);
    return output;
}
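To check the logic of this example outside Nexla, one option is to pair the transform with a small stand-in for nexla_fn. The stub below is hypothetical and assumes that stringHash returns a lowercase hexadecimal digest; it uses Python's hashlib and is not guaranteed to match the value Nexla produces.

```python
import hashlib


class StubNexlaFn:
    # Hypothetical stand-in for nexla_fn; assumes stringHash returns a hex digest.
    def call(self, name, *params):
        if name == "stringHash":
            method, text = params
            return hashlib.new(method, text.encode("utf-8")).hexdigest()
        raise KeyError("No local stand-in for " + name)


nexla_fn = StubNexlaFn()


def transform(input, metadata, args):
    output = {}
    output["prod_id"] = input.get("prod_id")
    output["prod_name"] = input.get("prod_name")
    output["hash_prod_name"] = nexla_fn.call("stringHash", "md5", input.get("prod_name"))
    return output


record = {
    "prod_id": "472465",
    "prod_name": "Google Chromecast",
    "price_per_unit": "35",
    "quantity": "126",
}
result = transform(record, {}, None)
print(sorted(result.keys()))
```

Running the transform against the sample record confirms that only the three expected attributes are emitted and that the omitted attributes do not leak into the output.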
6.2 Derive an Attribute from a Lookup Table
One of the most powerful Nexla transforms is the ability to look up a value from a previously created lookup table. This example demonstrates how to call Nexla lookups from a lookup table in a custom transform.
For more information about creating and transforming with static and dynamic lookups in Nexla, see the Help Center articles Create a Static Lookup, Create a Dynamic Lookup, and Transforming with Data Lookups.
Consider a dynamic lookup (id: 1630) that contains mapping between IDs ("id") and names ("name"), with "id" as the primary key.
Sample Lookup (ID: 1630)
id | name |
---|---|
135234 | Apple Earpods |
472465 | Google Chromecast |
Below is an input record from a Nexset to which this lookup could be applied:
{
  "prod_id": "472465",
  "manufacturer": "Google",
  "price_per_unit": "35",
  "quantity": "126"
}
The custom transform in this example should generate an output record based on the following rules:
Rule 1: Pass through the "prod_id" and "manufacturer" attributes without changes
Rule 2: Create the new attribute "prod_name", which should be equal to the value of the "name" attribute in the lookup table row in which the "id" value is equal to the value of the "prod_id" attribute in the input record
Rule 3: Omit the "price_per_unit" and "quantity" attributes
A transform following these rules should produce the following output from the input shown for this example:
{
  "prod_id": "472465",
  "manufacturer": "Google",
  "prod_name": "Google Chromecast"
}
The following code snippets can be used to obtain the result shown above:
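Python
def transform(input, metadata, args):
    output = {}
    # Pass through the product ID and manufacturer unchanged
    output["prod_id"] = input.get("prod_id")
    output["manufacturer"] = input.get("manufacturer")
    # Fetch the "name" column from lookup 1630 for the matching "id"
    output["prod_name"] = nexla_fn.call('toMapValue2',1630,input.get("prod_id"),'name')
    return output
JavaScript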
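function transform(input, metadata, args) {
    var output = {};
    // Pass through the product ID and manufacturer unchanged
    output.prod_id = input.prod_id;
    output.manufacturer = input.manufacturer;
    // Fetch the "name" column from lookup 1630 for the matching "id"
    output.prod_name = nexla_fn.call('toMapValue2',1630,input.prod_id,'name');
    return output;
}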
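The lookup logic can be exercised locally with an in-memory table standing in for the Nexla lookup. Everything in the sketch below apart from the transform body is hypothetical: the LOOKUPS dictionary mirrors the sample lookup, and the stub assumes that toMapValue2 returns None when no row matches, which this article does not specify.

```python
# Hypothetical in-memory copy of the sample lookup (ID 1630), keyed by "id".
LOOKUPS = {
    1630: {
        "135234": {"name": "Apple Earpods"},
        "472465": {"name": "Google Chromecast"},
    }
}


class StubNexlaFn:
    # Hypothetical stand-in for nexla_fn; miss behavior (None) is an assumption.
    def call(self, name, *params):
        if name == "toMapValue2":
            lookup_id, key, column = params
            row = LOOKUPS.get(lookup_id, {}).get(key)
            return row.get(column) if row else None
        raise KeyError("No local stand-in for " + name)


nexla_fn = StubNexlaFn()


def transform(input, metadata, args):
    output = {}
    output["prod_id"] = input.get("prod_id")
    output["manufacturer"] = input.get("manufacturer")
    output["prod_name"] = nexla_fn.call("toMapValue2", 1630, input.get("prod_id"), "name")
    return output


record = {"prod_id": "472465", "manufacturer": "Google", "price_per_unit": "35", "quantity": "126"}
result = transform(record, {}, None)
print(result)
```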