Dataset Samples
Both the Nexla API and the Nexla CLI support methods to fetch a sample set of records from any dataset. Depending on the status of the dataset, the returned samples come from either a live sample of that dataset or a cached copy.
Furthermore, by modifying the request parameters, you can choose to fetch different types of information about the sampled data.
Fetch Input and Output Samples
Each Nexla dataset processes data by applying transforms to an input set of records to create corresponding output records. You can fetch input samples together with their corresponding output samples to see how the dataset processed each record. The input item is a sample data object matching the source schema or the parent dataset's output schema. The output item is the same sample after the dataset's transforms (if any) have been applied.
Nexla API:
GET /data_sets/{data_set_id}/samples
Nexla CLI:
nexla dataset sample <dataset_id> [options]
Description:
Get the samples for a dataset by id
Usage:
nexla dataset sample <dataset_id> [options]
Options:
-c,--count (int) Number of samples to be displayed, default is 10
-t,--transform (boolean) Transformed output for the samples is displayed if this option is given. This option is the default when no other options are passed.
-i,--show_inputs (boolean) Inputs for the samples are displayed if this option is given.
Examples:
1. nexla dataset sample 19609
Here, since no options have been provided, -t/--transform is used as the default
2. nexla dataset sample 5081 --count 10 --transform true --show_inputs false
Response (Nexla API and Nexla CLI):
[
{
"input": {
"Date": "2016-12-11",
"ShortVolume": "32.0",
"ShortExemptVolume": "0.0",
"TotalVolume": "32.0"
},
"output": {
"Date": "2016-12-11",
"ShortVolume": "32.0",
"ShortExemptVolume": "0.0",
"TotalVolume": "32.0"
}
},
{
"input": {
"Date": "2016-12-11",
"ShortVolume": "32.0",
"ShortExemptVolume": "0.0",
"TotalVolume": "32.0"
},
"output": {
"Date": "2016-12-11",
"ShortVolume": "32.0",
"ShortExemptVolume": "0.0",
"TotalVolume": "32.0"
}
},
{
"input": {
"Date": "2016-12-11",
"ShortVolume": "42.0",
"ShortExemptVolume": "0.0",
"TotalVolume": "42.0"
},
"output": {
"Date": "2016-12-11",
"ShortVolume": "42.0",
"ShortExemptVolume": "0.0",
"TotalVolume": "42.0"
}
}
]
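To see what a dataset's transforms did to each record, the paired input and output objects can be compared field by field. The following is a minimal Python sketch, assuming the response has already been parsed from JSON into a list of dicts shaped like the example above. The helper name diff_sample and the "Ratio" field are hypothetical and not part of any Nexla library or dataset.

```python
from typing import Any


def diff_sample(sample: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {field: (input_value, output_value)} for fields that differ.

    A field that is missing on one side is reported with None on that side.
    """
    before = sample.get("input") or {}
    after = sample.get("output") or {}
    changed = {}
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            changed[key] = (old, new)
    return changed


# Hypothetical sample, shaped like one element of the response array.
sample = {
    "input": {"Date": "2016-12-11", "ShortVolume": "32.0"},
    "output": {"Date": "2016-12-11", "ShortVolume": "32.0", "Ratio": "1.0"},
}
print(diff_sample(sample))  # {'Ratio': (None, '1.0')}
```

For the response shown above, where the dataset applies no transforms, every sample would produce an empty dict.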
Fetch Only Output Samples
Call the methods below to receive only output samples from the dataset, i.e., records to which the dataset's transforms have been applied.
Nexla API:
GET /data_sets/{data_set_id}/samples?output_only=true
Nexla CLI:
nexla dataset sample 5081 --count 10 --transform true --show_inputs false
Must Have Parameter:
--transform true --show_inputs false
Other Optional Parameters:
-c,--count (int) Number of samples to be displayed, default is 10
--metadata Nexla metadata for the records
Response (Nexla API and Nexla CLI):
[
{
"Date": "2016-12-11",
"ShortVolume": "32.0",
"ShortExemptVolume": "0.0",
"TotalVolume": "32.0"
},
{
"Date": "2016-12-11",
"ShortVolume": "32.0",
"ShortExemptVolume": "0.0",
"TotalVolume": "32.0"
},
{
"Date": "2016-12-11",
"ShortVolume": "42.0",
"ShortExemptVolume": "0.0",
"TotalVolume": "42.0"
}
]
Fetch Samples with Metadata
Nexla associates metadata with each record that flows through it. You can fetch samples with their associated metadata by setting the include_metadata=1 request parameter in the Nexla API or the --metadata option in the Nexla CLI. In this case, each sample is wrapped in an object with two attributes: rawMessage, which contains the actual record data, and nexlaMetaData, which contains the corresponding metadata for that record.
Nexla API:
GET /data_sets/{data_set_id}/samples?include_metadata=1
...
Must Have Parameter: include_metadata=1
Other Optional Parameters:
count: Total number of samples to fetch
output_only=1: Fetch only output records from a dataset
Nexla CLI:
nexla dataset sample 5081 --count 10 --metadata --transform true --show_inputs
Must Have Parameter:
--metadata
Other Optional Parameters:
-c,--count (int) Number of samples to be displayed, default is 10
-t,--transform (boolean) Transformed output for the samples is displayed if this option is given. This option is the default when no other options are passed.
-i,--show_inputs (boolean) Inputs for the samples are displayed if this option is given.
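The API query parameters described on this page (count, output_only, include_metadata) can be combined in one request URL. The following is a minimal Python sketch that only builds the URL. The function name samples_url and the host are placeholders, not a real Nexla endpoint, and authentication and sending the request are left to whatever HTTP client you use. It writes the flags as 1, matching the parameter list above.

```python
from urllib.parse import urlencode


def samples_url(base_url, data_set_id, count=None,
                output_only=False, include_metadata=False):
    """Build the URL for GET /data_sets/{data_set_id}/samples.

    Only parameters that are explicitly requested are added to the query string.
    """
    params = {}
    if count is not None:
        params["count"] = count
    if output_only:
        params["output_only"] = 1
    if include_metadata:
        params["include_metadata"] = 1
    url = f"{base_url.rstrip('/')}/data_sets/{data_set_id}/samples"
    return f"{url}?{urlencode(params)}" if params else url


# "https://nexla.example.com" is a placeholder host (assumption).
print(samples_url("https://nexla.example.com", 5081, count=10, include_metadata=True))
# https://nexla.example.com/data_sets/5081/samples?count=10&include_metadata=1
```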
Response (Nexla API and Nexla CLI):
[
{
"input": {
"rawMessage": {
"product_id": 1234,
"price": 5.0
},
"nexlaMetaData": {
"trackerId": "u9704::test/test_201808.csv:1:1:1:1528366194380;NA",
"sourceType": "S3",
"ingestTime": 1546916322079,
"sourceOffset": 43808,
"sourceKey": "test/test_201808.csv",
"bucket": "test",
"topic": "dataset-5081-source-9704",
"resourceType": "SOURCE",
"resourceId": 9704,
"nexlaUUID": null,
"eof": false,
"runId": 1555400123,
"tags": [{}],
"transformTime": 1555400123895,
"transformTimeISO8601": "2019-04-16T07:35:23.895Z"
},
"error": null
},
"output": {
"rawMessage": {
"product_id": 1234,
"price": 5.0,
"modified_price": 6.0
},
"nexlaMetaData": {
"trackerId": "u9704::test/test_201808.csv:1:1:1:1528366194380;NA",
"sourceType": "S3",
"ingestTime": 1546916322079,
"sourceOffset": 43808,
"sourceKey": "test/test_201808.csv",
"bucket": "test",
"topic": "dataset-5081-source-9704",
"resourceType": "SOURCE",
"resourceId": 9704,
"nexlaUUID": null,
"eof": false,
"runId": 1555400123,
"tags": [{}],
"transformTime": 1555400123904,
"transformTimeISO8601": "2019-04-16T07:35:23.904Z"
},
"error": null
}
}
]
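When metadata is included, client code has to unwrap rawMessage and nexlaMetaData before working with the record. The following is a minimal Python sketch, assuming the response has been parsed from JSON. The helper names unpack and millis_to_utc are hypothetical. Treating transformTime as milliseconds since the Unix epoch is an inference from the sample above, where transformTime and transformTimeISO8601 agree under that reading.

```python
from datetime import datetime, timezone


def millis_to_utc(ms):
    """Convert epoch milliseconds to an aware UTC datetime (integer arithmetic only)."""
    seconds, millis = divmod(ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(
        microsecond=millis * 1000
    )


def unpack(wrapped):
    """Split one wrapped input or output object into (record, metadata)."""
    return wrapped["rawMessage"], wrapped["nexlaMetaData"]


# Trimmed copy of the "output" side of the sample above.
output_side = {
    "rawMessage": {"product_id": 1234, "price": 5.0, "modified_price": 6.0},
    "nexlaMetaData": {
        "sourceKey": "test/test_201808.csv",
        "transformTime": 1555400123895,
    },
}

record, meta = unpack(output_side)
stamp = millis_to_utc(meta["transformTime"]).isoformat(timespec="milliseconds")
print(record["modified_price"], stamp)  # 6.0 2019-04-16T07:35:23.895+00:00
```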