Writing Data
Data Destinations
Data destination (also called data sink) resources describe external destinations for output data from specific datasets. Like data sources, they usually require client credentials to allow connecting and writing to the external system. No matter where the data has to be written out, all information about where, when, and how to write it is contained in these Nexla resources.
List All Destinations
Both the Nexla API and Nexla CLI support methods to list all destinations in the authenticated user's account. A successful call returns detailed information about all destinations, including id, owner, type, credentials, activation status, and output configuration.
- Nexla API
- Nexla CLI
GET /data_sinks
Example:
curl https://api.nexla.io/data_sinks \
-H "Authorization: Bearer <Access-Token>" \
-H "Accept: application/vnd.nexla.api.v1+json"
nexla destination list
- Nexla API
- Nexla CLI
[
{
"id": 5854,
"owner": {
"id": 2,
"full_name": "Jeff Williams"
},
"org": {
"id": 1,
"name": "Nexla",
"email_domain": "nexla.com",
"email": null
},
"access_roles": ["owner"],
"name": "Amazon S3 test",
"description": null,
"status": null,
"data_set_id": 8092,
"data_map_id": null,
"sink_type": "s3",
"sink_format": null,
"sink_config": {
"mapping": {
"mode": "manual",
"mapping": {
"item_id": ["item_id"],
"item_name": ["item_name"],
"store_code": ["store_code"],
"city_code": ["city_code"],
"item_price": ["item_price"],
"discount": ["discount"],
"discounted_price": ["discounted_price"]
},
"fields_order": [
"item_id",
"item_name",
"store_code",
"city_code",
"item_price",
"discount",
"discounted_price"
],
"tracker_mode": "NONE"
},
"data_format": "csv",
"sink_type": "s3",
"path": "customer-solutions.nexla.com/echo/nexla_outputs",
"output.dir.name.pattern": "{yyyy}-{MM}-{dd}/{HH}"
},
"sink_schedule": null,
"managed": false,
"data_set": {
"id": 8092,
"name": "echo"
},
"data_credentials": {
"id": 5216,
...
},
"updated_at": "2019-07-17T11:56:40.000Z",
"created_at": "2019-07-17T11:56:40.000Z",
"tags": []
},
{
"id": 5752,
"owner": {
"id": 2,
"full_name": "Jeff Williams"
},
"org": {
"id": 1,
"name": "Nexla",
"email_domain": "nexla.com",
"email": null
},
"access_roles": ["owner"],
"name": "test",
"description": null,
"status": null,
"data_set_id": 7728,
"data_map_id": null,
"sink_type": "s3",
"sink_format": null,
"sink_config": {
"mapping": {
"mode": "auto",
"tracker_mode": "NONE"
},
"data_format": "json",
"sink_type": "s3",
"path": "customer-solutions.nexla.com/test",
"output.dir.name.pattern": "{yyyy}-{MM}-{dd}"
},
"sink_schedule": null,
"managed": false,
"data_set": {
"id": 7728,
"name": "test"
},
"data_credentials": {
"id": 5216,
...
},
"updated_at": "2019-04-26T06:40:02.000Z",
"created_at": "2019-04-26T06:40:02.000Z",
"tags": []
}
]
id status type name location credentials_name
------ ---------- ------- -------- ----------------------------- ------------------
5223 PAUSED ftp test ftp://test-regression/sink sftp_test
5224 ACTIVE s3 test1 s3://test-nexla.com/sink s3_test
Show One Destination
Fetch a specific destination accessible by the authenticated user. A successful call returns detailed information about that destination, including id, owner, type, credentials, activation status, and output configuration.
With the Nexla API, add an expand query parameter with a truthy value to get more details about the destination. With this parameter, full details about related resources (the destination's dataset, credentials, etc.) are also returned.
- Nexla API
- Nexla CLI
GET /data_sinks/{data_sink_id}
Example:
curl https://api.nexla.io/data_sinks/5854 \
-H "Authorization: Bearer <Access-Token>" \
-H "Accept: application/vnd.nexla.api.v1+json"
nexla destination get <destination_id>
- Nexla API
- Nexla CLI
{
"id": 5854,
"owner": {
"id": 82,
...
},
"org": {
"id": 1,
"name": "Nexla",
"email_domain": "nexla.com",
"email": null
},
"access_roles": [
"owner"
],
"name": "Amazon S3 test",
"description": null,
"status": null,
"data_set_id": 8092,
"data_map_id": null,
"sink_type": "s3",
"sink_format": null,
"sink_config": {
"mapping": {
"mode": "manual",
"mapping": {
"item_id": [
"item_id"
],
"item_name": [
"item_name"
],
"store_code": [
"store_code"
],
"city_code": [
"city_code"
],
"item_price": [
"item_price"
],
"discount": [
"discount"
],
"discounted_price": [
"discounted_price"
]
},
"fields_order": [
"item_id",
"item_name",
"store_code",
"city_code",
"item_price",
"discount",
"discounted_price"
],
"tracker_mode": "NONE"
},
"data_format": "csv",
"sink_type": "s3",
"path": "customer-solutions.nexla.com/echo/nexla_outputs",
"output.dir.name.pattern": "{yyyy}-{MM}-{dd}/{HH}"
},
"sink_schedule": null,
"managed": false,
"data_set": {
"id": 8092,
"name": "echo"
},
"data_credentials": {
"id": 5216,
...
},
"updated_at": "2019-07-17T11:56:40.000Z",
"created_at": "2019-07-17T11:56:40.000Z",
"tags": []
}
{
"name": "test1",
"data_credentials": "<5055:s3_test>",
"sink_type": "s3",
"data_set_id": 6024,
"sink_config": {
"poll_frequency": "Minute",
"mapping": {
"mode": "manual",
"mapping": {
"name": "name",
"city": "city",
"country": "country"
}
},
"prefix": "test-nexla.com/sink",
"data_format": "csv",
"output.dir.name.pattern": "{yyyy}-{MM}-{dd}",
"bucket": "/"
},
"description": null
}
Create A Destination
You can use the Nexla API to create a new data destination in the authenticated user's account. The only required attribute in the input object is the data destination name; all other attributes are set to default values. Specify data_set_id to select the dataset whose data should be written to the destination, data_credentials to authorize access to the destination location, and sink_config to control how the data should be written out.
- Nexla API
POST /data_sinks
Example Request Body
...
{
"name": "Test Destination",
"description": null,
"sink_type": "dropbox",
"sink_config": {
"mapping": {
"mode": "auto",
"tracker_mode": "NONE"
},
"data_format": "json",
"sink_type": "dropbox",
"path": "/nexlatests/dataout/rel22",
"output.dir.name.pattern": "demo/{yyyy}/{MM}/{dd}"
},
"data_credentials": 8342,
"data_set_id": 22194
}
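A minimal curl sketch for this request, assuming the same headers as the earlier examples (the Content-Type header is an assumption):
curl -X POST https://api.nexla.io/data_sinks \
-H "Authorization: Bearer <Access-Token>" \
-H "Accept: application/vnd.nexla.api.v1+json" \
-H "Content-Type: application/json" \
-d '{"name": "Test Destination", "sink_type": "dropbox", "data_set_id": 22194, "data_credentials": 8342, "sink_config": {"mapping": {"mode": "auto", "tracker_mode": "NONE"}, "data_format": "json", "sink_type": "dropbox", "path": "/nexlatests/dataout/rel22", "output.dir.name.pattern": "demo/{yyyy}/{MM}/{dd}"}}'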
- Nexla API
{
"id": 5855,
"owner": {
"id": 82,
...
},
"org": {
"id": 1,
...
},
"access_roles": [
"owner"
],
"name": "Test Destination",
"description": null,
"status": null,
"data_set_id": 22194,
"data_map_id": null,
"sink_type": "dropbox",
"sink_format": null,
"sink_config": {
"mapping": {
"mode": "auto",
"tracker_mode": "NONE"
},
"data_format": "json",
"sink_type": "dropbox",
"path": "/nexlatests/dataout/rel22",
"output.dir.name.pattern": "demo/{yyyy}/{MM}/{dd}"
},
"sink_schedule": null,
"managed": false,
"data_set": {
"id": 22194,
...
},
"data_credentials": {
"id": 8342,
...
},
"updated_at": "2019-07-17T11:56:40.000Z",
"created_at": "2019-07-17T11:56:40.000Z",
"tags": []
}
Create with Credentials
Data destinations usually require credentials for making a connection and writing data. You can refer to an existing data_credentials resource or create a new one as part of the destination creation request. In this example, an existing credentials object is used:
- Nexla API
POST /data_sinks
Example Request Body
...
{
"name": "Test Destination",
"description": null,
"sink_type": "dropbox",
"sink_config": {
"mapping": {
"mode": "auto",
"tracker_mode": "NONE"
},
"data_format": "json",
"sink_type": "dropbox",
"path": "/nexlatests/dataout/rel22",
"output.dir.name.pattern": "demo/{yyyy}/{MM}/{dd}"
},
"data_credentials": 8342,
"data_set_id": 22194
}
Alternatively, the required attributes for creating a new data_credentials resource can be included directly in the request:
- Nexla API
POST /data_sinks
Example Request Body
...
{
"name": "Test Destination",
"description": null,
"sink_type": "dropbox",
"sink_config": {
"mapping": {
"mode": "auto",
"tracker_mode": "NONE"
},
"data_format": "json",
"sink_type": "dropbox",
"path": "/nexlatests/dataout/rel22",
"output.dir.name.pattern": "demo/{yyyy}/{MM}/{dd}"
},
"data_set_id": 22194,
"data_credentials": {
"name": "FTP CREDS",
"credentials_type": "ftp",
"credentials_version": "1",
"credentials": {
"credentials_type": "ftp",
"account_id": "XYZ",
"password": "123"
}
}
}
In either case, a successful POST on /data_sinks with credential information will return a response including the full data destination and the encrypted form of its associated data credentials resource:
- Nexla API
{
"id": 5855,
"owner": {
"id": 82,
...
},
"org": {
"id": 1,
...
},
"access_roles": [
"owner"
],
"name": "Test Destination",
"description": null,
"status": null,
"data_set_id": 22194,
"data_map_id": null,
"sink_type": "dropbox",
"sink_format": null,
"sink_config": {
"mapping": {
"mode": "auto",
"tracker_mode": "NONE"
},
"data_format": "json",
"sink_type": "dropbox",
"path": "/nexlatests/dataout/rel22",
"output.dir.name.pattern": "demo/{yyyy}/{MM}/{dd}"
},
"sink_schedule": null,
"managed": false,
"data_set": {
"id": 22194,
...
},
"data_credentials": {
"id": 8342,
...
},
"updated_at": "2019-07-17T11:56:40.000Z",
"created_at": "2019-07-17T11:56:40.000Z",
"tags": []
}
Update A Destination
The Nexla API supports methods to update any property of an existing destination that the authenticated user has access to.
- Nexla API
PUT /data_sinks/{data_sink_id}
Example Request Body
...
{
"name": "Updated S3 Data Sink"
}
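A minimal curl sketch for this update, assuming the same headers as the earlier examples (the Content-Type header is an assumption; 5023 is the destination id from the sample response below):
curl -X PUT https://api.nexla.io/data_sinks/5023 \
-H "Authorization: Bearer <Access-Token>" \
-H "Accept: application/vnd.nexla.api.v1+json" \
-H "Content-Type: application/json" \
-d '{"name": "Updated S3 Data Sink"}'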
- Nexla API
{
"id": 5023,
"owner": {
"id": 82,
...
},
"org": {
"id": 1,
"name": "Nexla",
"email_domain": "nexla.com",
"email": null
},
"access_roles": [
"owner"
],
"name": "Updated S3 Data Sink",
"description": null,
"status": null,
"data_set_id": 8092,
"data_map_id": null,
"sink_type": "s3",
"sink_format": null,
"sink_config": {
"mapping": {
"mode": "manual",
"mapping": {
"item_id": [
"item_id"
],
"item_name": [
"item_name"
],
"store_code": [
"store_code"
],
"city_code": [
"city_code"
],
"item_price": [
"item_price"
],
"discount": [
"discount"
],
"discounted_price": [
"discounted_price"
]
},
"fields_order": [
"item_id",
"item_name",
"store_code",
"city_code",
"item_price",
"discount",
"discounted_price"
],
"tracker_mode": "NONE"
},
"data_format": "csv",
"sink_type": "s3",
"path": "customer-solutions.nexla.com/echo/nexla_outputs",
"output.dir.name.pattern": "{yyyy}-{MM}-{dd}/{HH}"
},
"sink_schedule": null,
"managed": false,
"data_set": {
"id": 8092,
"name": "echo"
},
"data_credentials": {
"id": 5216,
...
},
"updated_at": "2019-07-17T11:56:40.000Z",
"created_at": "2019-07-17T11:56:40.000Z",
"tags": []
}
Delete A Destination
The Nexla API supports methods to delete any destination that the authenticated user has administrative or ownership rights to. A successful request to delete a data destination returns OK (200) with no response body.
- Nexla API
DELETE /data_sinks/{data_sink_id}
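A minimal curl sketch for this call, assuming the same headers and destination id as the earlier examples:
curl -X DELETE https://api.nexla.io/data_sinks/5854 \
-H "Authorization: Bearer <Access-Token>" \
-H "Accept: application/vnd.nexla.api.v1+json"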
- Nexla API
Empty response with status 200 for success
Error response with reason if destination could not be deleted
Control Data Output
Activate and Pause Destination
Associate a dataset with a destination to control what data will be written out to it. Each destination can have only one dataset writing data to it. You can associate a dataset with a destination by setting the data_set_id property of the destination.
You can stop data from being written out in one of two ways:
- Control the status of the associated dataset (see the relevant methods in the dataset page). This prevents the dataset from even processing the data to be written out.
- Use the methods below to activate or pause the destination. This approach is better suited for scenarios where the same dataset is configured to write to multiple destinations; see the curl sketch after the pause method below.
- Nexla API
- Nexla CLI
PUT /data_sinks/{data_sink_id}/activate
nexla destination activate <destination_id>
Conversely, call the pause method to immediately stop data writes to that destination.
- Nexla API
- Nexla CLI
PUT /data_sinks/{data_sink_id}/pause
nexla destination pause <destination_id>
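A minimal curl sketch for both calls, assuming the same headers and destination id as the earlier examples:
curl -X PUT https://api.nexla.io/data_sinks/5854/activate \
-H "Authorization: Bearer <Access-Token>" \
-H "Accept: application/vnd.nexla.api.v1+json"

curl -X PUT https://api.nexla.io/data_sinks/5854/pause \
-H "Authorization: Bearer <Access-Token>" \
-H "Accept: application/vnd.nexla.api.v1+json"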
Validate Destination Configuration
All configuration about where and when to write data is contained within the sink_config property of a data destination.
As Nexla provides quite a few options to fine-tune and control exactly how and where you want to write your data out, it is important to ensure the sink_config contains all required parameters to successfully write data. To validate the configuration of a given data destination, send a POST request to the endpoint /data_sinks/{data_sink_id}/config/validate.
You can send an optional JSON config as the request body; if no config is included in the request, the stored sink_config will be used for validation.
- Nexla API
POST /data_sinks/{data_sink_id}/config/validate
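A minimal curl sketch, assuming the same headers as the earlier examples (the inline config body is an illustrative assumption drawn from the S3 sample above; omit -d to validate the stored sink_config instead):
curl -X POST https://api.nexla.io/data_sinks/5854/config/validate \
-H "Authorization: Bearer <Access-Token>" \
-H "Accept: application/vnd.nexla.api.v1+json" \
-H "Content-Type: application/json" \
-d '{"sink_type": "s3", "data_format": "csv", "path": "customer-solutions.nexla.com/echo/nexla_outputs"}'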
- Nexla API
{
"status": "ok",
"output": [
{
"name": "credsEnc",
"value": null,
"errors": [
"Missing required configuration \"credsEnc\" which has no default value."
],
"visible": true,
"recommendedValues": []
},
{
"name": "credsEncIv",
"value": null,
"errors": [
"Missing required configuration \"credsEncIv\" which has no default value."
],
"visible": true,
"recommendedValues": []
},
{
"name": "sink_typ",
"value": null,
"errors": [
"Missing required configuration \"sink_type\" which has no default value.",
"Invalid value null for configuration sink_type: Invalid enumerator"
],
"visible": true,
"recommendedValues": []
}
]
}
Monitor Destination
Use the methods listed in this section to monitor all data write history for a destination.
Lifetime Write Metrics
Lifetime write metrics methods return information about the total data written out through a destination since its creation. Metrics include the number of records written out as well as the estimated volume of data.
- Nexla API
GET /data_sinks/5001/metrics
- Nexla API
{
"status": 200,
"metrics": {
"records": 4,
"size": 582
}
}
Aggregated Write Metrics
Aggregated write metrics methods return information about the total data written out from a destination each day. Metrics include the number of records written out as well as the estimated volume of data.
Aggregations can be fetched in different aggregation units. Use the method below to fetch reports aggregated daily:
- Nexla API
- Nexla CLI
GET /data_sinks/5001/metrics?aggregate=1
...
Optional Query Parameters:
"from": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
"to": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
"page": <integer page number>,
"size": <number of entries in page>
nexla destination metrics 8874
...
Optional Payload Parameters
-d,--days (int) Number of days ago to get the metrics, default is 7
-s,--start (string) UTC datetime in '%Y-%m-%dT%H:%M:%S' format
-e,--end (string) UTC datetime in '%Y-%m-%dT%H:%M:%S' format, default is current time. -s/--start required for this option.
- Nexla API
- Nexla CLI
{
"status": 200,
"metrics": [
{
"time": "2017-02-08",
"record": 53054,
"size": 12476341
},
{
"time": "2017-02-09",
"record": 66618,
"size": 15829589
},
{
"time": "2017-02-10",
"record": 25832,
"size": 6645994
}
]
}
Date (UTC) Records Volume (Bytes) Errors
------------ --------- ---------------- --------
2019-04-25 81577 410348843 0
2019-04-26 97350 460260701 0
2019-04-27 85675 392488855 0
2019-04-28 85646 391447623 0
Destination metrics can also be batched by the ingestion frequency of the originating data. Use the methods below to view destination metrics per ingestion cycle.
- Nexla API
- Nexla CLI
GET /data_sinks/5001/metrics/run_summary
...
Optional Query Parameters:
"runId": <starting from unix epoch time of ingestion events>,
"from": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
"to": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
"page": <integer page number>,
"size": <number of entries in page>
nexla destination metrics 6864
...
Optional Payload Parameters
-d,--days (int) Number of days ago to get the metrics, default is 7
-s,--start (string) UTC datetime in '%Y-%m-%dT%H:%M:%S' format
-e,--end (string) UTC datetime in '%Y-%m-%dT%H:%M:%S' format, default is current time. -s/--start required for this option.
- Nexla API
- Nexla CLI
{
"status": 200,
"metrics": {
"1539970426049": {
"records": 1364,
"size": 971330,
"errors": 0
},
"1539990426049": {
"records": 330,
"size": 235029,
"errors": 0
}
}
}
Date (UTC) Records Volume (Bytes) Errors
------------ --------- ---------------- --------
2020-04-21 9 12598 0
Granular Write Status Metrics
Apart from the aggregated write metrics methods above, which provide visibility into the total number of records and total volume of data written out over a period of time, Nexla also provides methods to view granular details about data write events.
You can retrieve the data write status of a file destination to find information like how many files have been written out fully or are queued to be written out.
- Nexla API
GET /data_sinks/6745/metrics/files_stats
...
Optional Parameters
{
"from": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
"to": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
"status": "one of NOT_STARTED/IN_PROGRESS/COMPLETE/ERROR/PARTIAL"
}
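A minimal curl sketch, assuming the optional parameters are passed as query parameters (the status filter value is illustrative):
curl "https://api.nexla.io/data_sinks/6745/metrics/files_stats?status=COMPLETE" \
-H "Authorization: Bearer <Access-Token>" \
-H "Accept: application/vnd.nexla.api.v1+json"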
- Nexla API
{
"status": 200,
"metrics": {
"data": {
"COMPLETE": 17
},
"meta": {
"currentPage": 1,
"totalCount": 1,
"pageCount": 1
}
}
}
You can view write status and history per file of a file destination. The file destination write history methods below return one entry per file by aggregating all write events for each file.
- Nexla API
- Nexla CLI
GET /data_sinks/{data_sink_id}/metrics/files
...
Optional Parameters
{
"from": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
"to": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
"status": "one of NOT_STARTED/IN_PROGRESS/COMPLETE/ERROR/PARTIAL",
"page": <integer page number>,
"size": <number of entries in page>
}
nexla destination write-stats <destination_id> [options]
...
options
-d,--days (int) Number of days ago to get the metrics, default is 7
-s,--start (string) UTC datetime in '%Y-%m-%dT%H:%M:%S' format
-e,--end (string) UTC datetime in '%Y-%m-%dT%H:%M:%S' format, default is current time. -s/--start required for this option.
- Nexla API
- Nexla CLI
{
"status": 200,
"metrics": {
"data": [
{
"dataSetId": 11429,
"size": 7750996,
"writeStatus": "COMPLETE",
"sinkId": 6745,
"recordCount": 285,
"name": "/nexlatests/dataout/rel22/anyof/1/dataset-11429-000000000000.json",
"id": null,
"lastWritten": "2019-08-16T20:57:49Z",
"runId": 1565912396852,
"error": null
}
],
"meta": {
"currentPage": 1,
"totalCount": 1,
"pageCount": 1
}
}
}
File Name Size Records Status Dataset ID Last Written (UTC)
-------------------- ------- --------- -------- ------------ ---------------------
/files/demo-1.csv 501689 792 COMPLETE 7751 2019-05-03T21:57:25Z
/files/demo-2.csv 2789267 4383 COMPLETE 7751 2019-05-03T21:57:22Z
You can also bypass per-file aggregation and fetch the full write history of each file, even if it was written out multiple times. This is relevant for scenarios where the destination has been configured to write out the same file name in every ingestion cycle.
- Nexla API
GET /data_sinks/{data_sink_id}/metrics/files_raw
...
Optional Parameters
{
"from": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
"to": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
"status": "one of NOT_STARTED/IN_PROGRESS/COMPLETE/ERROR/PARTIAL",
"page": <integer page number>,
"size": <number of entries in page>
}
- Nexla API
{
"status": 200,
"metrics": [
{
"dataSetId": 11429,
"size": 7750996,
"writeStatus": "COMPLETE",
"sinkId": 6745,
"recordCount": 285,
"name": "/nexlatests/dataout/rel22/anyof/1/dataset-11429.json",
"id": null,
"lastWritten": "2019-08-16T20:57:49Z",
"runId": 1565912396852,
"error": null
},
{
"dataSetId": 11429,
"size": 7750996,
"writeStatus": "COMPLETE",
"sinkId": 6745,
"recordCount": 285,
"name": "/nexlatests/dataout/rel22/anyof/1/dataset-11429.json",
"id": null,
"lastWritten": "2019-08-15T20:57:49Z",
"runId": 1565912396852,
"error": null
}
]
}
Other Monitoring Events
See the section on Monitoring resources for methods to view destination errors, notifications, quarantine samples, and audit logs.