Databricks

Nexla's bi-directional connectors can both send data to and receive data from any data system. Once a user has created or gained access to a credential for a data system, building a data flow that ingests data from or sends data to a location within that system requires only a few simple steps.

1. Credentials

This section provides information about and step-by-step instructions for creating a new Databricks credential in Nexla.

1.1 Add a New Databricks Credential

  1. After selecting the data source/destination type, on the Authenticate screen, click Add a New Credential. This will open the Add New Credential window.

      AddNewCredential.png

  2. Enter a name for the credential in the Credential Name field.

      CredName.png

  3. Optional: Enter a description for the credential in the Credential Description field.

      CredDescription.png

  4. Select how the Databricks authentication information will be entered from the URL Format pulldown menu.

    • JDBC URL: Select this option to enter the authentication information as a single JDBC URL.
    • HTTP Path Parts: Select this option to enter the authentication information as individual parts that Nexla combines to create the connection string (a conceptual sketch of this is shown at the end of Section 1.3).

      URL_Format.png

    • To use the JDBC URL format, continue to Section 1.2.
    • To use the HTTP Path Parts format, continue to Section 1.3.

1.2 JDBC URL Format

  1. Enter the JDBC URL of the Databricks location in the JDBC URL field.

    The JDBC URL should be in the form jdbc:spark://... (an example of the full URL shape is shown at the end of this section).

      JDBC_URL.png

  2. Continue to Section 1.4.
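
  A Databricks JDBC URL built for the Simba Spark driver with username/password authentication commonly looks similar to the example below. All values shown are placeholders, not values from an actual workspace, and the exact property names can vary with the driver version, so it is safest to copy the URL directly from the JDBC/ODBC settings in the Databricks console.

      jdbc:spark://dbc-a1b2c3d4-e5f6.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<workspace-id>/<cluster-id>;AuthMech=3;UID=<username>;PWD=<password>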

1.3 HTTP Path Parts Format

  1. Enter the hostname of the Databricks database in the Host field.

    The hostname is typically an IP address or text in the format company.domain.com.

    Do not include the connection protocol.

      Host.png

  2. Enter the port number of the Databricks cluster to which Nexla will connect in the Port field.

      Port.png

  3. Enter the HTTP path of the Databricks SQL endpoint in the HTTP Path field.

    The HTTP path is typically in the form sql/protocolv1/o/<id>/0916-102516-naves603.

    The HTTP path can be found under the JDBC settings in the Databricks console.

      HTTP_Path.png

  4. Enter the username associated with the Databricks account in the Username field.

      Username.png

  5. Enter the password associated with the Databricks account in the Password field.

      Password.png

  6. Continue to Section 1.4.
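
  Conceptually, Nexla combines the values entered above into a single connection string with the same shape as the JDBC URL described in Section 1.2. The Python sketch below only illustrates that mapping under the assumption that the Simba Spark URL format shown earlier is used; it is not Nexla's actual implementation, and every value is a placeholder.

      # Illustration only: how the HTTP Path Parts fields could map onto a
      # Simba Spark style JDBC URL. All values are placeholders.
      host = "dbc-a1b2c3d4-e5f6.cloud.databricks.com"              # Host field (no protocol)
      port = 443                                                   # Port field
      http_path = "sql/protocolv1/o/<workspace-id>/<cluster-id>"   # HTTP Path field
      username = "<username>"                                      # Username field
      password = "<password>"                                      # Password field

      jdbc_url = (
          f"jdbc:spark://{host}:{port}/default;"
          f"transportMode=http;ssl=1;"
          f"httpPath={http_path};"
          f"AuthMech=3;UID={username};PWD={password}"
      )
      print(jdbc_url)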

1.4 Configure the Databricks Environment Settings

  1. Optional: Enter the name of the Databricks database to which Nexla should connect in the Database Name field.

    In Databricks, the terms "database" and "schema" are used interchangeably. For more information about databases/schemas and other data objects in Databricks, see this Databricks article.

      DatabaseName.png

  2. Optional: Enter the name of the Databricks schema to which Nexla should connect in the Schema Name field.

      SchemaName.png

  3. Select the type of cloud environment used by the Databricks instance from the Cloud Type menu.

    Typically, the Databricks cloud environment is used, but Nexla also supports connecting to Databricks instances that run in other cloud environments.

      CloudType.png

  4. Section 1.5 provides information about advanced settings available for Databricks credentials along with step-by-step instructions for configuring each setting.

    • To configure any desired additional advanced settings for this credential, continue to Section 1.5, and complete the relevant steps.

    • To create this credential without configuring any advanced settings, continue to Section 1.6.

1.5 Advanced Settings

This section covers optional advanced credential settings. To create the Databricks credential without configuring advanced settings, skip to Section 1.6.

  1. Click Advanced Settings at the bottom of the Add New Credential window to access additional available settings for the Databricks credential.
  • Access the Databricks database via an SSH Tunnel

    1. If the Databricks database from which data should be read is not publicly accessible, check the box indicating that the database requires an SSH tunnel. This will add the related SSH fields to the Add New Credential window.

      Selecting this option allows Nexla to connect to a bastion host via SSH, and the database connection is then made through that SSH host (a conceptual sketch of such a tunnel is shown at the end of this section).

        SSH_Fields.png

    2. Enter the hostname or IP address of the bastion host running the SSH tunnel server with access to the database in the SSH Tunnel Host field.

        SSH_TunnelHost.png

    3. Enter the port on the bastion host to which Nexla will connect in the SSH Tunnel Port field.

        SSH_TunnelPort.png

    4. Create an SSH username for Nexla on the bastion host, and enter that username in the Username for Tunnel field.

      Usually, the username is set as "nexla".

        TunnelUsername.png
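
    For context, the sketch below illustrates, in general terms, what such a bastion-host SSH tunnel looks like. It uses the third-party Python sshtunnel package, assumes key- or agent-based authentication to the bastion host, and uses placeholder values throughout; it is not Nexla's implementation.

        # Conceptual sketch of an SSH tunnel through a bastion host (not Nexla's code).
        # Requires the third-party "sshtunnel" package; all values are placeholders,
        # and SSH key- or agent-based authentication to the bastion host is assumed.
        from sshtunnel import SSHTunnelForwarder

        tunnel = SSHTunnelForwarder(
            ("bastion.example.com", 22),          # SSH Tunnel Host and SSH Tunnel Port
            ssh_username="nexla",                 # Username for Tunnel
            remote_bind_address=("dbc-a1b2c3d4-e5f6.cloud.databricks.com", 443),
        )
        tunnel.start()
        # The database connection is then made through the forwarded local port
        # (tunnel.local_bind_port) rather than directly to the Databricks host.
        tunnel.stop()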

1.6 Save and Create the Databricks Credential

  1. Once all of the relevant steps in the above sections have been completed, click Save at the bottom of the Add New Credential screen to save the credential and all entered information.

      Save2.png

  2. The newly added credential will now appear in a tile on the Authenticate screen and can be selected for use with a new data source or destination.

      CredentialsList.png

2. Data Source

To ingest data from a Databricks location, follow the instructions in Section 2 of Common Setup for Databases & Data Warehouses.

3. Data Destination

To send data to a Databricks location, follow the instructions in Section 3 of Common Setup for Databases & Data Warehouses.