GitHub Data Source

Follow the instructions below to create a new data flow that ingests data from a GitHub source in Nexla.

Create a New Data Flow

  1. To create a new data flow, navigate to the Integrate section, and click the New Data Flow button. Then, select the desired flow type from the list, and click the Create button.

  2. Select the GitHub connector tile from the list of available connectors. Then, select the credential that will be used to connect to the GitHub instance, and click Next; or, create a new GitHub credential for use in this flow.

  3. In Nexla, GitHub data sources can be created using pre-built endpoint templates, which expedite source setup for common GitHub endpoints. Each template is designed specifically for the corresponding GitHub endpoint, making source configuration easy and efficient.
    • To configure this source using a template, follow the instructions in Configure Using a Template.

    GitHub sources can also be configured manually, allowing you to ingest data from GitHub endpoints not included in the pre-built templates or apply further customizations to exactly suit your needs.
    • To configure this source manually, follow the instructions in Configure Manually.

Configure Using a Template

Nexla provides pre-built templates that can be used to rapidly configure data sources to ingest data from common GitHub endpoints. Each template is designed specifically for the corresponding GitHub endpoint, making data source setup easy and efficient.

Endpoint Settings

  • From the Endpoint pulldown menu, select the endpoint from which this source will fetch data. The available endpoint templates are described below, along with instructions for configuring the data source for each one.

    Get All Pull Requests

    Use this endpoint to fetch a list of all pull requests for a repository.

    • Enter the account owner of the repository in the Repository Owner field. This should be the GitHub username or organization name that owns the repository. The name is not case sensitive.

    • Enter the name of the repository in the Repository field. This should be the repository name. The name is not case sensitive.

    • The endpoint uses GET requests to https://api.github.com/repos/{owner}/{repo}/pulls where {owner} is the Repository Owner and {repo} is the Repository you provide. The endpoint URL is automatically constructed based on the GitHub API base URL, the repository owner, and the repository name.
    • The endpoint uses incrementing pagination, automatically fetching additional pages as needed. Pagination starts from page 1 and increments by 1 for each subsequent page. Nexla will continue fetching pages until all pull requests are retrieved.
    • The endpoint will return all pull requests for the specified repository. The response data is extracted from the root-level array in the API response ($[*]), with each pull request processed individually.

    Repository owner and repository names are not case sensitive. The endpoint uses incrementing pagination (iteration.type: paging.incrementing) starting from page 1. The response data path is $[*], which extracts all items from the root-level array in the API response. For detailed information about listing pull requests, see the GitHub API documentation.
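    The URL construction and incrementing pagination described above can be sketched in Python. This is an illustrative sketch, not Nexla's implementation: `pulls_url`, `iter_records`, and `fetch_page` are hypothetical names, and stopping on the first empty page is an assumption, since the text does not say how the last page is detected.

```python
from typing import Callable, Iterator, List

API_BASE = "https://api.github.com"


def pulls_url(owner: str, repo: str) -> str:
    """Build the URL from the Repository Owner and Repository field values."""
    return f"{API_BASE}/repos/{owner}/{repo}/pulls"


def iter_records(url: str, fetch_page: Callable[[str, int], List[dict]]) -> Iterator[dict]:
    """Yield records one at a time, starting at page 1 and incrementing by 1.

    fetch_page(url, page) is a hypothetical callable returning the parsed
    root-level JSON array for one page. Stopping on an empty page is an
    assumption made for this sketch.
    """
    page = 1
    while True:
        items = fetch_page(url, page)
        if not items:
            break
        # Emitting each element of the root array mirrors the $[*] data path.
        yield from items
        page += 1


# Usage with a stubbed fetcher, so no network access is needed.
def fake_fetch(url: str, page: int) -> List[dict]:
    pages = {1: [{"number": 1}, {"number": 2}], 2: [{"number": 3}]}
    return pages.get(page, [])


records = list(iter_records(pulls_url("octocat", "Hello-World"), fake_fetch))
```

    In a real client, `fetch_page` would issue the GET request with a `page` query parameter and return the decoded JSON array.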

    Get Reviews of Pull Request

    Use this endpoint to fetch a list of all reviews for a pull request.

    • Enter the account owner of the repository in the Repository Owner field. This should be the GitHub username or organization name that owns the repository. The name is not case sensitive.

    • Enter the name of the repository in the Repository field. This should be the repository name. The name is not case sensitive.

    • Enter the number that identifies the pull request in the PR Number field. This should be the pull request number (not the pull request ID).

    • The endpoint uses GET requests to https://api.github.com/repos/{owner}/{repo}/pulls/{pr}/reviews where {owner} is the Repository Owner, {repo} is the Repository, and {pr} is the PR Number you provide. The endpoint URL is automatically constructed based on the GitHub API base URL, the repository owner, the repository name, and the pull request number.
    • The endpoint uses incrementing pagination, automatically fetching additional pages as needed. Pagination starts from page 1 and increments by 1 for each subsequent page. Nexla will continue fetching pages until all reviews are retrieved.
    • The endpoint will return all reviews for the specified pull request. The response data is extracted from the root-level array in the API response ($[*]), with each review processed individually.

    Repository owner and repository names are not case sensitive. The PR number should be the pull request number (not the pull request ID). The endpoint uses incrementing pagination (iteration.type: paging.incrementing) starting from page 1. The response data path is $[*], which extracts all items from the root-level array in the API response. For detailed information about listing pull request reviews, see the GitHub API documentation.
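    As a rough illustration of how the three field values map onto the request URL, the sketch below builds the reviews URL for a given page. The function name `reviews_url` and the sample values are hypothetical; the point is that the PR Number corresponds to the `number` field of a pull request object, not its `id`.

```python
from urllib.parse import urlencode

API_BASE = "https://api.github.com"


def reviews_url(owner: str, repo: str, pr_number: int, page: int = 1) -> str:
    """Build the reviews URL for one pull request and one page of results."""
    if not isinstance(pr_number, int) or pr_number < 1:
        raise ValueError("pr_number must be a positive integer")
    path = f"{API_BASE}/repos/{owner}/{repo}/pulls/{pr_number}/reviews"
    return f"{path}?{urlencode({'page': page})}"


# A pull request object carries both fields; the values here are made up.
pull_request = {"id": 1296269, "number": 1347}

# Use "number" (what appears in the pull request's web URL), not "id".
url = reviews_url("octocat", "Hello-World", pull_request["number"])
```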

    Extract Files with a Specific Extension

    Use this endpoint to extract all files with a specific extension from a repository.

    • Enter the file extension to extract from the repository in the Extension to Pull field. This should be the extension without the leading dot (e.g., md for Markdown files, js for JavaScript files).

    • Enter your repository identifier in the Repository Identifier field. This should be in the format user/repo or org/repo (e.g., octocat/Hello-World).

    • The endpoint uses a multi-step process: first, it searches for files with the specified extension using GitHub's code search API, then it retrieves the file contents for each matching file. The endpoint URL is automatically constructed based on the GitHub API base URL, the extension, and the repository identifier.
    • The endpoint does not use pagination and returns all matching files in a single request.
    • The endpoint will return the contents of all files with the specified extension from the repository. The response data is extracted from the search results and file contents, with each file processed individually.

    File extensions should be entered without the leading dot (e.g., md for Markdown, not .md). Repository identifiers should be in the format user/repo or org/repo. The endpoint uses a multi-step process: first searching for files using GitHub's code search API, then retrieving file contents. The endpoint uses a static URL (iteration.type: static.url) for the search step and body-as-file iteration for retrieving file contents. For detailed information about GitHub code search, see the GitHub API documentation.
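    The two-step process can be sketched as follows. This is a sketch under stated assumptions rather than the template's actual definition: the source does not give the exact search query, so the `extension:` and `repo:` qualifiers, the use of each result's `url` field for the second step, and the names `search_url`, `extract_files`, `get_json`, and `get_text` are all illustrative.

```python
from typing import Callable, Iterator, Tuple
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.github.com/search/code"


def search_url(extension: str, repo_identifier: str) -> str:
    """Build a code-search URL for one extension in one repository."""
    ext = extension.strip().lstrip(".")  # expects "md"; tolerates ".md"
    owner, sep, repo = repo_identifier.partition("/")
    if not (ext and owner and sep and repo):
        raise ValueError("expected an extension and an identifier like user/repo")
    # Assumed query shape; the template's real query is not stated in the text.
    query = f"extension:{ext} repo:{owner}/{repo}"
    return f"{SEARCH_ENDPOINT}?{urlencode({'q': query})}"


def extract_files(
    extension: str,
    repo_identifier: str,
    get_json: Callable[[str], dict],
    get_text: Callable[[str], str],
) -> Iterator[Tuple[str, str]]:
    """Yield (path, contents) pairs; get_json and get_text are hypothetical fetchers."""
    # Step 1: search for matching files.
    results = get_json(search_url(extension, repo_identifier))
    # Step 2: retrieve the contents of each match, one file at a time.
    for item in results.get("items", []):
        yield item["path"], get_text(item["url"])


# Usage with stubbed fetchers, so no network access is needed.
fake_search = {"items": [{"path": "README.md", "url": "stub://README.md"}]}
files = list(
    extract_files(
        ".md",
        "octocat/Hello-World",
        get_json=lambda url: fake_search,
        get_text=lambda url: "# Hello",
    )
)
```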

Endpoint Testing

Once the selected endpoint template has been configured, Nexla can retrieve a sample of the data that will be fetched according to the current settings. This allows users to verify that the source is configured correctly before saving.

  • To test the current endpoint configuration, click the Test button to the right of the endpoint selection menu. Sample data will be fetched and displayed in the Endpoint Test Result panel on the right.

  • If the sample data is not as expected, review the selected endpoint and associated settings, and make any necessary adjustments. Then, click the Test button again, and check the sample data to ensure that the correct information is displayed.

Configure Manually

GitHub data sources can be manually configured to ingest data from any valid GitHub API endpoint. Manual configuration provides the most flexibility, whether you need to access endpoints not covered by the pre-built templates or require custom API configurations.

With manual configuration, you can also create more complex GitHub sources, such as sources that use chained API calls to fetch data from multiple endpoints or sources that require custom authentication headers or request parameters.

API Method

  1. To manually configure this source, select the Advanced tab at the top of the configuration screen.

  2. Select the API method that will be used for calls to the GitHub API from the Method pulldown menu. The most common methods are:

    • GET: For retrieving data from the API
    • POST: For sending data to the API or triggering actions
    • PUT: For updating existing data
    • PATCH: For partial updates to existing data
    • DELETE: For removing data

API Endpoint URL

  1. Enter the URL of the GitHub API endpoint from which this source will fetch data in the Set API URL field. This should be the complete URL including the protocol (https://) and any required path parameters. GitHub API endpoints typically follow the pattern https://api.github.com/{endpoint_path}.

Ensure the API endpoint URL is correct and accessible with your current credentials. You can test the endpoint using the Test button after configuring the URL. The endpoint requires OAuth 2.0 authentication via the Authorization: Bearer {token} header, which is handled automatically by your credential configuration. The endpoint also requires the Accept: application/vnd.github+json header, which is automatically included in requests. For detailed information about GitHub API endpoints and available APIs, see the GitHub API documentation.
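To make the method, URL, and header settings concrete, the sketch below assembles an equivalent request outside Nexla using Python's standard library. It only builds the request object and does not send it. The helper name `build_request` and the placeholder token are hypothetical; inside Nexla, the credential supplies these headers for you.

```python
from urllib.request import Request

API_BASE = "https://api.github.com"


def build_request(endpoint_path: str, token: str, method: str = "GET") -> Request:
    """Assemble a GitHub API request with the headers named in the text."""
    url = f"{API_BASE}/{endpoint_path.lstrip('/')}"
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
    return Request(url, headers=headers, method=method)


# "<your-token>" is a placeholder, not a working credential.
req = build_request("repos/octocat/Hello-World/pulls", token="<your-token>")
```

Passing `req` to `urllib.request.urlopen` would send it; comparing that response with the sample shown by the Test button is one way to check that the URL entered in the Set API URL field is the one you intended.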