Celonis Product Documentation

Extractor Builder

The Extractor Builder is a low-code development tool for data connectivity. It allows you to build your own Extractor using a guided interface. Once you’ve created your Extractor with a few clicks, users in your Data Pool will be able to use it as any other native Celonis Extractor, leveraging features including table/column configuration, pseudonymization, or custom filtering.

Besides creating a new Extractor, you can customize one of the existing Extractors (for example, Bamboo, Ironclad, Happyfox, Jira, or Greenhouse). You can also export any Extractor that you’ve built with the Extractor Builder and import it into a different Data Pool or even a different Celonis team.

The Extractor Builder supports GET requests to all REST APIs that return a JSON or XML response. As a team admin, you can find the Extractor Builder as a new menu option in Data Integration.

Features and Requirements

Features:

  • All components of native Extractor available (table/column selection, filtering, data type conversion, pseudonymization etc.)

  • Sample response based on API call to source system

  • Define request parameters and headers for filtering

  • Pagination

  • Various Error handling rules

  • Support for dependent endpoints

  • Export/import functionality

  • Customize existing Extractors (Happyfox, Bamboo)

Requirements:

  • Celonis Data Pool Admin Role

  • REST API only

  • Response in JSON or XML format

  • Only GET Requests

  • Authentication using Basic, Bearer, API Key or OAuth2

41195364.png
Step by Step Guide
Step 1: Create a new Extractor

As a Team or Data Pool Admin you can access the Extractor Builder under System Administration in Data Integration. After selecting the Extractor Builder tile, you have the options to create a new Extractor, customize an existing (pre-built*) Extractor, as well as export/import an Extractor from another Celonis team or Data Pool.

* Those were pre-built and shared by customers or the Celonis team. The number of Extractors available for customization will constantly increase.

41195368.png
Step 2: Define Extractor Information

After creating a new Extractor, you can provide its name (and optionally a description).

Step 3: Authentication Method

Next, select your source system’s authentication method, which can usually be found in the system’s API documentation. Each authentication method comes with a short description, including its required input fields. Depending on your selection, the input fields in the data connection configuration, through which users configure a data connection based on your Extractor, adjust accordingly.

If you cannot find the authentication method you are looking for, please contact Celonis ServiceDesk.

41195369.png
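To make the choice of authentication method more concrete, the following minimal sketch shows how Basic, Bearer, and API Key credentials are conventionally carried in HTTP request headers. It illustrates general HTTP practice and is not a description of how the Extractor Builder works internally. The function names and the `X-API-Key` header name are assumptions made for illustration; the actual API key header depends on your source system.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict[str, str]:
    """Basic auth: base64-encode "username:password" (RFC 7617)."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def bearer_auth_header(token: str) -> dict[str, str]:
    """Bearer auth: the token is sent unchanged after the scheme name (RFC 6750)."""
    return {"Authorization": f"Bearer {token}"}


def api_key_header(key: str, header_name: str = "X-API-Key") -> dict[str, str]:
    """API key auth: the header name varies by API; "X-API-Key" is a placeholder."""
    return {header_name: key}


print(basic_auth_header("user", "pass"))
print(bearer_auth_header("abc123"))
print(api_key_header("secret"))
```

Checking your source system's API documentation against these shapes usually makes it clear which method to select.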
Step 4: Connection Parameters

In addition to authentication method-specific parameters, you can define additional parameters that will be displayed as user input fields in the data connection configuration. These parameters can also be accessed in the next steps of the Extractor Builder configuration - such as in the endpoint configuration’s request URL, request parameter or header definition - via their respective placeholder. (Example: parameter for your system’s API version)

By default, the {Connection.API_URL} parameter is created as a mandatory parameter; it usually contains the host. Although this parameter is created automatically, you can still define a default value for it.

41195370.png

For each parameter, you can define a default value, mark it as mandatory, and configure whether it is stored as a secret and displayed as a password input (Is Confidential).

41195371.png
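The placeholder mechanism can be pictured as simple text substitution. The sketch below is an illustration only, not the Extractor Builder's implementation: `resolve_placeholders` is a hypothetical helper, and `API_VERSION` is a made-up connection parameter of the kind mentioned above (a parameter for your system's API version). The host value is a placeholder as well.

```python
import re

# Matches placeholders of the form {Connection.NAME}
PLACEHOLDER = re.compile(r"\{Connection\.([A-Za-z0-9_]+)\}")


def resolve_placeholders(template: str, connection: dict[str, str]) -> str:
    """Replace each {Connection.NAME} with the value the user entered."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in connection:
            # Treat an unresolved parameter as an error, like a missing mandatory field
            raise KeyError(f"Missing connection parameter: {name}")
        return connection[name]

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical values a user might enter in the data connection configuration
connection = {"API_URL": "https://example.zendesk.com", "API_VERSION": "v2"}
url = resolve_placeholders(
    "{Connection.API_URL}/api/{Connection.API_VERSION}/tickets.json", connection
)
print(url)
```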

Step 5: Define Endpoints

The final step is to define the API endpoints to fetch relevant data.

41195374.png
  1. Configure Endpoint

    Define a name for your endpoint. This name is only used to differentiate between endpoints and does not have any functional impact.

    41195377.png
  2. Configure Request

    The request URL defines the API endpoint that is called and can be identified in your source system’s API documentation. It always starts with the connection parameter {Connection.API_URL} (see Step 4).

    An example for the tickets endpoint of the Zendesk API is shown below:

    {Connection.API_URL}/api/v2/tickets.json

    It is based on the API documentation for this endpoint: https://developer.zendesk.com/rest_api/docs/support/tickets

    41195376.png

    a. Add Request Parameter

    Add request parameters, for example, to apply filters to your API requests, such as filters on the creation date or last updated date. Available request parameters can also be found in the source system’s API documentation. Users of your Extractor can provide parameter input in the extraction configuration.

    Sticking with the Zendesk API ticket endpoint example from the last step, a common parameter use case here would be using the updated_at parameter (date format) to filter during delta loads.

    41195378.png

    b. Add Request Header:

    Request headers are defined as key-value pairs. (Example: define Accept as the key and application/json as the value to have the API response returned as JSON.)

    41195379.png

    The connection parameters that have been defined in Step 4 can also be leveraged here for both request parameters and request headers.
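As a rough picture of what a configured request amounts to, the following sketch assembles a GET request with one request parameter and one request header using Python's standard library. It only builds the request object and sends nothing. The host and the date value are placeholders, `build_request` is a hypothetical helper, and the `updated_at` parameter is taken from the example above rather than verified against the Zendesk API.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(api_url: str, updated_since: str) -> Request:
    """Combine request URL, request parameter, and request header into one GET request."""
    base = f"{api_url}/api/v2/tickets.json"
    # Request parameter: appended to the URL as a query string
    query = urlencode({"updated_at": updated_since})
    # Request header: a key-value pair sent with the request
    headers = {"Accept": "application/json"}
    return Request(f"{base}?{query}", headers=headers, method="GET")


req = build_request("https://example.zendesk.com", "2024-01-01")
print(req.get_method())
print(req.full_url)
print(req.get_header("Accept"))
```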

    c. Choose Pagination Method

    When extracting large data volumes via an API, the response usually does not return all records at once. Instead, it returns multiple pages, for example with 100 records per page. To fetch data from all pages when extracting via the Extractor Builder, you only have to select the pagination mechanism used by your source system’s API, which is usually described in its documentation.

    For the Zendesk example, information about the relevant pagination mechanism can be found here: https://developer.zendesk.com/rest_api/docs/support/introduction#pagination

    If you cannot find the pagination method you are looking for, please contact Celonis ServiceDesk.
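To illustrate what a pagination mechanism does, here is a minimal sketch of one common style, in which each response carries a link to the next page. It is an assumption-laden illustration and not the Extractor Builder's logic: the field names `tickets` and `next_page`, the URLs, and the in-memory `FAKE_API` that stands in for a real HTTP call are all made up for this example.

```python
from typing import Callable, Iterator, Optional


def iterate_pages(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield records from every page by following the next-page link."""
    url: Optional[str] = first_url
    while url:
        page = fetch(url)
        yield from page.get("tickets", [])
        # A missing or null link means the last page has been reached
        url = page.get("next_page")


# Stand-in for the source system: maps a URL to its (already parsed) JSON response
FAKE_API = {
    "https://example.test/tickets?page=1": {
        "tickets": [{"id": 1}, {"id": 2}],
        "next_page": "https://example.test/tickets?page=2",
    },
    "https://example.test/tickets?page=2": {
        "tickets": [{"id": 3}],
        "next_page": None,
    },
}

records = list(iterate_pages("https://example.test/tickets?page=1", FAKE_API.__getitem__))
print([record["id"] for record in records])
```

Other mechanisms (for example page numbers, offsets, or cursors) follow the same loop and differ only in how the next request is derived from the current response.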

  3. Configure Response

    After configuring the API request, you can also configure its response. First, define the name of the target table into which the response data will be written in your Celonis Data Pool (example “tickets” in screenshot). To define the response structure and content, you have two options:

    - Copy and paste a JSON response example from the API documentation

    - Sample a response directly from your source system (requirement: a data connection to the source system is already configured)

    41195380.png

    Based on the JSON response, the Extractor Builder automatically creates a table structure including all columns, data types and nested objects.

    If your JSON response has multiple roots, you have the option to specify which root you want to extract via the dropdown.

    41195381.png
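The idea of deriving a table structure from a sample response can be sketched as follows. This is only one interpretation: it assumes that a "root" is a top-level key holding an object or a list of objects, and the type labels are illustrative, not the data types the Extractor Builder actually assigns. The sample response and both helper functions are hypothetical.

```python
import json

# Hypothetical sample response, as it might be pasted from API documentation
SAMPLE = json.loads("""
{
  "tickets": [
    {"id": 35436, "subject": "Help", "is_public": true, "priority": null}
  ],
  "count": 1
}
""")


def candidate_roots(response: dict) -> list[str]:
    """Top-level keys whose value is an object or a non-empty list of objects."""
    roots = []
    for key, value in response.items():
        is_object_list = isinstance(value, list) and value and isinstance(value[0], dict)
        if isinstance(value, dict) or is_object_list:
            roots.append(key)
    return roots


def infer_columns(record: dict) -> dict[str, str]:
    """Map each scalar field to a rough type label; unknown or null values fall back to STRING."""
    labels = {bool: "BOOLEAN", int: "INTEGER", float: "FLOAT", str: "STRING"}
    return {
        key: labels.get(type(value), "STRING")
        for key, value in record.items()
        if not isinstance(value, (dict, list))  # nested structures are handled separately
    }


print(candidate_roots(SAMPLE))
print(infer_columns(SAMPLE["tickets"][0]))
```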

    Further, you can adjust column types/formats as required and select primary keys (will be used as default primary keys in the extraction configuration) in the table configuration.

    You can easily delete elements from the created table structure by deleting the respective key value pair from the JSON response. Elements you remove here will not be extracted. (Example: remove raw_subject from extraction by deleting the corresponding key value pair from the response as shown in the screenshot below.)

    41195382.png

    Note that if you define a primary key for a parent table, the primary key will automatically be created as a foreign key column in its nested tables, if they exist. (Example: in the screenshot, the tickets_id column in the nested table is automatically created after selecting the id column as the primary key of the parent table (tickets).)

    41195383.png

    Nested tables are created automatically based on the JSON response. They are indicated by square brackets in the response.

    An example in the response shown is:

    "tags": [ "Enterprise", "Other_tag" ]

    Based on this nested JSON response, the table tickets$tags is created automatically:

    41195401.png

    If you don’t want these nested tables to be extracted, you can delete the respective parts from the JSON response.
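The relationship between a parent table, its nested table, and the foreign key column can be sketched like this. The table name `tickets$tags` and the `tickets_id` column follow the example above. The `value` column name and the `split_nested` helper are assumptions for illustration, and the sketch only handles arrays of simple values.

```python
def split_nested(table: str, records: list[dict], primary_key: str) -> dict[str, list[dict]]:
    """Split array fields into child tables named <table>$<field>."""
    tables: dict[str, list[dict]] = {table: []}
    # Foreign key column in the child table, e.g. "tickets_id"
    fk_column = f"{table}_{primary_key}"
    for record in records:
        parent_row = {}
        for key, value in record.items():
            if isinstance(value, list):
                child = tables.setdefault(f"{table}${key}", [])
                for item in value:
                    # "value" is a hypothetical column name for the array element
                    child.append({fk_column: record[primary_key], "value": item})
            else:
                parent_row[key] = value
        tables[table].append(parent_row)
    return tables


result = split_nested(
    "tickets", [{"id": 1, "tags": ["Enterprise", "Other_tag"]}], primary_key="id"
)
print(result["tickets"])
print(result["tickets$tags"])
```

Removing the `tags` entry from the sample record would leave only the `tickets` table, which mirrors deleting that part from the JSON response.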

    That’s it! Click Finish to save your current configuration. Of course, you can always go back and adjust it.

Add an additional endpoint

You can add further endpoints to your Extractor at any time. Often, the easiest way is to duplicate an existing endpoint and adjust its configuration.

Add a dependent endpoint

Dependent endpoints use another endpoint’s response element as input for their request. For example: extracting audit logs for each Zendesk ticket via the ticket_audits endpoint (dependent endpoint) based on the ticket_id returned by the tickets endpoint. From the API documentation, you would see that the request structure for ticket_audits looks like this: GET /api/v2/tickets/{ticket_id}/audits (you would query the ticket audits by iterating over every extracted ticket_id). Let’s set this up:

41195386.png

To add a dependent endpoint, follow the same steps as for a normal endpoint; the only difference is that you have to define a dependency. You can define dependencies in your request URL (as in the example above) as well as in a request parameter. In both cases, you use the dependency parameter {Dependency.id} and add it either to the endpoint URL (in this example, the URL would then be {Connection.API_URL}/api/v2/tickets/{Dependency.id}/audits) or as a parameter.

In the Zendesk example, we would configure the tickets table’s id column as the dependency ID, which automatically creates the {Dependency.id} parameter. Now, the dependent endpoint will be requested for every previously extracted, unique dependency ID value.

41195387.png
41195388.png

The remaining configuration steps are identical to those of other endpoints.
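The iteration over dependency values can be sketched as below. This is an illustration under stated assumptions rather than the actual extraction logic: `dependent_urls` is a hypothetical helper, the host is a placeholder, and the parent rows stand in for previously extracted tickets. Duplicate IDs are requested only once, matching the "unique dependency ID value" behavior described above.

```python
def dependent_urls(
    api_url: str, template: str, parent_rows: list[dict], dependency_column: str
) -> list[str]:
    """Build one request URL per unique dependency value, in first-seen order."""
    seen = set()
    urls = []
    for row in parent_rows:
        value = row[dependency_column]
        if value in seen:
            continue  # each dependency value is requested only once
        seen.add(value)
        url = template.replace("{Connection.API_URL}", api_url)
        urls.append(url.replace("{Dependency.id}", str(value)))
    return urls


# Stand-in for rows already extracted by the tickets endpoint
tickets = [{"id": 101}, {"id": 102}, {"id": 101}]
urls = dependent_urls(
    "https://example.zendesk.com",
    "{Connection.API_URL}/api/v2/tickets/{Dependency.id}/audits",
    tickets,
    dependency_column="id",
)
for url in urls:
    print(url)
```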

Error Handling

By default, Extractor Builder extractions fail if an API request’s response status is not 200 (OK). Error handling rules provide the option to continue the extraction despite a non-200 response status. They can be based on the HTTP status/response body, or on a response field.

In the Zendesk dependent endpoint example from the previous section, it could happen, for example, that a ticket does not have an audit log yet. Querying the dependent endpoint would then return response status 404, indicating that no audits have been found for a ticket ID. To resume the extraction in those cases and continue extracting the audit logs for the next ticket, you can configure an error handling rule as shown in the screenshot below.

41195390.png
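The effect of a status-based error handling rule can be sketched as a small decision function. The `ErrorRule` class, the action names, and `handle_response` are hypothetical and only model the behavior described above: status 200 is extracted, a status matched by a rule follows that rule, and any other status fails the extraction. Rules based on the response body or a response field are not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorRule:
    status: int  # HTTP status the rule applies to
    action: str  # "skip" means: ignore this response and continue with the next request


def handle_response(status: int, rules: list[ErrorRule]) -> str:
    """Return "extract", the action of the first matching rule, or "fail"."""
    if status == 200:
        return "extract"
    for rule in rules:
        if rule.status == status:
            return rule.action
    return "fail"  # default: any non-200 status without a rule fails the extraction


rules = [ErrorRule(status=404, action="skip")]
for status in (200, 404, 500):
    print(status, handle_response(status, rules))
```

With the single 404 rule, a ticket without audits is skipped while an unexpected 500 still stops the extraction.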