General Overview
Supported Database Types
The database (JDBC) connector allows you to connect to any SQL database via JDBC. The following databases are supported directly; you can also connect to any other database by supplying the JDBC driver yourself.
For legal reasons, we cannot provide the JDBC drivers for certain database types when they are used with an on-premise Extractor, so there is no out-of-the-box support for those database types in the on-premise scenario.
The table below summarizes driver availability:

- "Yes" indicates that the driver is supported out of the box for that scenario; no separate driver deployment is needed.
- "No" indicates that there is no out-of-the-box support for that scenario; you need to deploy a separate driver.
| Database type | Direct | On-premise |
| --- | --- | --- |
| Amazon Athena | Yes | Yes |
| Amazon Redshift | Yes | No |
| Azure SQL | Yes | Yes |
| Azure Synapse | Yes | Yes |
| Cloudera Impala | Yes | No |
| Google BigQuery | Yes | No |
| HANA (encrypted or unencrypted) | Yes | No |
| Hive | Yes | No |
| IBM DB2 | Yes | Yes |
| Intersystems Cache | Yes | Yes |
| MSSQL | Yes | Yes |
| MySQL | Yes | No |
| Netezza | Yes | Yes |
| OpenEdge | Yes | Yes |
| Oracle | Yes | Yes |
| Postgres (encrypted or unencrypted) | Yes | Yes |
| SAP MaxDB | Yes | No |
| Snowflake | Yes | Yes |
| Sybase | Yes | No |
| Teradata | Yes | No |
| Trino | Yes | Yes |
When running the Extractor via the command line, specify the driver like this:

```
java -Dloader.path=<path_to_driver> -jar <connector_file_name>.jar
```
When running the Extractor as a service, change the arguments line in the CelonisJDBCExtractor.xml file as follows:

```xml
<arguments>-Djava.io.tmpdir="%BASE%\temp" -Dloader.path=<path_to_driver> -jar connector-jdbc.jar</arguments>
```
FAQ
Can I extract data via an ODBC interface?

ODBC (Open Database Connectivity) is a generic interface used between applications; JDBC (Java Database Connectivity) is the Java equivalent used by Java programs to connect to databases. The extractor itself speaks only JDBC, but by using a JDBC-ODBC bridge driver, ODBC-accessible databases can be reached through the JDBC interface.
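For illustration, this is roughly what any JDBC-based access looks like from Java once a suitable driver (including a third-party JDBC-ODBC bridge) is on the classpath. The URL, host, and credentials below are hypothetical placeholders, not values from the connector.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JdbcSmokeTest {
    public static void main(String[] args) throws Exception {
        // Hypothetical PostgreSQL URL; a bridge driver would use its own URL scheme.
        String url = "jdbc:postgresql://db.example.com:5432/erp";
        try (Connection conn = DriverManager.getConnection(url, "reader", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
    }
}
```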
Connection Options
There are two scenarios:
A) You do not want to or cannot allow the EMS to access your database directly, and you want to use an on-premise Extractor instead.
B) You want to allow the EMS to access your database directly.
On-premise Extractor (via Uplink)
If you do not want to or cannot allow the EMS to access your database directly, we have developed a dedicated on-premise Extractor to ensure a continuous data pipeline between your source database and the EMS.
After the pipeline components are set up and the connection is established, users can schedule extraction jobs and ingest the data from their database tables into the EMS.
These are the components involved in this flow:
Celonis EMS: This is where the user defines the extraction job, i.e. which tables should be extracted, which filters should be applied, the extraction schedule, etc.
Extractor: an intermediary between the EMS and the database. Its role is to poll and fetch job requests from Celonis and submit the execution information to the source database via SQL queries. Once the data is retrieved from the database, the Extractor fetches it and sends it back to the EMS. The Extractor is installed in the customer network on a dedicated server.
The connection between the Extractor and the EMS is always initiated by the Extractor. Although an extraction appears to be triggered from Data Integration, the Extractor continuously polls Data Integration for extractions to be carried out.
1. The Extractor server connects to the EMS for data extraction (full load or delta load) over HTTPS, encrypted via TLS 1.2, on port 443.
2. The Extractor server retrieves the table, attribute, pseudonymization, and filter requirements defined in EMS Data Integration.
3. The Extractor server establishes the connection to the database via JDBC on the respective database port and requests the data defined in the EMS (pull).
4. The JDBC API executes the (read-only) SQL statements via a database user and pulls the tables requested by the EMS as Java objects.
5. The Extractor server receives and processes these objects by creating .parquet files and transfers them, encrypted, to the EMS via the connection created in step 1.
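For orientation only, here is a minimal sketch of one such polling cycle in Java. The EMS endpoint URL, the job payload format, and the database URL are hypothetical placeholders; the real Extractor's API, pseudonymization, and Parquet serialization are omitted.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ExtractorLoop {
    public static void main(String[] args) throws Exception {
        HttpClient ems = HttpClient.newHttpClient();
        while (true) {
            // Steps 1+2: poll the EMS over HTTPS (TLS 1.2, port 443) for a pending job.
            // The endpoint below is a hypothetical placeholder.
            HttpRequest poll = HttpRequest.newBuilder(
                    URI.create("https://team.celonis.cloud/hypothetical/jobs/next"))
                    .GET().build();
            String sql = ems.send(poll, HttpResponse.BodyHandlers.ofString()).body();
            if (sql.isEmpty()) { Thread.sleep(10_000); continue; }

            // Steps 3+4: run the read-only statement against the source database via JDBC.
            try (Connection db = DriverManager.getConnection(
                         "jdbc:postgresql://db.example.com:5432/erp", "reader", "secret");
                 Statement stmt = db.createStatement();
                 ResultSet rs = stmt.executeQuery(sql)) {
                // Step 5: the real Extractor writes the rows to a .parquet file and
                // uploads it over the same TLS connection; here we just count them.
                long rows = 0;
                while (rs.next()) rows++;
                System.out.println("fetched " + rows + " rows");
            }
        }
    }
}
```

The key point is the direction of the connection: the loop initiates every HTTPS request, so no inbound port needs to be opened in the customer network.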
The diagram below shows the components involved and their interaction, and the next sections explain each component in more detail.

Extraction Flow
Security 101
What protocol is used for the communication between the database and the EMS?
The Extractor server connects to the EMS for data extraction (full load or delta load) over HTTPS, encrypted via TLS 1.2, on port 443.
For which databases is in-transit encryption enabled by default, and for which does it need to be enabled via additional parameters?

In-transit encryption is generally handled by the JDBC driver, so the details are normally specified in the JDBC driver documentation. For example, HANA and Postgres have separate connection templates (e.g. HANA Encrypted) that enforce encryption by automatically using the respective driver.
For most databases, encryption can also be activated by adding an additional JDBC connection parameter (usually encrypt=true) to the additional properties of the data connection.
For some database types (such as Snowflake), in-transit encryption is automatically enforced by the database server and does not need to be configured.
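As a sketch of the second case, the additional property can also be passed programmatically. The encrypt=true name matches the common case mentioned above (and the documented Microsoft SQL Server JDBC driver property); other drivers may use different names, so check the driver documentation. The URL and credentials are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class EncryptedConnection {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "reader");      // hypothetical credentials
        props.setProperty("password", "secret");
        props.setProperty("encrypt", "true");     // additional property enabling in-transit encryption
        // Hypothetical MSSQL URL; "encrypt=true" is the documented property for this driver.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:sqlserver://db.example.com:1433;databaseName=erp", props)) {
            System.out.println("connected: " + !conn.isClosed());
        }
    }
}
```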
Which protocol is used for encryption?
TLS 1.2
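For reference, a Java HTTPS client can be pinned to TLS 1.2 as follows. This is an illustrative sketch of the protocol choice, not the Extractor's actual internal configuration.

```java
import java.net.http.HttpClient;
import javax.net.ssl.SSLContext;

public class Tls12Client {
    public static void main(String[] args) throws Exception {
        // Request a context that negotiates TLS 1.2 and use the platform defaults.
        SSLContext tls12 = SSLContext.getInstance("TLSv1.2");
        tls12.init(null, null, null);
        HttpClient client = HttpClient.newBuilder().sslContext(tls12).build();
        System.out.println("client ready, HTTP version: " + client.version());
    }
}
```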
How is pseudonymization handled?
Pseudonymization is handled by the Extractor before the data is written to Parquet files and ingested into the EMS. For direct database connections, this happens at the moment the files are sent to the cloud; for uplinked connections, it happens on-premise on the Extractor server.
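As an illustration of the concept only: pseudonymization can be pictured as replacing a sensitive field with a deterministic hash before the row is written to a Parquet file. The SHA-256 choice below is an assumption for the sketch, not a statement about the Extractor's actual algorithm.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class Pseudonymize {
    // Replace a sensitive value with a deterministic hash so rows stay joinable
    // without exposing the raw value. SHA-256 is an illustrative choice.
    static String pseudonymize(String value) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(value.getBytes(StandardCharsets.UTF_8));
        return String.format("%064x", new BigInteger(1, digest));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(pseudonymize("jane.doe@example.com"));
    }
}
```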
How and where are the username and password stored?
The username and password (and everything else you enter in the data connection form, for any extractor) are:

- converted to a byte array ({username: 'celonis'} becomes 7b 75 73 65 72 6e 61 6d 65 3a 20 27 63 65 6c 6f 6e 69 73 27 7d)
- encrypted with a tenant-specific ID
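The first step is plain UTF-8 encoding; the hex representation above can be reproduced in a few lines of Java (the encryption step itself is tenant-specific and not shown here):

```java
import java.nio.charset.StandardCharsets;

public class CredentialBytes {
    public static void main(String[] args) {
        // UTF-8 byte representation of the connection-form payload from the example.
        byte[] bytes = "{username: 'celonis'}".getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) hex.append(String.format("%02x ", b));
        System.out.println(hex.toString().trim());
        // prints: 7b 75 73 65 72 6e 61 6d 65 3a 20 27 63 65 6c 6f 6e 69 73 27 7d
    }
}
```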