Skip to main content

Celonis Product Documentation

Executing delta data model loads

Limited availability

This functionality is currently in limited availability. If you’re interested in trying it out, get in touch with us through celopeers.com/s/support.

Delta data model loads significantly reduce the overall latency of your data model loads, allowing you to only extract and transform changes to your data (rather than the full data set each time).

Before you can execute delta data model loads, you need to meet the following prerequisites:

Delta transformations allow you to identify which rows changed in a data model table since the last successful data model load, so these must be configured for your data jobs.

For more information, see: Delta transformations in data jobs.

The final tables used for the process data model are often based on views. Views in Vertica do not have the epoch timestamp column included automatically (like tables). For the delta data model load to work, all view-based tables in the data model need to have an epoch column. This epoch column needs to be added manually by adapting the SQL scripts in the workbench.

Epoch represents a logical timestamp for the data in Vertica, which is implicitly stored for every row in a table. The epoch advances when the data is committed with a DML operation (Insert, Update, Merge, Copy, Delete). This approach allows for efficient detection of changed rows in Vertica.

For view definitions, there are three cases:

Simple views

For a simple view, the epoch can be projected as follows in SQL (shortened example):

DROP VIEW IF EXISTS P2P_MARC;

CREATE VIEW P2P_MARC AS(
SELECT 
     MARC.*,
     epoch 
FROM 
     MARC AS MARC
...);
UNION ALL views

For views based on a UNION ALL of multiple tables, the epoch is projected as follows in SQL:

DROP VIEW IF EXISTS BKPF_UNION;

CREATE VIEW BKPF_UNION AS(
SELECT *, epoch from BKPF_BSIK
UNION ALL
SELECT *, epoch from BKPF_BSAK
);
JOIN views

For views based on JOINS the minimum epoch needs to be projected from the underlying base tables as follows in SQL:

DROP VIEW IF EXISTS P2P_EKPO;

CREATE VIEW P2P_EKPO AS(
SELECT 
      ...
      EKPO.epoch
   FROM
      EKPO AS EKPO
      JOIN P2P_EKPO_STAGING ON 1=1
...
);

Identifiers allow you to identify unique records in each table, enabling you to merge the deltas to the existing and loaded data model. You must set identifiers for all tables within your data model.

For data models created from process connectors, you may find that the identifier has already been set as part of the guided configuration. You can edit or update identifiers using the same steps below if needed.

To set identifiers from your data pool diagram:

  1. Click Data Models.

    find_the_data_model.png
  2. Select the data model you want to use for delta loads, opening the data model diagram.

  3. For each database table, click Set Identifiers.

    set_identifiers.png
  4. Select the identifier(s) you want to use for this table:

    select_identifier.png
  5. Click Save.

    The identifer(s) are now configured for this table, with the key icon indicating this in the data model diagram:

    key_icons.png

You can edit existing identifiers from your data model diagram by clicking Options - Edit:

edit_existing_identifier.png