Replication Cockpit - Architecture and pipeline design

When building a real-time data pipeline from SAP, the Replication Cockpit and traditional Data Jobs must co-exist. Designing your pipeline involves two main decisions: which tables should be replicated in real time, and which Data Pool architecture best supports your transformation needs.

Replications vs. Data Jobs

While the Replication Cockpit (RC) handles real-time continuous streaming, traditional Data Jobs (DJ) are still used for orchestration and specific extractions.

Here is the recommended approach for dividing your tables:

  • Transactional Tables: Tables that include a Case or Activity (e.g., EKKO, EKPO, CDPOS, CDHDR, BKPF) are highly dynamic. These should be extracted via the Replication Cockpit using real-time streaming.

  • Metadata Tables: Tables that contain relatively static or infrequently updated information can be extracted using either the Replication Cockpit or traditional Data Jobs.
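The split above can be captured in a simple configuration map. As a sketch: the transactional tables come from the examples above, while LFA1 and T001 are illustrative picks for typical static SAP master-data tables; adjust both lists to your own scope.

```python
# Illustrative assignment of SAP tables to an extraction mechanism.
# Transactional tables stream via the Replication Cockpit; static
# metadata tables can go through traditional Data Jobs.
EXTRACTION_PLAN = {
    "replication_cockpit": [
        "EKKO", "EKPO", "CDHDR", "CDPOS", "BKPF",  # case/activity tables
    ],
    "data_jobs": [
        "LFA1", "T001",  # example static master-data tables
    ],
}

def extraction_method(table: str) -> str:
    """Return which mechanism is assigned to extract the given table."""
    for method, tables in EXTRACTION_PLAN.items():
        if table in tables:
            return method
    raise KeyError(f"{table} is not assigned to an extraction method")
```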

To begin replicating data, you must first perform a full extraction.

  • Best Practice: We highly recommend using the Initialization functionality within the Replication Cockpit to execute full loads rather than doing it via Data Jobs.

  • If you must use Data Jobs for Full Loads: You must deactivate the corresponding table in the Replication Cockpit while the Data Job runs. Because there can only be one data push job per table at a time, running both concurrently will lead to data loss. You can manage concurrent extraction limits using the Replication Calendar.
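The deactivate-before-full-load rule can be sketched as an orchestration step. The client class below is a stand-in that only records actions, not a real Celonis API; what matters is the ordering it enforces: replication stops before the Data Job starts, and resumes only after it finishes.

```python
# Sketch of the required ordering when a Data Job must run a full load:
# deactivate the table's replication first, reactivate only afterwards,
# so there is never more than one data push job per table at a time.

class ReplicationClient:
    """Stand-in client that records actions; replace with your own tooling."""
    def __init__(self):
        self.log = []
    def deactivate(self, table):
        self.log.append(("deactivate", table))
    def activate(self, table):
        self.log.append(("activate", table))
    def run_full_load_job(self, table):
        self.log.append(("full_load", table))

def full_load_with_data_job(client, table):
    client.deactivate(table)              # stop real-time pushes for this table
    try:
        client.run_full_load_job(table)   # now the only push job for the table
    finally:
        client.activate(table)            # resume streaming even if the job failed
```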

Choosing a Data Pool Architecture

The Golden Rule of the Replication Cockpit: Because real-time extractions are based on SAP change logs, each table can only be extracted once. If you attempt to extract the same SAP table via the Replication Cockpit into multiple different Data Pools, the replications will conflict and data will be lost.
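A simple sanity check against this golden rule is to verify that no SAP table appears in the replication scope of more than one Data Pool. A minimal sketch, assuming each pool's scope is available as a plain list of table names:

```python
from collections import Counter

def conflicting_tables(pool_configs: dict) -> set:
    """Return SAP tables replicated into more than one Data Pool.

    Any such table violates the extract-once rule and risks data loss.
    pool_configs maps a pool name to the list of tables it replicates.
    """
    counts = Counter(t for tables in pool_configs.values() for t in tables)
    return {t for t, n in counts.items() if n > 1}
```

For example, `conflicting_tables({"P2P": ["EKKO", "EKPO"], "O2C": ["VBAK", "EKKO"]})` flags EKKO, since both pools attempt to replicate it.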

To safely centralize your extractions, you must choose between two Data Pool architectures based on your need for Real-Time Transformations.

Option 1: Single Data Pool

In this setup, all extractions, transformations, and Data Models reside in one unified Data Pool.

  • How it works: Because real-time transformations in the Replication Cockpit are directly coupled to extractions, they must exist in the exact same Data Pool.

  • The Impact: This is the only architecture that fully supports Real-Time Transformations. The trade-off is that you cannot separate processes or use-cases at the Data Pool level; instead, you must rely on separate Data Jobs and Data Models within the single pool to organize your processes.

Option 2: Global Data Pool with process-specific Data Pools

In this setup, extractions and transformations are decoupled to maintain strict boundaries between different business processes.

  • How it works: You create one central "Global" Data Pool dedicated solely to extracting raw data via the Replication Cockpit. You then create separate, process-specific Data Pools (e.g., one for P2P, one for O2C) that import the raw data from the Global pool via database views.

  • The Impact: This is ideal if you need strict permission management separated by use-case. However, Real-Time Transformations are not possible in this architecture because the extractions and transformations do not reside in the same pool.
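The view-based import can be sketched as generated DDL. The schema names, helper function, and exact CREATE VIEW syntax below are placeholders for illustration; how data is actually shared between pools depends on your Celonis environment.

```python
# Sketch: generate CREATE VIEW statements that expose raw tables from a
# central "Global" pool inside a process-specific pool (e.g. P2P, O2C).

def view_ddl(global_schema: str, process_schema: str, tables: list) -> list:
    """Return one CREATE VIEW statement per raw table to import."""
    return [
        f'CREATE VIEW {process_schema}."{t}" AS '
        f'SELECT * FROM {global_schema}."{t}";'
        for t in tables
    ]
```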
