Skip to main content

Celonis Product Documentation

Table statistics

This can help you to shorten data job run times (transformations), reduce infrastructure workload and solve out-of-memory issues.

A lot of queries can be executed in many different ways. For example, an order of inner joining tables might be different without an impact on the end result. However, as tables differ in size, order of inner joining them might have an impact on query run times, consumption of CPU and memory. To find the optimal way of executing a query, databases estimate costs of all operations and chose the execution plan which has the lowest cost. To estimate the cost, databases relay on table statistics, i.e. a metadata with basic characteristics of a table (e.g. count of rows). The more accurate and up-to-date statistics, the better cost estimation and the more optimal and faster query executions. Analyzing tables for statistics is usually a matter of seconds even for the biggest tables.

When should you analyze tables for the statistics

How to check if the table has statistics

Before running a SELECT query, add EXPLAIN at the very beginning (EXPLAIN SELECT). This will produce a query plan. Should there be no statistics available, this information will be included in the query plan, next to the table name. In the example below, table BKPF has no statistics:

Access Path:
+-STORAGE ACCESS for bkpf [Cost: 798, Rows: 7K (NO STATISTICS)] (PATH ID: 1)

How get or update the table statistics

In Vertica, you should run query: SELECT ANALYZE_STATISTICS ('table_name');

For example:

SELECT ANALYZE_STATISTICS ('BKPF');

How to integrate the statistics query in process connectors

  • Intermediary tables created in transformations (primary use case)

If you create a new table and populate it with data in the transformations, simply add SELECT ANALYZE_STATISTICS ('table_name'); statement to the task, after the queries creating the tables and inserting the data. It doesn't make sense to analyze tables which are not used anywhere else in the transformations.

  • Extracted tables

You might analyze tables after extracting them - if the extracted tables do not have statistics already (check it using the EXPLAIN command). As of June 11th 2021 all newly extracted tables should have statistics by default. Shall the statistics no be available, you might want to create a new task, the first one in the transformations with multiple statistics queries, one for each table.

50727273.png

Content of the Analyze extracted tables task:

select analyze_statistics ('bkpf');

select analyze_statistics ('bsid');

select analyze_statistics ('bsad');

select analyze_statistics ('cdhdr');

select analyze_statistics ('cdpos');

select analyze_statistics ('dd02t');

select analyze_statistics ('dd03m');

select analyze_statistics ('kna1');

select analyze_statistics ('knb1');

select analyze_statistics ('knb5');

select analyze_statistics ('likp');

  • Tables modified in transformations

If you change a table (add columns, rows etc.), you should include the statistics query after the query that performs the change operation.

Related topics