Skip to main content

Celonis Product Documentation

SQL best practice checklist

When managing your SQL, use the following best practice checklist:

Category

Rule

Comment

CREATE TABLE/VIEW

Use Views if they are not/seldom accessed in the transformations

The use of views should be limited to DM tables only. For large data model tables with complex definitions (e.g. multiple joins) a table may be created for DM instead, if DM load runtimes are longer than expected.

Use Tables if many transformations access it

Temporary tables (especially large tables) should not be created as views when used by subsequent transformations.

Combine tables if possible

Avoid extra tables that carry the same data, e.g. case table and TMP table

Order a table that is often joined by the columns used in this JOIN and segment by one

This will sort the default projection by those columns and will improve performance of subsequent queries that use the table

Ensure Table Statistics are refreshed for optimal performance of queries.

Run SELECT ANALYZE_STATISTICS(‘TABLENAME’); after the creation of every temporary/custom table.

Field sizes are significant from a performance aspect. Review your tableschemas and reduce the size of the fields if it is not used.

Example: A reduction from VARCHAR(200) to VARCHAR(20) might have a significantperformance impact.

SELECT

Avoid DISTINCT

Consider whether a better designed query logic makes the use of DISTINCT unnecessary.

Reduce SELECT * to needed columns

The fewer columns carried over (in temp tables or DM tables), the more performant the pipeline will be.

Avoid (nearly) unused JOINS

e.g. if only used for one column that 'might' be used in an analysis

JOINS

Tables should be INNER JOINED in the order of ascending size

The sorted hash table needed for the INNER join will thus be created based on the smaller table. If Table Statistics have been properly refreshed, attention to the join order is not required.

JOINS fields > 500 characters are significantly slower than JOINS on fields < 500 characters.

Make sure to optimize the field size, example: VARCHAR(60) offers better performance than using VARCHAR(600).

Avoid INNER JOIN with the purpose of limiting the data set

using WHERE EXISTS in such cases may help to avoid duplications of records in the resulting data set that a JOIN operation would cause, forcing the user to resort to a use of DISTINCT, negatively affecting performance

Don’t modify columns with functions in the join

e.g. avoid the use of functions such as substring(), right() or cast() in JOIN conditions

UNION

Move conditions to extraction if possible

If same conditions are applied on a table throughout the pipeline, limiting the table contents during extraction will improve runtimes throughout the pipeline, as well as provide APC savings

Use UNION ALL if possible

Using UNION ALL instead of UNION avoids DISTINCT operations in the background

CONDITIONS

Move WHERE-conditions directly into inner joins

Any WHERE conditions that are applied on tables that are part of INNER JOINS of a query can be incorporated in the JOIN conditions instead.

BETWEEN is better than AND

DATE≥1970 AND DATE≤1980 is slower than DATE BETWEEN 1970 AND 1980