Skip to main content

Celonis Product Documentation

ZSCORE
Description

This function calculates the z-score over an INT or a FLOAT. The output type is always FLOAT.

ZSCORE can act as a standardization of data by mapping each value to the distance to the mean in multiples of standard deviations. This is especially useful in evaluating simple 2-, 3- or 6-sigma rules for outlier detection on a column.

Syntax
  ZSCORE ( table.column [, PARTITION BY ( partition_column, ... )] )
 
  • column: The source column where rows are taken from.

  • partition_column: Optional partition column to specify groups in which ZSCORE should operate.

Tips
Constants

If the input column contains only a single value, ZSCORE returns NULL

NULL handling

If the input column contains NULL values, they are ignored and do not affect the calculation of the mean or standard deviation. The output for a NULL input value is NULL as well.

Partitioning

The partition columns specify groups. The ZSCORE function operates independently within every group.

Examples

[1]

Simple ZSCORE calculation over FLOAT column:

Query

Column1

         ZSCORE ( "TABLE"."COLUMN" )
        

Input

Output

TABLE

COLUMN : float

1.0

1.0

7.0

7.0

4.0

Result

Column1 : float

-1.0

-1.0

1.0

1.0

0.0

[2]

ZSCORE in combination with a DATE column using HOURS as projection function:

Query

Column1

         ZSCORE ( HOURS ( "TABLE"."COLUMN" ) )
        

Input

Output

TABLE

COLUMN : date

Sat Jan 01 2000 06:00:00.000

Sat Jan 01 2000 07:00:00.000

Sat Jan 01 2000 08:00:00.000

Sat Jan 01 2000 09:00:00.000

Sat Jan 01 2000 10:00:00.000

Sat Jan 01 2000 11:00:00.000

Sat Jan 01 2000 12:00:00.000

Sat Jan 01 2000 13:00:00.000

Sat Jan 01 2000 14:00:00.000

Sat Jan 01 2000 18:00:00.000

Sat Jan 01 2000 08:00:00.000

Sat Jan 01 2000 09:00:00.000

Sat Jan 01 2000 08:00:00.000

Result

Column1 : float

-1.274141

-0.97298

-0.67182

-0.370659

-0.069499

0.231662

0.532823

0.833983

1.135144

2.339786

-0.67182

-0.370659

-0.67182

[3]

In this example, the input column contains NULL values. NULL values are ignored and do not affect the calculation of the z-score. The output for a NULL input value is NULL as well:

Query

Column1

         ZSCORE ( "TABLE"."COLUMN" )
        

Input

Output

TABLE

COLUMN : int

1

null

null

7

4

Result

Column1 : float

-1.0

null

null

1.0

0.0

[4]

ZSCORE with a single partition column:

Query

Column1

         ZSCORE ( "Table1"."column" , PARTITION BY ( "Table1"."Country" ) )
        

Input

Output

Table1

column : int

Country : string

1

'Germany'

1

'Germany'

7

'Germany'

7

'USA'

4

'USA'

Result

Column1 : float

-0.5773503

-0.5773503

1.1547005

0.7071068

-0.7071068

[5]

ZSCORE with multiple partition columns:

Query

Column1

         ZSCORE ( "Table1"."column" , PARTITION BY ( "Table1"."Country" , "Table1"."State" ) )
        

Input

Output

Table1

column : int

Country : string

State : string

1

'Germany'

'Bavaria'

2

'Germany'

'Berlin'

2

'Germany'

'Bavaria'

3

'Germany'

'Bavaria'

1

'USA'

'California'

Result

Column1 : float

-1.0

null

0.0

1.0

null

[6]

ZSCORE with multiple tables:

Query

Column1

         ZSCORE ( "Table1"."column" , PARTITION BY ( "Table1"."country" , "Table2"."state" ) )
        

Input

Output

Table1

id : int

column : int

country : string

1

1

'Germany'

2

2

'Germany'

3

2

'Germany'

4

3

'Germany'

5

1

'USA'

Table2

id : int

state : string

1

'Bavaria'

2

'Berlin'

3

'Bavaria'

4

'Bavaria'

5

'California'

Foreign Keys

Table2.id

Table1.id

Result

Column1 : float

-1.0

null

0.0

1.0

null

[7]

For a single input value, NULL is returned:

Query

Column1

         ZSCORE ( "TABLE"."COLUMN" )
        

Input

Output

TABLE

COLUMN : int

1

Result

Column1

null

See also: