Celonis Product Documentation

Data format support

Apache Kafka is payload agnostic: it stores each message as an array of bytes. Those bytes can hold data in any of the following formats:

* AVRO

* PROTOBUF

* XML

* JSON

* JSON with Schema

The table below lists the supported formats and the schema evolution supported for each. For AVRO and PROTOBUF, a schema registry solution is expected to be in place.

| Data Format | Supported | Schema Evolution |
| --- | --- | --- |
| AVRO | Yes | Only adding nullable columns |
| PROTOBUF | Yes | Only adding nullable columns |
| JSON | Partially | If each message contains all the fields and no field is nullable. |
| JSON with schema | Yes | Only adding nullable columns |
| XML | No | - |
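
The "only adding nullable columns" rule from the table can be illustrated with a small check. This is a minimal sketch, not the connector's implementation: the function name `is_supported_evolution` and the `(type_name, nullable)` tuple layout are hypothetical, and it assumes the rule also means that existing columns must stay unchanged.

```python
def is_supported_evolution(old_fields: dict, new_fields: dict) -> bool:
    """Check that new_fields only adds nullable columns to old_fields.

    Both arguments map a column name to a (type_name, nullable) tuple.
    This layout is an assumption made for the sketch.
    """
    for name, definition in old_fields.items():
        # Assumption: existing columns must keep their type and nullability.
        if new_fields.get(name) != definition:
            return False
    for name, (_, nullable) in new_fields.items():
        # Any column that is new must be nullable.
        if name not in old_fields and not nullable:
            return False
    return True


v1 = {"firstName": ("string", False), "lastName": ("string", False)}
v2 = {**v1, "age": ("int", True)}    # adds a nullable column
v3 = {**v1, "age": ("int", False)}   # adds a required column

print(is_supported_evolution(v1, v2))  # True
print(is_supported_evolution(v1, v3))  # False
```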

The JSON data format needs particular attention. While it is a simple, human-readable storage format, it is far from ideal for enforcing data quality and schema validation.

Each Kafka message is self-contained: the information it carries does not depend on previous messages. As a result, the connector can infer the schema only at the message level. However, a JSON document can contain a field such as address=null, or the nullable field may be omitted from the payload entirely, for performance reasons. In either case there is no way to infer the field's type correctly, and tracking fields across messages is not a bulletproof solution. In the two messages below, address is null in both and age appears only in the second.

{
   "firstName": "Alexandra",
   "lastName": "Jones",
   "address": null
}
{
   "firstName": "Alexandra",
   "lastName": "Jones",
   "address": null,
   "age": 32
}
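
The ambiguity can be made concrete with a per-message inference sketch. This is only an illustration under stated assumptions, not how the connector works internally: `infer_schema` is a hypothetical helper that maps each field to the Python type name of its value.

```python
import json


def infer_schema(message: str) -> dict:
    """Map each field of one JSON message to a type name inferred from its value."""
    record = json.loads(message)
    schema = {}
    for field, value in record.items():
        # A null value carries no type information, so it is marked as unknown.
        schema[field] = "unknown" if value is None else type(value).__name__
    return schema


first = '{"firstName": "Alexandra", "lastName": "Jones", "address": null}'
second = '{"firstName": "Alexandra", "lastName": "Jones", "address": null, "age": 32}'

print(infer_schema(first))
print(infer_schema(second))
```

Neither message reveals the type of address, and the first message gives no hint that an age field exists at all, so the two inferred schemas disagree.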

In some use cases, the JSON payload can carry its own schema, and, as stated in the table above, support for this is better. Here is an example of a JSON message with a schema that the Kafka Connect JsonConverter can interpret.

{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "int64",
        "optional": false,
        "field": "registertime"
      },
      {
        "type": "string",
        "optional": false,
        "field": "userid"
      },
      {
        "type": "string",
        "optional": false,
        "field": "regionid"
      },
      {
        "type": "string",
        "optional": false,
        "field": "gender"
      }
    ],
    "optional": false,
    "name": "ksql.users"
  },
  "payload": {
    "registertime": 1493819497170,
    "userid": "User_1",
    "regionid": "Region_5",
    "gender": "MALE"
  }
}
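
Because the schema travels with the payload, a consumer can check each field without guessing its type. The following is a minimal sketch of that idea, assuming the envelope layout shown above. The helper `check_envelope` and the `PYTHON_TYPES` mapping are hypothetical, cover only a few type names, and are not part of Kafka Connect.

```python
import json

# Python types accepted for a small subset of Kafka Connect schema type names.
PYTHON_TYPES = {"int32": int, "int64": int, "double": float, "boolean": bool, "string": str}


def check_envelope(message: str) -> list:
    """Return a list of problems found in a schema-plus-payload JSON message."""
    envelope = json.loads(message)
    payload = envelope["payload"]
    problems = []
    for field in envelope["schema"]["fields"]:
        name = field["field"]
        value = payload.get(name)
        if value is None:
            # Only fields declared optional may be missing or null.
            if not field.get("optional", False):
                problems.append(f"{name}: required field is missing or null")
            continue
        expected = PYTHON_TYPES.get(field["type"])
        if expected is None:
            problems.append(f"{name}: type {field['type']} is not handled by this sketch")
        # bool is a subclass of int in Python, so reject it for non-boolean fields.
        elif not isinstance(value, expected) or (isinstance(value, bool) and expected is not bool):
            problems.append(f"{name}: expected {field['type']}, got {type(value).__name__}")
    return problems


message = json.dumps({
    "schema": {
        "type": "struct",
        "fields": [
            {"type": "int64", "optional": False, "field": "registertime"},
            {"type": "string", "optional": False, "field": "userid"},
        ],
        "optional": False,
        "name": "ksql.users",
    },
    "payload": {"registertime": 1493819497170, "userid": "User_1"},
})

print(check_envelope(message))  # prints []
```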