Custom configuration in the Python tool
The Python tool allows your Process Copilot to perform ad-hoc calculations beyond what is available in the referenced Knowledge Model. With this tool, Process Copilots can execute a custom Python script using input arguments provided by the Large Language Model (LLM).
Example
The following example shows the YAML configuration of a Python tool that counts the number of unique elements in the material names column (MR.MATERIAL_NAME) and multiplies the result by a number n chosen by the LLM.
```yaml
- id: python
  unique_id: count_material_name
  description: Multiply count of distinct material names
  input_schema:
    properties:
      n:                      # <- Becomes an argument for the LLM
        description: Multiplier for the count of distinct material names
        type: integer
  columns:
    - MR.MATERIAL_NAME        # <- Will be a column in 'df'
    - MR.AMOUNT_IN_STOCK      # <- Will be a column in 'df'
  adhoc_filtering: false
  code: |
    distinct_material_names_count = df['MR.MATERIAL_NAME'].nunique()
    distinct_material_names_count * n
```
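If a user asks the Copilot for, say, three times the number of distinct material names, the LLM could call count_material_name with n set to 3, and the tool would return the value of its final expression.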
Set up a custom configuration in the Python tool
You need to provide the following unique arguments to the tool for the configuration:

- input_schema: The argument schema that the LLM will be able to provide to the Python code.
- columns: The KPIs and/or Record Attributes that should be available to the Python code via a Pandas dataframe as the df variable.
- code: The Python code to execute when the Python tool is called.
Input schema
In the input schema you define which arguments the LLM will need to provide to the Python code. Each input that is defined can be used in the Python code as a regular variable. You should make sure to give the inputs meaningful names and descriptions so that the LLM understands how those inputs can be used.
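As a minimal sketch, an input schema with descriptive names and descriptions could look like the following (the inputs shown here are hypothetical and not part of the example above):

```yaml
input_schema:
  properties:
    threshold_days:        # hypothetical input, available in the code as the variable 'threshold_days'
      type: integer
      description: Number of days after which a delivery is considered late
    include_cancelled:     # hypothetical input, available in the code as the variable 'include_cancelled'
      type: boolean
      description: Whether cancelled orders should be included in the calculation
```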
Supported input variable types
The following input variable types are currently supported for the Python code and are mapped to their respective Python equivalents (a short usage sketch follows the table):
| Input Schema Type | Corresponding Python Type |
|---|---|
| Array | list |
| Boolean | bool |
| Integer | int |
| Number | float |
| String | str |
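For example, assuming an input_schema that defines material_names as an Array of strings and weight as a Number (both hypothetical inputs used only for illustration), the Python code would receive them as a list and a float:

```python
# 'material_names' arrives as a Python list, 'weight' as a Python float
# (both are hypothetical inputs used only for illustration).
steel_names = [name for name in material_names if 'Steel' in name]
weighted_count = len(steel_names) * weight
weighted_count  # the final line is an expression, so this value is returned
```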
Columns
To utilize data within your Python code, you can specify the IDs of the desired KPIs and Record Attributes in the columns section. This will provide you with access to a Pandas DataFrame containing the data you selected. The dataframe will be accessible via the df variable and available for use within your code.
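For example, with the columns from the configuration above (MR.MATERIAL_NAME and MR.AMOUNT_IN_STOCK), the code could work with df as in the following sketch (the aggregation itself is illustrative):

```python
# 'df' has one column per configured ID, here MR.MATERIAL_NAME and MR.AMOUNT_IN_STOCK.
total_stock_per_material = df.groupby('MR.MATERIAL_NAME')['MR.AMOUNT_IN_STOCK'].sum()
# Return the material name with the highest total amount in stock.
total_stock_per_material.idxmax()
```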
Ad-hoc filtering
Optionally, the adhoc_filtering flag can be set to true. This allows the LLM to apply ad-hoc filters to the data loaded into df. For example, using the previously defined Python tool, you could instruct the LLM to execute the tool while only considering material names with a specific amount in stock.
Supported ad-hoc filter types
The following filter types are currently available and are generated automatically by the LLM in Process Copilots:
| Filter Type | Description | Examples |
|---|---|---|
| StringFilter | Filters a string column by an exact string value or a wildcard string, with case-sensitive or case-insensitive matching. | MR.MATERIAL_NAME = 'Steel'; MR.MATERIAL_NAME LIKE '%Steel%' |
| DateFilter | Filters a date column by a date range. | MR.CREATION_DATE BETWEEN '2021-01-01' AND '2021-03-31' |
| NullFilter | Filters a column by a null check. | MR.MATERIAL_NAME IS NULL; MR.MATERIAL_NAME IS NOT NULL |
| NumericFilter | Filters a numeric column by the '=', '!=', '>', '>=', '<', '<=' operators. | MR.VALUE > 10.12; MR.VALUE <= 5.11 |
Limitations
The dataframe will be populated with pre-filtered data, which has two benefits:

- Data loading with pre-filtering is much faster than filtering in Python code.
- The Python tool can only load a maximum of 500,000 rows into df. If the subset of the data you want to access is less than 500,000 rows, but the total amount is more than 500,000, then some of the data you want to access might get cut off. Ad-hoc filtering allows you to only load the rows you need into the dataframe.

In the background, the LLM will pass filter arguments to the Python tool, which uses them when loading the dataframe.
Code
Your Python code can utilize both the inputs defined in the input_schema and the Pandas dataframe df. It is important to note that the final line of your code must evaluate to a result, that is, it must be an expression; otherwise, the tool will not return any output. The available libraries within the code environment are restricted to NumPy, Pandas, and standard Python libraries. The code assistant, accessible in the configuration screen, handles these code-related requirements and can be used to implement or draft Python functionality.
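To illustrate the final-line requirement, the following sketch reuses df and n from the first example; note that the last line is an expression rather than an assignment:

```python
# Count distinct material names and scale the result by the LLM-provided input 'n'.
distinct_material_names_count = df['MR.MATERIAL_NAME'].nunique()
scaled_count = distinct_material_names_count * n

# The final line must be an expression; if the code ended with the assignment above,
# the tool would not return any output.
scaled_count
```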
Additional Examples
```yaml
- id: python
  code: |
    from datetime import datetime
    date_object = datetime.strptime(date, '%Y-%m-%d')
    weekday = date_object.strftime('%A')
    weekday
  unique_id: get_weekday
  description: Given a date, return which weekday it is
  input_schema:
    properties:
      date:
        type: string
        description: The date in 'YYYY-MM-DD' format
```
Explanation: The LLM can call the Python tool, provide a date, and get the corresponding weekday in return.
```yaml
- id: python
  unique_id: count_material_name
  description: Get the count of distinct material names
  columns:
    - MR.MATERIAL_NAME
    - MR.AMOUNT_IN_STOCK
  adhoc_filtering: true
  code: |
    distinct_material_names_count = df['MR.MATERIAL_NAME'].nunique()
    distinct_material_names_count
```
Explanation: This is the initial example where adhoc_filtering is set to true and the multiplication by n is removed. When the LLM is asked to count the number of distinct material names containing a specific material (e.g., "wood") with at least five units in stock, the LLM generates a filter on the fly while calling count_material_name, and as a result, the pre-filtered data is loaded into df.