Preprocessor

EventPreproc

from_schema classmethod

from_schema(
    schema: Schema,
    ord_cols: dict[str, str] | None = None,
    final_tables: Iterable[str] = (),
    ctx_cols: Sequence[str] = (),
    preprocessors: dict[
        str, dict[str, ColumnPreproc | ArColumn | None]
    ]
    | None = None,
    **kwargs: Any,
) -> EventPreproc

Build a preprocessor for tabular event data from a Schema.

Parameters:

Name Type Description Default
schema Schema

A Schema object.

required
ord_cols dict[str, str] | None

For each table, the column that contains the order of the events. If None, only a single event table that is not a final table is allowed, and the events are taken to be ordered as they appear in that table. If a dictionary, each event table that is not a final table must have an order column. Final tables may or may not have an order column.

None
final_tables Iterable[str]

The tables that contain the final events.

()
ctx_cols Sequence[str]

A sequence with the root columns to be used as context.

()
preprocessors dict[str, dict[str, ColumnPreproc | ArColumn | None]] | None

A dictionary containing preprocessing instructions for each column in the schema. Keys are table names, values are dictionaries with column names as keys and preprocessing instructions as values. Preprocessing instructions can be instances of ColumnPreproc, a column preprocessor, or None. If None, the column will be ignored. For columns without a provided preprocessor, the default preprocessor is instantiated based on the Column type defined in the Schema. Custom preprocessing is not available for the order columns provided in ord_cols.

None
kwargs Any

Optional additional keyword arguments:
1. ord_diff (bool, default=True): If True, replace the ordinal values with their discrete differences.
2. ord_unit (str, default=s): The time unit for converting the ordinal columns to numeric values. Available values are years (Y), months (M), days (D), hours (h), minutes (m), seconds (s), milliseconds (ms), microseconds (us) and nanoseconds (ns). Only available when the ordinal columns are datetime.
3. cat_units (Collection[str], default=()): Date and time units that are used as categorical variables. If provided, ord_unit must be explicitly specified, and it must be larger than every unit in cat_units. Available values are the same as for ord_unit, except for the sub-second ones (ms, us and ns). Only available when the ordinal columns are datetime.

{}
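
To make the ord_unit and ord_diff options more concrete, here is a minimal sketch in plain Python of what the description suggests: datetimes are turned into numbers in a chosen unit and, optionally, replaced by the gaps between consecutive events. The helper `to_numeric_order` is hypothetical and not part of the library. Measuring from the first event and dropping the first element when differencing are assumptions of this sketch, and the library's actual handling may differ.

```python
from datetime import datetime, timedelta

# Hypothetical helper, not the library's code. Only fixed-length units are
# covered here; years and months are left out because their length varies.
_UNIT = {
    "D": timedelta(days=1),
    "h": timedelta(hours=1),
    "m": timedelta(minutes=1),
    "s": timedelta(seconds=1),
    "ms": timedelta(milliseconds=1),
    "us": timedelta(microseconds=1),
}


def to_numeric_order(stamps, ord_unit="s", ord_diff=True):
    """Convert datetimes to numbers in `ord_unit`, optionally differenced."""
    origin = stamps[0]  # assumption: measure elapsed time from the first event
    values = [(t - origin) / _UNIT[ord_unit] for t in stamps]
    if not ord_diff:
        return values
    # Discrete differences: the gap between each event and the previous one.
    return [b - a for a, b in zip(values, values[1:])]


events = [
    datetime(2024, 1, 1, 12, 0),
    datetime(2024, 1, 1, 18, 0),
    datetime(2024, 1, 3, 18, 0),
]
hours_between = to_numeric_order(events, ord_unit="h")                 # [6.0, 48.0]
hours_elapsed = to_numeric_order(events, ord_unit="h", ord_diff=False)  # [0.0, 6.0, 54.0]
```

With differencing, the values describe the time between events rather than absolute positions on a timeline.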

Returns:

Type Description
EventPreproc

An EventPreproc object.
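
The rules stated for ord_cols can be read as a small consistency check. The sketch below is hypothetical: `check_ord_cols` is not a library function, the table and column names are invented, and the real from_schema may validate its arguments differently or raise other exception types.

```python
def check_ord_cols(event_tables, final_tables=(), ord_cols=None):
    """Raise ValueError if the arguments break the documented ord_cols rules."""
    final = set(final_tables)
    non_final = [t for t in event_tables if t not in final]
    if ord_cols is None:
        # Without order columns, only one non-final event table is possible.
        if len(non_final) > 1:
            raise ValueError(
                f"ord_cols is None but several non-final event tables exist: {non_final}"
            )
        return
    # With a dictionary, every non-final event table needs an order column.
    # Final tables are not checked: their order column is optional.
    missing = [t for t in non_final if t not in ord_cols]
    if missing:
        raise ValueError(f"missing order column for event tables: {missing}")


# Valid: a single non-final event table, ordered as it appears.
check_ord_cols(["visits", "discharge"], final_tables=["discharge"])

# Valid: two non-final event tables, each with an order column.
check_ord_cols(
    ["visits", "labs", "discharge"],
    final_tables=["discharge"],
    ord_cols={"visits": "date", "labs": "taken_at"},
)
```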

fit

Fit the preprocessor to the given RelationalData.

Parameters:

Name Type Description Default
data RelationalData

The RelationalData to fit the preprocessor to.

required

Returns:

Type Description
EventPreproc

The fitted EventPreproc object.
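
Because fit returns the fitted object, building and fitting can be done in one short function. The sketch below uses only the parameters documented above. The import path of EventPreproc is not shown on this page, so the class is passed in as an argument instead of being imported. The function name `build_and_fit` and all table and column names are hypothetical.

```python
def build_and_fit(preproc_cls, schema, data):
    """Build a preprocessor from a schema and fit it on relational data.

    `preproc_cls` is expected to be the EventPreproc class; `schema` and
    `data` are the Schema and RelationalData objects described above.
    """
    preprocessors = {
        # Hypothetical names: ignore the "notes" column of the "visits" table.
        "visits": {"notes": None},
    }
    preproc = preproc_cls.from_schema(
        schema=schema,
        ord_cols={"visits": "date"},   # order column of each event table
        final_tables=["discharge"],    # tables holding the final events
        ctx_cols=["age"],              # root columns used as context
        preprocessors=preprocessors,
        ord_unit="D",                  # datetime order converted to days
    )
    # fit returns the fitted preprocessor.
    return preproc.fit(data)
```

Columns not listed in `preprocessors` fall back to the default preprocessor for their Column type, as described for the preprocessors parameter.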