Preprocessors

TabularPreproc

from_schema `classmethod`

from_schema(
    schema: Schema,
    ctx_cols: dict[str, Sequence[str]] | None = None,
    preprocessors: dict[
        str, dict[str, ColumnPreproc | ArColumn | None]
    ]
    | None = None,
) -> TabularPreproc

Build a preprocessor for tabular data from the Schema.

Parameters:

Name	Type	Description	Default
`schema`	`Schema`	A `Schema` object.	required
`ctx_cols`	`dict[str, Sequence[str]] | None`	A dictionary with the columns to be used as context. It may contain either: 1. only the root table as key, with a subset of its columns as value; or 2. all the tables as keys, each with a subset of that table's columns as value.	`None`
`preprocessors`	`dict[str, dict[str, ColumnPreproc | ArColumn | None]] | None`	A dictionary containing preprocessing instructions for each column in the schema. Keys are table names; values are dictionaries mapping column names to preprocessing instructions. An instruction can be a `ColumnPreproc` instance, a column preprocessor (`ArColumn`), or None. If None, the column is ignored. For columns for which no column preprocessor is provided, the default preprocessor is instantiated based on the `Column` type defined in the `Schema`.	`None`

Returns:

Type	Description
`TabularPreproc`	A `TabularPreproc` object.
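The per-column rules of `preprocessors` can be illustrated with a small sketch. It does not use the library. The schema is modeled as a plain dict `{table: {column: column_type}}`. The names `resolve_preprocessors` and `DEFAULTS`, and the string placeholders standing in for preprocessor objects, are all hypothetical. The sketch covers three cases: an explicit instruction wins, None drops the column, and any other column falls back to a default chosen by its type. It does not model how a `ColumnPreproc` is combined with the default preprocessor.

```python
# Hypothetical defaults keyed by column type (placeholders, not library classes).
DEFAULTS = {"categorical": "CategoricalDefault", "numeric": "NumericDefault"}


def resolve_preprocessors(schema, preprocessors=None):
    """Return {table: {column: preprocessor}}, omitting columns mapped to None.

    schema: hypothetical dict-based schema, {table: {column: column_type}}.
    preprocessors: {table: {column: instruction or None}}, as in from_schema.
    """
    preprocessors = preprocessors or {}
    resolved = {}
    for table, columns in schema.items():
        overrides = preprocessors.get(table, {})
        resolved[table] = {}
        for col, col_type in columns.items():
            if col in overrides:
                if overrides[col] is None:
                    continue  # None: the column is ignored
                resolved[table][col] = overrides[col]  # explicit instruction
            else:
                # No instruction given: default chosen from the column type
                resolved[table][col] = DEFAULTS[col_type]
    return resolved
```

For instance, with a `users` table holding `age`, `country` and `notes`, passing `{"users": {"notes": None, "age": "MyNumeric"}}` drops `notes` and keeps the explicit instruction for `age`. `country` and every column of the other tables get their type defaults.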

fit

fit(data: RelationalData) -> TabularPreproc

Fit the preprocessor to the given RelationalData.

Parameters:

Name	Type	Description	Default
`data`	`RelationalData`	The `RelationalData` to fit the preprocessor to.	required

Returns:

Type	Description
`TabularPreproc`	The fitted `TabularPreproc` object.
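Because `from_schema` and `fit` both return a `TabularPreproc`, the two calls can be chained, e.g. `TabularPreproc.from_schema(schema).fit(data)`. The toy class below shows the usual "fit returns self" pattern behind such chaining. It is a hypothetical illustration, assuming `fit` updates the object in place. It is not the library's implementation.

```python
class ToyCategoricalPreproc:
    """Hypothetical toy preprocessor illustrating the fit-returns-self pattern."""

    def __init__(self):
        self.categories = None  # learned during fit

    def fit(self, values):
        # Learn the sorted set of observed categories.
        self.categories = sorted(set(values))
        return self  # returning self allows ToyCategoricalPreproc().fit(data)

    def transform(self, values):
        if self.categories is None:
            raise RuntimeError("call fit() before transform()")
        index = {c: i for i, c in enumerate(self.categories)}
        return [index[v] for v in values]


codes = ToyCategoricalPreproc().fit(["b", "a", "b"]).transform(["a", "b"])
```

Here `codes` is `[0, 1]`, because the learned categories are `["a", "b"]`.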

select_ctx

select_ctx(
    data: RelationalData, idx: Sequence[int] | None = None
) -> RelationalData

Select the context from the input data.

Parameters:

Name	Type	Description	Default
`data`	`RelationalData`	The `RelationalData` from which to extract the context.	required
`idx`	`Sequence[int] | None`	The indices of the root table to select. Indices may be repeated. If None, every index is taken once and the original keys are kept; otherwise, the keys are reset.	`None`

Returns:

Type	Description
`RelationalData`	A `RelationalData` with the selected context.
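The `idx` semantics of `select_ctx` can be sketched on the root table alone. This assumes the root table is a list of `(key, record)` pairs and that `idx` holds positional indices. The function `select_root_rows` is hypothetical. Resetting keys when `idx` is given presumably avoids duplicate keys when a row is selected more than once.

```python
def select_root_rows(rows, idx=None):
    """Hypothetical sketch of the idx semantics described for select_ctx.

    rows: list of (key, record) pairs for the root table (assumed layout).
    idx: positional indices to select; repetitions are allowed.
    """
    if idx is None:
        # Every row is taken once and the original keys are kept.
        return list(rows)
    # Rows may repeat, so keys are reset to 0..len(idx)-1.
    return [(new_key, rows[i][1]) for new_key, i in enumerate(idx)]
```

With `rows = [(10, "a"), (11, "b"), (12, "c")]`, calling `select_root_rows(rows, [2, 0, 2])` gives `[(0, "c"), (1, "a"), (2, "c")]`.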

sample_ctx

sample_ctx(
    data: RelationalData,
    n_samples: int | None = None,
    rng: Generator | int | None = None,
) -> RelationalData

Sample the context from the input data.

Parameters:

Name	Type	Description	Default
`data`	`RelationalData`	The `RelationalData` from which to extract the context.	required
`n_samples`	`int | None`	The number of context samples. If None, the number of samples will be equal to the number of samples in the root table.	`None`
`rng`	`Generator | int | None`	A `np.random.Generator` or an integer seed to control the randomness during sampling. If None, a random seed is generated.	`None`

Returns:

Type	Description
`RelationalData`	A `RelationalData` with the sampled context.
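The `rng` parameter accepts the same three kinds of input as NumPy's `np.random.default_rng`. That function accepts None (fresh entropy), an integer seed, or an existing `Generator`, which it returns unaltered. The sketch below uses it to draw root-table row positions. It assumes rows are drawn uniformly with replacement; the documentation does not state this. `sample_root_indices` is hypothetical and does not claim to be how `sample_ctx` is implemented.

```python
import numpy as np


def sample_root_indices(n_rows, n_samples=None, rng=None):
    """Hypothetical sketch: draw positional indices of root-table rows."""
    # default_rng accepts None, an int seed, or an existing Generator.
    rng = np.random.default_rng(rng)
    if n_samples is None:
        # Mirrors the documented default: as many samples as root rows.
        n_samples = n_rows
    # Assumption: uniform sampling with replacement (high bound is exclusive).
    return rng.integers(0, n_rows, size=n_samples)
```

Passing the same integer seed twice reproduces the same indices. Passing a shared `Generator` instead advances its state across calls.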

TextPreproc

from_schema_table `classmethod`

from_schema_table(
    schema: Schema, table: str
) -> TextPreproc

Build a preprocessor for the text columns of a table from the Schema.

Parameters:

Name	Type	Description	Default
`schema`	`Schema`	A `Schema` object.	required
`table`	`str`	Name of the target table in the schema that contains text columns.	required

Returns:

Type	Description
`TextPreproc`	A `TextPreproc` object.
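`from_schema_table` builds a preprocessor for the text columns of a single table. The column lookup this implies can be sketched with the same hypothetical dict-based schema, `{table: {column: column_type}}`. The function `text_columns` is hypothetical. So is raising an error for a table without text columns; the documentation only says the target table should contain them.

```python
def text_columns(schema, table):
    """Hypothetical: return the text columns of `table` in a dict-based schema."""
    if table not in schema:
        raise KeyError(f"table {table!r} not in schema")
    cols = [col for col, col_type in schema[table].items() if col_type == "text"]
    if not cols:
        # Assumption: a table without text columns is treated as an error.
        raise ValueError(f"table {table!r} has no text columns")
    return cols
```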

from_tabular `classmethod`

from_tabular(
    preproc: TabularPreproc[_AP], table: str
) -> TextPreproc

Build a preprocessor for the text columns of a table from the TabularPreproc used for the tabular data.

Parameters:

Name	Type	Description	Default
`preproc`	`TabularPreproc[_AP]`	A `TabularPreproc` object used for the tabular part of the data.	required
`table`	`str`	Name of the target table in the schema that contains text columns.	required

Returns:

Type	Description
`TextPreproc`	A `TextPreproc` object.

fit

fit(data: RelationalData) -> TextPreproc

Fit the preprocessor to the given RelationalData.

Parameters:

Name	Type	Description	Default
`data`	`RelationalData`	The `RelationalData` to fit the preprocessor to.	required

Returns:

Type	Description
`TextPreproc`	The fitted `TextPreproc` object.

ColumnPreproc `dataclass`

Preprocessing instructions for a column.

Parameters:

Name	Type	Description	Default
`special_values`	`Sequence | None`	A sequence of special values to handle during preprocessing.	`None`
`impute_nan`	`bool | None`	A flag indicating whether to impute NaN values during preprocessing. If True, NaN values will not be sampled during the generation of synthetic data.	`None`
`non_sample_values`	`Sequence | None`	A sequence of values that should not be sampled during the generation of synthetic data.	`None`
`protection`	`Protection | bool | None`	A `Protection` object or boolean flag indicating whether to apply protection to the column. If boolean, the default protection is applied, otherwise the `Protection` object configures the protection.	`None`
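Two `ColumnPreproc` fields restrict what can appear in synthetic output. With `impute_nan=True`, NaN is never sampled, and values listed in `non_sample_values` are never sampled either. The sketch below mirrors these fields in a hypothetical stand-in dataclass, not the library class. It simplifies `protection` to a boolean and checks whether a candidate value may be emitted.

```python
import math
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass
class ColumnPreprocSketch:
    """Hypothetical mirror of the ColumnPreproc fields (not the library class)."""

    special_values: Optional[Sequence[Any]] = None
    impute_nan: Optional[bool] = None
    non_sample_values: Optional[Sequence[Any]] = None
    protection: Optional[bool] = None  # the real field also accepts a Protection


def allowed_for_sampling(value, cfg):
    """Whether a generated value may be emitted, per the documented semantics."""
    if cfg.impute_nan and isinstance(value, float) and math.isnan(value):
        return False  # impute_nan=True: NaN is never sampled
    if cfg.non_sample_values is not None and value in cfg.non_sample_values:
        return False  # explicitly excluded from sampling
    return True


cfg = ColumnPreprocSketch(impute_nan=True, non_sample_values=["unknown"])
kept = [v for v in ["red", "unknown", float("nan")] if allowed_for_sampling(v, cfg)]
```

Here `kept` is `["red"]`. With all fields left at their None defaults, every value is allowed.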
