Preprocessors
TabularPreproc
from_schema
classmethod
from_schema(
schema: Schema,
ctx_cols: dict[str, Sequence[str]] | None = None,
preprocessors: dict[
str, dict[str, ColumnPreproc | ArColumn | None]
]
| None = None,
) -> TabularPreproc
Build a preprocessor for tabular data from the `Schema`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
schema | `Schema` | A `Schema` object. | required |
ctx_cols | `dict[str, Sequence[str]] \| None` | A dictionary with the columns to be used as context. It may contain either: 1. only the root table as key, with a subset of its columns as value; or 2. all the tables as keys, with a subset of each table's columns as values. | `None` |
preprocessors | `dict[str, dict[str, ColumnPreproc \| ArColumn \| None]] \| None` | A dictionary containing preprocessing instructions for each column in the schema. Keys are table names; values are dictionaries that map column names to preprocessing instructions. Preprocessing instructions can be instances of `ColumnPreproc` or `ArColumn`, or `None`. | `None` |
Returns:
Type | Description |
---|---|
`TabularPreproc` | A `TabularPreproc` object. |
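The two accepted shapes of `ctx_cols` can be illustrated with a small validation helper. This is a sketch under stated assumptions: `check_ctx_cols`, its `root` argument, and the plain dict used as a stand-in for a `Schema` are hypothetical and not part of the library; it only checks that the keys are either the root table alone or every table, and that each listed column exists.

```python
from collections.abc import Sequence


def check_ctx_cols(
    tables: dict[str, Sequence[str]],
    root: str,
    ctx_cols: dict[str, Sequence[str]],
) -> None:
    """Raise ValueError unless ctx_cols has one of the two documented shapes.

    `tables` is a hypothetical stand-in for a Schema: it maps each table
    name to its column names. `root` is the name of the root table.
    """
    keys = set(ctx_cols)
    # Shape 1: only the root table. Shape 2: every table in the schema.
    if keys != {root} and keys != set(tables):
        raise ValueError(f"keys must be only {root!r} or all tables, got {sorted(keys)}")
    for table, cols in ctx_cols.items():
        # Each listed column must be a subset of the table's columns.
        unknown = set(cols) - set(tables[table])
        if unknown:
            raise ValueError(f"unknown columns in {table!r}: {sorted(unknown)}")


tables = {"clients": ["id", "age", "city"], "orders": ["id", "client_id", "amount"]}
check_ctx_cols(tables, "clients", {"clients": ["age"]})  # shape 1
check_ctx_cols(tables, "clients", {"clients": ["age"], "orders": ["amount"]})  # shape 2
```

A dictionary naming only a non-root table, or a column the table does not have, is rejected by this sketch; how the real `from_schema` reports such errors is not stated above.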
fit
fit(data: RelationalData) -> TabularPreproc
Fit the preprocessor to the given `RelationalData`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | `RelationalData` | The `RelationalData` object. | required |
Returns:
Type | Description |
---|---|
`TabularPreproc` | The fitted `TabularPreproc`. |
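Because `fit` returns the fitted preprocessor, construction and fitting can be written as one expression. The toy class below sketches that fit-and-return pattern; `ToyPreproc` and its dict-of-lists input are hypothetical stand-ins for `TabularPreproc` and `RelationalData`, and returning the same instance (rather than a fitted copy) is an assumption not confirmed by this reference.

```python
from __future__ import annotations


class ToyPreproc:
    """Hypothetical preprocessor that learns the categories of each column."""

    def __init__(self) -> None:
        self.categories: dict[str, list] | None = None  # None until fitted

    def fit(self, data: dict[str, list]) -> ToyPreproc:
        # Learn the sorted distinct values of every column.
        self.categories = {col: sorted(set(vals)) for col, vals in data.items()}
        return self  # returning self allows `ToyPreproc().fit(data)`


preproc = ToyPreproc().fit({"city": ["Rome", "Milan", "Rome"]})
print(preproc.categories)  # {'city': ['Milan', 'Rome']}
```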
select_ctx
select_ctx(
data: RelationalData, idx: Sequence[int] | None = None
) -> RelationalData
Select the context from the input data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | `RelationalData` | The `RelationalData` object. | required |
idx | `Sequence[int] \| None` | The indices of the root table to select; indices may be repeated. If `None`, all indices are taken once and the keys are kept. Otherwise, the keys are reset. | `None` |
Returns:
Type | Description |
---|---|
`RelationalData` | A `RelationalData` object containing the selected context. |
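The `idx` rules of `select_ctx` can be pictured on a single root table. In this sketch the table is a plain dict of column lists with a key column named `id`; `select_rows` and its `key` parameter are hypothetical, and "resetting the keys" is assumed to mean renumbering them from 0 in selection order, which the reference does not specify.

```python
from __future__ import annotations

from collections.abc import Sequence


def select_rows(
    table: dict[str, list], idx: Sequence[int] | None = None, key: str = "id"
) -> dict[str, list]:
    """Select rows of a root table, mimicking the documented `idx` rules."""
    if idx is None:
        # All rows, each taken once, with the original keys kept.
        return {col: list(vals) for col, vals in table.items()}
    # Rows may repeat, so the original keys would no longer be unique: reset them.
    out = {col: [vals[i] for i in idx] for col, vals in table.items()}
    out[key] = list(range(len(idx)))
    return out


root = {"id": [10, 11, 12], "age": [34, 51, 27]}
print(select_rows(root))
# {'id': [10, 11, 12], 'age': [34, 51, 27]}
print(select_rows(root, idx=[2, 2, 0]))
# {'id': [0, 1, 2], 'age': [27, 27, 34]}
```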
sample_ctx
sample_ctx(
data: RelationalData,
n_samples: int | None = None,
rng: Generator | int | None = None,
) -> RelationalData
Sample the context from the input data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | `RelationalData` | The `RelationalData` object. | required |
n_samples | `int \| None` | The number of context samples. If `None`, the number of samples equals the number of samples in the root table. | `None` |
rng | `Generator \| int \| None` | A `Generator` or an integer seed. | `None` |
Returns:
Type | Description |
---|---|
`RelationalData` | A `RelationalData` object containing the sampled context. |
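One possible reading of the `n_samples` and `rng` arguments of `sample_ctx` is sketched below with NumPy. `numpy.random.default_rng` accepts a `Generator` (returned unaltered), an integer seed, or `None`, which matches the `rng` type in the signature. The helper `sample_rows` and the choice to draw root rows uniformly with replacement are assumptions for illustration, not the documented behavior.

```python
from __future__ import annotations

import numpy as np


def sample_rows(
    table: dict[str, list],
    n_samples: int | None = None,
    rng: np.random.Generator | int | None = None,
) -> dict[str, list]:
    """Sample rows of a root table (assumed: uniformly, with replacement)."""
    gen = np.random.default_rng(rng)  # accepts a Generator, an int seed, or None
    n_rows = len(next(iter(table.values())))
    if n_samples is None:
        n_samples = n_rows  # default: as many samples as rows in the root table
    idx = gen.integers(0, n_rows, size=n_samples)
    return {col: [vals[i] for i in idx] for col, vals in table.items()}


root = {"id": [10, 11, 12], "age": [34, 51, 27]}
a = sample_rows(root, rng=0)
b = sample_rows(root, rng=np.random.default_rng(0))
print(a == b)  # True: the same seed gives the same sample
print(len(sample_rows(root, n_samples=5)["age"]))  # 5
```

Accepting a `Generator` as well as a seed lets callers either reproduce a run from an integer or share one generator across several sampling calls.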
TextPreproc
from_schema_table
classmethod
from_schema_table(
schema: Schema, table: str
) -> TextPreproc
Build a preprocessor for the text columns of a table from the `Schema`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
schema | `Schema` | A `Schema` object. | required |
table | `str` | Name of the target table in the schema that contains text columns. | required |
Returns:
Type | Description |
---|---|
`TextPreproc` | A `TextPreproc` object. |
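Conceptually, `from_schema_table` narrows the schema down to the text columns of one table. The sketch below assumes a schema represented as a plain dict mapping table names to `{column: column_type}` and a type label `"text"`; `text_columns` and this representation are hypothetical, the real `Schema` may store column types differently, and raising an error for a table without text columns is an assumption.

```python
def text_columns(schema: dict[str, dict[str, str]], table: str) -> list[str]:
    """Return the text columns of `table` in a hypothetical dict-based schema."""
    if table not in schema:
        raise KeyError(f"table {table!r} not in schema")
    cols = [col for col, kind in schema[table].items() if kind == "text"]
    if not cols:
        # The target table is expected to contain at least one text column.
        raise ValueError(f"table {table!r} has no text columns")
    return cols


schema = {
    "clients": {"id": "key", "age": "numeric"},
    "reviews": {"id": "key", "client_id": "foreign_key", "body": "text"},
}
print(text_columns(schema, "reviews"))  # ['body']
```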
from_tabular
classmethod
from_tabular(
preproc: TabularPreproc[_AP], table: str
) -> TextPreproc
Build a preprocessor for the text columns of a table from the `TabularPreproc` used for the tabular data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
preproc | `TabularPreproc[_AP]` | A `TabularPreproc` object. | required |
table | `str` | Name of the target table in the schema that contains text columns. | required |
Returns:
Type | Description |
---|---|
`TextPreproc` | A `TextPreproc` object. |
fit
fit(data: RelationalData) -> TextPreproc
Fit the preprocessor to the given `RelationalData`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | `RelationalData` | The `RelationalData` object. | required |
Returns:
Type | Description |
---|---|
`TextPreproc` | The fitted `TextPreproc`. |
ColumnPreproc
dataclass
Preprocessing instructions for a column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
special_values | `Sequence \| None` | A sequence of special values to handle during preprocessing. | `None` |
impute_nan | `bool \| None` | Whether to impute NaN values during preprocessing. If `True`, NaN values will not be sampled during the generation of synthetic data. | `None` |
non_sample_values | `Sequence \| None` | A sequence of values that should not be sampled during the generation of synthetic data. | `None` |
protection | `Protection \| bool \| None` | A `Protection` object or a boolean. | `None` |
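The fields of `ColumnPreproc` map naturally onto a Python dataclass. The stand-in below, `ColumnPreprocSketch`, is hypothetical: it mirrors the documented fields (with `protection` simplified to `bool | None`, since `Protection` is not described here) and adds an illustrative `excluded_values` helper that combines `non_sample_values` with NaN when `impute_nan` is `True`, following the field descriptions above. It is not the library's class.

```python
from __future__ import annotations

import math
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass
class ColumnPreprocSketch:
    """Hypothetical mirror of the documented ColumnPreproc fields."""

    special_values: Sequence | None = None
    impute_nan: bool | None = None
    non_sample_values: Sequence | None = None
    protection: bool | None = None  # the documented type also allows Protection

    def excluded_values(self) -> list:
        """Values that should not appear in generated synthetic data."""
        excluded = list(self.non_sample_values or [])
        if self.impute_nan:
            excluded.append(math.nan)  # imputed NaNs are not sampled
        return excluded


cfg = ColumnPreprocSketch(non_sample_values=[-1], impute_nan=True)
print(cfg.excluded_values())  # [-1, nan]
```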