Preprocessors
TabularPreproc
from_schema
classmethod
from_schema(
schema: Schema,
ctx_cols: dict[str, Sequence[str]] | None = None,
preprocessors: dict[
str, dict[str, ColumnPreproc | ArColumn | None]
]
| None = None,
) -> TabularPreproc
Build a preprocessor for tabular data from the Schema.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`schema` | `Schema` | A `Schema` object. | required |
`ctx_cols` | `dict[str, Sequence[str]] \| None` | A dictionary with the columns to be used as context. It may contain either (1) only the root table as key and a subset of its columns as value, or (2) all the tables as keys and a subset of each table's columns as values. | `None` |
`preprocessors` | `dict[str, dict[str, ColumnPreproc \| ArColumn \| None]] \| None` | A dictionary containing preprocessing instructions for each column in the schema. Keys are table names; values are dictionaries with column names as keys and preprocessing instructions as values. Per the type annotation, preprocessing instructions can be instances of `ColumnPreproc` or `ArColumn`, or `None`. | `None` |
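The two accepted shapes of `ctx_cols` can be easier to see in code. The sketch below is only an illustration: `check_ctx_cols`, the `TABLES` layout, and the table and column names are hypothetical and not part of the library, and the library may validate its input differently. The helper simply classifies a mapping as one of the two shapes described above.

```python
from collections.abc import Mapping, Sequence


def check_ctx_cols(
    ctx_cols: Mapping[str, Sequence[str]],
    table_columns: Mapping[str, Sequence[str]],
    root_table: str,
) -> str:
    """Classify a ctx_cols mapping as 'root_only' or 'all_tables'.

    Hypothetical helper, not part of the library: it only mirrors the two
    shapes listed in the parameter description.
    """
    # Every listed table and column must exist in the (made-up) layout.
    for table, cols in ctx_cols.items():
        if table not in table_columns:
            raise ValueError(f"Unknown table: {table!r}")
        unknown = set(cols) - set(table_columns[table])
        if unknown:
            raise ValueError(f"Unknown columns in {table!r}: {sorted(unknown)}")

    keys = set(ctx_cols)
    if keys == {root_table}:
        return "root_only"
    if keys == set(table_columns):
        return "all_tables"
    raise ValueError("ctx_cols must name only the root table or every table")


# Made-up table layout, used only for illustration.
TABLES = {
    "host": ["id", "age", "country"],
    "listing": ["host_id", "price", "rooms"],
}

# Shape 1: only the root table, with a subset of its columns.
root_only = {"host": ["country"]}

# Shape 2: every table, each with a subset of its columns.
all_tables = {"host": ["country"], "listing": ["rooms"]}
```

Either dictionary has the form expected by the `ctx_cols` argument; a mapping that names only a non-root table matches neither shape.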
Returns:
Type | Description |
---|---|
`TabularPreproc` | A `TabularPreproc` object. |
fit
fit(data: RelationalData) -> TabularPreproc
Fit the preprocessor to the given RelationalData.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`data` | `RelationalData` | The `RelationalData` object to fit the preprocessor to. | required |
Returns:
Type | Description |
---|---|
`TabularPreproc` | The fitted `TabularPreproc` object. |
TextPreproc
from_schema_table
classmethod
from_schema_table(
schema: Schema, table: str
) -> TextPreproc
Build a preprocessor for the text columns of a table from the Schema.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`schema` | `Schema` | A `Schema` object. | required |
`table` | `str` | Name of the target table in the schema that contains text columns. | required |
Returns:
Type | Description |
---|---|
`TextPreproc` | A `TextPreproc` object. |
from_tabular
classmethod
from_tabular(
preproc: TabularPreproc[_AP], table: str
) -> TextPreproc
Build a preprocessor for the text columns of a table from the TabularPreproc
used for the tabular data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`preproc` | `TabularPreproc[_AP]` | A `TabularPreproc` object. | required |
`table` | `str` | Name of the target table in the schema that contains text columns. | required |
Returns:
Type | Description |
---|---|
`TextPreproc` | A `TextPreproc` object. |
fit
fit(data: RelationalData) -> TextPreproc
Fit the preprocessor to the given RelationalData.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`data` | `RelationalData` | The `RelationalData` object to fit the preprocessor to. | required |
Returns:
Type | Description |
---|---|
`TextPreproc` | The fitted `TextPreproc` object. |
ColumnPreproc
dataclass
Preprocessing instructions for a column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`special_values` | `Sequence \| None` | A sequence of special values to handle during preprocessing. | `None` |
`impute_nan` | `bool \| None` | A flag indicating whether to impute NaN values during preprocessing. If `True`, NaN values will not be sampled during the generation of synthetic data. | `None` |
`non_sample_values` | `Sequence \| None` | A sequence of values that should not be sampled during the generation of synthetic data. | `None` |
`protection` | `Protection \| bool \| None` | A `Protection` object or a boolean flag. | `None` |
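To make the nested layout of per-column instructions concrete, here is a minimal sketch. `ColumnPreprocSketch` is a stand-in dataclass that only mirrors the four documented fields; it is not the library's `ColumnPreproc`, and the table and column names are invented. The dictionary follows the documented form of the `preprocessors` argument of `from_schema`: table names map to dictionaries from column names to instructions.

```python
from __future__ import annotations

from collections.abc import Sequence
from dataclasses import dataclass
from typing import Any


@dataclass
class ColumnPreprocSketch:
    """Stand-in mirroring the four documented fields; not the library class."""

    special_values: Sequence[Any] | None = None
    impute_nan: bool | None = None
    non_sample_values: Sequence[Any] | None = None
    # The documented field also accepts a Protection object, omitted here.
    protection: bool | None = None


# Table name -> column name -> instructions (invented names).
preprocessors: dict[str, dict[str, ColumnPreprocSketch | None]] = {
    "listing": {
        # Treat -1 as a special value and impute NaNs, so that NaN is not
        # sampled when generating synthetic data.
        "price": ColumnPreprocSketch(special_values=[-1], impute_nan=True),
        # None is allowed by the type annotation; its exact meaning is not
        # stated in this reference.
        "rooms": None,
    },
}
```

With the real classes, a dictionary of this form would be passed as the `preprocessors` argument; every field left unset keeps its default of `None`.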