Column preprocessors

Categorical

__init__

__init__(
    base: int = 1024,
    special_values: Sequence = (),
    impute_nan: bool = False,
    non_sample_values: Sequence = (),
    protection: Protection = Protection(),
) -> None

Categorical column preprocessor treating categories as ordinal values.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `base` | `int` | The base in which to represent the ordinal values associated with the categories. | `1024` |
| `special_values` | `Sequence` | A sequence of values to be handled separately as categories. | `()` |
| `impute_nan` | `bool` | Whether to impute NaN values. If True, NaN values are replaced with other plausible values. | `False` |
| `non_sample_values` | `Sequence` | A sequence of values that should not be sampled. | `()` |
| `protection` | `Protection` | A Protection object. | `Protection()` |
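
To make "categories as ordinal values" represented in a given `base` more concrete, here is a minimal sketch. It is not the library's implementation: the helper names `ordinal_encode` and `to_digits` are hypothetical, and assigning codes by order of first appearance is an assumption made only for illustration.

```python
from typing import Hashable, Sequence


def ordinal_encode(values: Sequence[Hashable]) -> tuple[list[int], list[Hashable]]:
    """Map each category to an integer code (assumed: order of first appearance)."""
    categories: list[Hashable] = []
    index: dict[Hashable, int] = {}
    codes: list[int] = []
    for value in values:
        if value not in index:
            index[value] = len(categories)
            categories.append(value)
        codes.append(index[value])
    return codes, categories


def to_digits(code: int, base: int) -> list[int]:
    """Write a non-negative integer as a list of digits in `base`, most significant first."""
    if code < 0 or base < 2:
        raise ValueError("code must be >= 0 and base must be >= 2")
    digits: list[int] = []
    while True:
        code, remainder = divmod(code, base)
        digits.append(remainder)
        if code == 0:
            break
    return digits[::-1]


codes, categories = ordinal_encode(["red", "green", "red", "blue"])
# With base 1024, any code below 1024 fits in a single digit.
encoded = [to_digits(code, 1024) for code in codes]
```

Under this reading, a large base such as the default 1024 keeps most categorical columns to one digit per value, while a column with more than 1024 distinct categories would need two.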

Coordinates

__init__

__init__(
    base: int = 10,
    digits: int = 10,
    special_values: Sequence = (),
    impute_nan: bool = False,
    non_sample_values: Sequence = (),
    protection: Protection = Protection(),
) -> None

Coordinate column preprocessor.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `base` | `int` | Base in which to represent the coordinate values. | `10` |
| `digits` | `int` | Number of digits to keep in the coordinate values. | `10` |
| `special_values` | `Sequence` | A sequence of values to be handled separately as categories. | `()` |
| `impute_nan` | `bool` | Whether to impute NaN values. If True, NaN values are replaced with other plausible values. | `False` |
| `non_sample_values` | `Sequence` | A sequence of values that should not be sampled. | `()` |
| `protection` | `Protection` | A Protection object. | `Protection()` |

Date

__init__

__init__(
    fmt: str | None = None,
    special_values: Sequence = (),
    impute_nan: bool = False,
    non_sample_values: Sequence = (),
    protection: Protection = Protection(),
) -> None

Date column preprocessor respecting a weekly periodicity.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `fmt` | `str \| None` | Datetime format. If None, it will be automatically inferred. | `None` |
| `special_values` | `Sequence` | A sequence of values to be handled separately as categories. | `()` |
| `impute_nan` | `bool` | Whether to impute NaN values. If True, NaN values are replaced with other plausible values. | `False` |
| `non_sample_values` | `Sequence` | A sequence of values that should not be sampled. | `()` |
| `protection` | `Protection` | A Protection object. | `Protection()` |
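
One way to picture a date representation that respects a weekly periodicity is to split each date into a week index and a day of the week. The sketch below uses only the standard library and is an illustration, not the `Date` preprocessor's actual encoding. The helper names are hypothetical, and the format string follows `datetime.strptime` conventions, which is an assumption about what `fmt` accepts.

```python
from datetime import date, datetime


def parse_date(text: str, fmt: str) -> date:
    """Parse a date string with an explicit strptime-style format."""
    return datetime.strptime(text, fmt).date()


def to_week_and_weekday(d: date) -> tuple[int, int]:
    """Split a date into (week index, weekday), with Monday as weekday 0.

    Proleptic Gregorian ordinal 1 (0001-01-01) is a Monday, so shifting
    the ordinal by one aligns week boundaries with Mondays.
    """
    return divmod(d.toordinal() - 1, 7)


def from_week_and_weekday(week: int, weekday: int) -> date:
    """Invert to_week_and_weekday."""
    return date.fromordinal(week * 7 + weekday + 1)


d = parse_date("2024-01-01", "%Y-%m-%d")
week, weekday = to_week_and_weekday(d)
restored = from_week_and_weekday(week, weekday)
```

In this decomposition, two dates exactly seven days apart share the same weekday component and differ by one in the week index, which is the kind of structure a weekly-periodic encoding can exploit.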

Time

Time column preprocessor.

Datetime

__init__

__init__(
    date: Date | None = None,
    time: Time | None = None,
    fmt: str | None = None,
    special_values: Sequence = (),
    impute_nan: bool = False,
    non_sample_values: Sequence = (),
    protection: Protection = Protection(),
) -> None

Datetime column preprocessor.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `date` | `Date \| None` | A Date preprocessor or None. If None, the default Date object is used. | `None` |
| `time` | `Time \| None` | A Time preprocessor or None. If None, the default Time object is used. | `None` |
| `fmt` | `str \| None` | Datetime format. If None, it will be automatically inferred. | `None` |
| `special_values` | `Sequence` | A sequence of values to be handled separately as categories. | `()` |
| `impute_nan` | `bool` | Whether to impute NaN values. If True, NaN values are replaced with other plausible values. | `False` |
| `non_sample_values` | `Sequence` | A sequence of values that should not be sampled. | `()` |
| `protection` | `Protection` | A Protection object. | `Protection()` |
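
Since `Datetime` takes separate `date` and `time` preprocessors, it is natural to think of a datetime value as a date part and a time part that are handled independently. The following standard-library sketch shows that split and its inverse. It is only an illustration of the idea: `split_datetime` and `join_datetime` are hypothetical names, and the strptime-style format is an assumption about `fmt`.

```python
from datetime import date, datetime, time


def split_datetime(text: str, fmt: str) -> tuple[date, time]:
    """Parse a datetime string and return its date and time components."""
    parsed = datetime.strptime(text, fmt)
    return parsed.date(), parsed.time()


def join_datetime(d: date, t: time) -> datetime:
    """Recombine the two components into a single datetime."""
    return datetime.combine(d, t)


date_part, time_part = split_datetime("2024-03-05 14:30:00", "%Y-%m-%d %H:%M:%S")
restored = join_datetime(date_part, time_part)
```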

Integer

__init__

__init__(
    base: int = 10,
    special_values: Sequence = (),
    impute_nan: bool = False,
    non_sample_values: Sequence = (),
    protection: Protection = Protection(),
) -> None

Integer column preprocessor.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `base` | `int` | Base in which to represent the integer values. | `10` |
| `special_values` | `Sequence` | A sequence of values to be handled separately as categories. | `()` |
| `impute_nan` | `bool` | Whether to impute NaN values. If True, NaN values are replaced with other plausible values. | `False` |
| `non_sample_values` | `Sequence` | A sequence of values that should not be sampled. | `()` |
| `protection` | `Protection` | A Protection object. | `Protection()` |
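
The `special_values` and `impute_nan` options recur on every preprocessor, so a small sketch of one plausible reading may help. Here sentinel values (for example `-999` in an integer column) are tagged as their own category instead of being treated as numbers, and NaN entries are optionally replaced by a value drawn from the observed ones. The function `tag_values`, its `seed` argument, and the random-draw imputation strategy are all assumptions made for illustration. The library's actual behavior may differ.

```python
import math
import random
from typing import Any, Sequence


def _is_nan(value: Any) -> bool:
    """True only for float NaN; other types are never treated as missing."""
    return isinstance(value, float) and math.isnan(value)


def tag_values(
    values: Sequence[Any],
    special_values: Sequence[Any] = (),
    impute_nan: bool = False,
    seed: int = 0,
) -> list[tuple[str, Any]]:
    """Tag each entry as ('value', v), ('special', v) or ('nan', None)."""
    rng = random.Random(seed)
    # Ordinary observed values: neither missing nor special.
    observed = [v for v in values if not _is_nan(v) and v not in special_values]
    tagged: list[tuple[str, Any]] = []
    for v in values:
        if _is_nan(v):
            if impute_nan and observed:
                # Assumed strategy: draw a replacement from the observed values.
                tagged.append(("value", rng.choice(observed)))
            else:
                tagged.append(("nan", None))
        elif v in special_values:
            tagged.append(("special", v))
        else:
            tagged.append(("value", v))
    return tagged


column = [1, -999, float("nan"), 7]
kept = tag_values(column, special_values=(-999,))
imputed = tag_values(column, special_values=(-999,), impute_nan=True)
```

With `impute_nan=False` the missing entry stays marked as missing, while with `impute_nan=True` it becomes an ordinary value, so downstream steps never see a NaN.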

Numeric

__init__

__init__(
    base: int = 10,
    max_digits: int = 12,
    special_values: Sequence = (),
    impute_nan: bool = False,
    non_sample_values: Sequence = (),
    protection: Protection = Protection(),
) -> None

Numeric column preprocessor.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `base` | `int` | Base in which to represent the numeric values. | `10` |
| `max_digits` | `int` | Number of digits to keep in the numeric values. | `12` |
| `special_values` | `Sequence` | A sequence of values to be handled separately as categories. | `()` |
| `impute_nan` | `bool` | Whether to impute NaN values. If True, NaN values are replaced with other plausible values. | `False` |
| `non_sample_values` | `Sequence` | A sequence of values that should not be sampled. | `()` |
| `protection` | `Protection` | A Protection object. | `Protection()` |

ItaFiscalCode

__init__

__init__(
    special_values: Sequence = (),
    impute_nan: bool = False,
    non_sample_values: Sequence = (),
    protection: Protection = Protection(),
) -> None

Column preprocessor for Italian Fiscal Code.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `special_values` | `Sequence` | A sequence of values to be handled separately as categories. | `()` |
| `impute_nan` | `bool` | Whether to impute NaN values. If True, NaN values are replaced with other plausible values. | `False` |
| `non_sample_values` | `Sequence` | A sequence of values that should not be sampled. | `()` |
| `protection` | `Protection` | A Protection object. | `Protection()` |

Text

Text column preprocessor.