Column preprocessors

Protection pydantic-config

Protection(
    detectors: list[Detector] = [], default: bool = False
)

A configuration for rare value protection of a single column.

Parameters:

detectors (list[Detector], default: []): List of detectors to use to determine the data to be protected.

default (bool, default: False): Whether to include the default detectors.
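
To make the idea of rare value protection concrete, here is a minimal sketch that is independent of the library. The `rare_values` helper and its `min_count` threshold are hypothetical and not part of the documented API; the actual detectors may use different criteria.

```python
from collections import Counter


def rare_values(column, min_count=3):
    """Return the set of values occurring fewer than `min_count` times.

    Hypothetical helper: it only illustrates what a rare-value
    detector might flag as data to be protected.
    """
    counts = Counter(column)
    return {value for value, n in counts.items() if n < min_count}


cities = ["Rome", "Rome", "Rome", "Milan", "Milan", "Milan", "Trieste"]
flagged = rare_values(cities)
print(flagged)
```

Values that occur only a handful of times can single out individual records, which is why they are candidates for protection.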

Categorical

Categorical(
    name: str | None = None,
    description: str | None = None,
    protection: Protection | bool = Protection(),
)

A preprocessor for categorical columns.

Parameters:

name (str | None, default: None): An optional name to be used in the JSON schema.

description (str | None, default: None): An optional description to be used in the JSON schema.

protection (Protection | bool, default: Protection()): The protection configuration, or a boolean value indicating whether to enable the default protection.

Boolean

Boolean(
    name: str | None = None,
    description: str | None = None,
    special_values: Iterable = (),
    protection: Protection | bool = Protection(),
)

A preprocessor for boolean columns.

Parameters:

name (str | None, default: None): An optional name to be used in the JSON schema.

description (str | None, default: None): An optional description to be used in the JSON schema.

special_values (Iterable, default: ()): A sequence of values to be handled separately as categories.

protection (Protection | bool, default: Protection()): The protection configuration, or a boolean value indicating whether to enable the default protection.

DateTime

DateTime(
    name: str | None = None,
    description: str | None = None,
    special_values: Iterable = (),
    protection: Protection | bool = Protection(),
    fmt: str | None = None,
)

A preprocessor for columns containing date-time data.

Parameters:

name (str | None, default: None): An optional name to be used in the JSON schema.

description (str | None, default: None): An optional description to be used in the JSON schema.

special_values (Iterable, default: ()): A sequence of values to be handled separately as categories.

protection (Protection | bool, default: Protection()): The protection configuration, or a boolean value indicating whether to enable the default protection.

fmt (str | None, default: None): The date-time format of the data. If None, it will be inferred from the data.
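
As an illustration of what a format string and its inference could look like, the sketch below uses the `strptime` directives from Python's standard library. That `fmt` follows this syntax is an assumption, and `infer_format` with its candidate list is a hypothetical helper, not the library's inference routine.

```python
from datetime import datetime

# Hypothetical candidate formats, tried in order.
CANDIDATES = ("%Y-%m-%d %H:%M:%S", "%d/%m/%Y %H:%M", "%Y-%m-%dT%H:%M:%S")


def infer_format(values, candidates=CANDIDATES):
    """Return the first candidate format that parses every value, or None."""
    for fmt in candidates:
        try:
            for value in values:
                datetime.strptime(value, fmt)
        except ValueError:
            continue  # at least one value does not fit this format
        return fmt
    return None


column = ["2024-01-31 08:15:00", "2024-02-01 17:40:12"]
fmt = infer_format(column)
parsed = [datetime.strptime(value, fmt) for value in column]
```

Passing an explicit format avoids any ambiguity, for example between day-first and month-first dates.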

Date

Date(
    name: str | None = None,
    description: str | None = None,
    special_values: Iterable = (),
    protection: Protection | bool = Protection(),
    fmt: str | None = None,
)

A preprocessor for columns containing dates.

Parameters:

name (str | None, default: None): An optional name to be used in the JSON schema.

description (str | None, default: None): An optional description to be used in the JSON schema.

special_values (Iterable, default: ()): A sequence of values to be handled separately as categories.

protection (Protection | bool, default: Protection()): The protection configuration, or a boolean value indicating whether to enable the default protection.

fmt (str | None, default: None): The date-time format of the data. If None, it will be inferred from the data.

Time

Time(
    name: str | None = None,
    description: str | None = None,
    special_values: Iterable = (),
    protection: Protection | bool = Protection(),
    fmt: str | None = None,
)

A preprocessor for columns containing time data.

Parameters:

name (str | None, default: None): An optional name to be used in the JSON schema.

description (str | None, default: None): An optional description to be used in the JSON schema.

special_values (Iterable, default: ()): A sequence of values to be handled separately as categories.

protection (Protection | bool, default: Protection()): The protection configuration, or a boolean value indicating whether to enable the default protection.

fmt (str | None, default: None): The date-time format of the data. If None, it will be inferred from the data.

LowerBound

LowerBound(
    bound: float | None = None, strict: bool = False
)

A class describing a lower bound.

Parameters:

bound (float | None, default: None): The numeric value of the bound. If None, it will be inferred from the data.

strict (bool, default: False): Whether the bound is strict, i.e. whether the bound value itself is excluded.

UpperBound

UpperBound(
    bound: float | None = None, strict: bool = False
)

A class describing an upper bound.

Parameters:

bound (float | None, default: None): The numeric value of the bound. If None, it will be inferred from the data.

strict (bool, default: False): Whether the bound is strict, i.e. whether the bound value itself is excluded.
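
The difference between strict and non-strict bounds can be sketched as follows. The two dataclasses are stand-ins that mirror the documented signatures, not the library classes, and `within` is a hypothetical helper. Unlike the library, which infers a bound of None from the data, this sketch simply skips it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LowerBound:
    # Stand-in mirroring the documented signature.
    bound: Optional[float] = None
    strict: bool = False


@dataclass
class UpperBound:
    # Stand-in mirroring the documented signature.
    bound: Optional[float] = None
    strict: bool = False


def within(value, lower, upper):
    """Check `value` against both bounds; a bound of None is skipped here."""
    if lower.bound is not None:
        if value < lower.bound or (lower.strict and value == lower.bound):
            return False
    if upper.bound is not None:
        if value > upper.bound or (upper.strict and value == upper.bound):
            return False
    return True


# Non-strict: the bound value itself is allowed.
print(within(0.0, LowerBound(0.0), UpperBound(1.0)))
# Strict: the bound value itself is rejected.
print(within(0.0, LowerBound(0.0, strict=True), UpperBound(1.0)))
```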

NumCfg pydantic-config

NumCfg(
    lower_bound: LowerBound | float | None = None,
    upper_bound: UpperBound | float | None = None,
    multiple_of: float | None = None,
)

A configuration for the generation of a numerical variable.

Parameters:

lower_bound (LowerBound | float | None, default: None): The lower bound. If a numeric value, it is used to build a LowerBound object. If None, no lower bound is enforced.

upper_bound (UpperBound | float | None, default: None): The upper bound. If a numeric value, it is used to build an UpperBound object. If None, no upper bound is enforced.

multiple_of (float | None, default: None): A number that all the numerical values are multiples of, if any.
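
The promotion of a plain number to a bound object, and the meaning of `multiple_of`, can be sketched as below. `LowerBound` is a stand-in dataclass, and `as_lower_bound` and `is_multiple` are hypothetical helpers. The sketch assumes that a bound built from a plain number is non-strict and that a small floating-point tolerance is acceptable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LowerBound:
    # Stand-in mirroring the documented signature.
    bound: Optional[float] = None
    strict: bool = False


def as_lower_bound(value):
    """Wrap a plain number in a LowerBound; pass objects and None through."""
    if value is None or isinstance(value, LowerBound):
        return value
    # Assumption: a bound built from a number keeps the default strict=False.
    return LowerBound(bound=float(value))


def is_multiple(value, multiple_of, tol=1e-9):
    """True if `value` is an integer multiple of `multiple_of`, up to `tol`."""
    ratio = value / multiple_of
    return abs(ratio - round(ratio)) <= tol


print(as_lower_bound(0))
print(is_multiple(1.5, 0.5), is_multiple(1.3, 0.5))
```

A price column rounded to cents, for instance, would correspond to a `multiple_of` of 0.01.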

Numeric

Numeric(
    name: str | None = None,
    description: str | None = None,
    special_values: Iterable = (),
    protection: Protection | bool = Protection(),
    cfg: NumCfg | None = None,
)

A preprocessor for columns containing numeric data.

Parameters:

name (str | None, default: None): An optional name to be used in the JSON schema.

description (str | None, default: None): An optional description to be used in the JSON schema.

special_values (Iterable, default: ()): A sequence of values to be handled separately as categories.

protection (Protection | bool, default: Protection()): The protection configuration, or a boolean value indicating whether to enable the default protection.

cfg (NumCfg | None, default: None): The configuration for the numeric variable. By default, the lower and upper bounds are inferred from the data and enforced in generation.
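
To show why special values are handled separately, the following sketch splits a numeric column into sentinel values and regular numbers. The `split_special` helper and the sample data are hypothetical; the library's internal handling may differ.

```python
def split_special(column, special_values):
    """Separate special values (kept as categories) from regular numbers."""
    special = set(special_values)
    categories = [v for v in column if v in special]
    numbers = [v for v in column if v not in special]
    return categories, numbers


ages = [34, -999, 51, "unknown", 27, -999]
categories, numbers = split_special(ages, special_values=(-999, "unknown"))

# Bounds computed on the regular numbers are not distorted by the sentinel.
print(min(numbers), max(numbers))
```

Without the split, a sentinel such as -999 would be mistaken for the smallest value of the column.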

Integer

Integer(
    name: str | None = None,
    description: str | None = None,
    special_values: Iterable = (),
    protection: Protection | bool = Protection(),
    cfg: NumCfg | None = None,
)

A preprocessor for columns containing integer data.

Parameters:

name (str | None, default: None): An optional name to be used in the JSON schema.

description (str | None, default: None): An optional description to be used in the JSON schema.

special_values (Iterable, default: ()): A sequence of values to be handled separately as categories.

protection (Protection | bool, default: Protection()): The protection configuration, or a boolean value indicating whether to enable the default protection.

cfg (NumCfg | None, default: None): The configuration for the numeric variable. By default, the lower and upper bounds are inferred from the data and enforced in generation.

Coordinates

Coordinates(
    name: str | None = None,
    description: str | None = None,
    special_values: Iterable = (),
    protection: Protection | bool = Protection(),
    cfg_lat: NumCfg[float] | None = None,
    cfg_lon: NumCfg[float] | None = None,
)

A preprocessor for columns containing geographic coordinates.

Parameters:

name (str | None, default: None): An optional name to be used in the JSON schema.

description (str | None, default: None): An optional description to be used in the JSON schema.

special_values (Iterable, default: ()): A sequence of values to be handled separately as categories.

protection (Protection | bool, default: Protection()): The protection configuration, or a boolean value indicating whether to enable the default protection.

cfg_lat (NumCfg[float] | None, default: None): The configuration for the numeric value of the latitude.

cfg_lon (NumCfg[float] | None, default: None): The configuration for the numeric value of the longitude.

MinLength

MinLength(bound: int | None = None)

A class describing a lower bound on text length.

Parameters:

bound (int | None, default: None): The value of the bound. If None, it will be inferred from the data.

MaxLength

MaxLength(bound: int | None = None)

A class describing an upper bound on text length.

Parameters:

bound (int | None, default: None): The value of the bound. If None, it will be inferred from the data.

TextCfg pydantic-config

TextCfg(
    min_length: MinLength | int | None = None,
    max_length: MaxLength | int | None = None,
    pattern: str | None = None,
)

A configuration for the generation of free text.

Parameters:

min_length (MinLength | int | None, default: None): The minimum length. If an integer, it is used to build a MinLength object. If None, no minimum length is enforced.

max_length (MaxLength | int | None, default: None): The maximum length. If an integer, it is used to build a MaxLength object. If None, no maximum length is enforced.

pattern (str | None, default: None): An optional regex for the text to conform to.
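
The three constraints can be pictured with a small check written against Python's standard `re` module. The `conforms` helper is hypothetical, and whether the library requires the pattern to match the whole string (as assumed here) or only part of it is not stated in this reference.

```python
import re


def conforms(text, min_length=None, max_length=None, pattern=None):
    """Check a string against optional length limits and a regex."""
    if min_length is not None and len(text) < min_length:
        return False
    if max_length is not None and len(text) > max_length:
        return False
    # Assumption: the whole string must match the pattern.
    if pattern is not None and re.fullmatch(pattern, text) is None:
        return False
    return True


plate = r"[A-Z]{2}-\d{4}"
print(conforms("AB-1234", min_length=7, max_length=7, pattern=plate))
print(conforms("AB-12", min_length=5, max_length=7, pattern=plate))
```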

Text

Text(
    name: str | None = None,
    description: str | None = None,
    special_values: Iterable = (),
    protection: Protection | bool = Protection(),
    cfg: TextCfg | None = None,
)

A preprocessor for columns containing free text.

Parameters:

name (str | None, default: None): An optional name to be used in the JSON schema.

description (str | None, default: None): An optional description to be used in the JSON schema.

special_values (Iterable, default: ()): A sequence of values to be handled separately as categories.

protection (Protection | bool, default: Protection()): The protection configuration, or a boolean value indicating whether to enable the default protection.

cfg (TextCfg | None, default: None): The configuration for the text. By default, the minimum and maximum lengths are inferred from the data and enforced in generation.