Column preprocessors

Protection pydantic-config

Protection(
    detectors: list[Detector] = [], default: bool = False
)

A configuration for rare value protection of a single column.

Parameters:

Name Type Description Default
detectors list[Detector]

List of detectors to use to determine the data to be protected.

[]
default bool

Whether to include the default detectors.

False

Categorical

Categorical(
    name: str | None = None,
    description: str | None = None,
    protection: Protection | bool = Protection(),
)

A preprocessor for categorical columns.

Parameters:

Name Type Description Default
name str | None

An optional name to be used in the JSON schema.

None
description str | None

An optional description to be used in the JSON schema.

None
protection Protection | bool

The protection configuration, or a boolean value indicating whether to enable the default protection.

Protection()
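
The `protection` argument accepts either a `Protection` object or a plain boolean. The sketch below shows one way such a union could be normalized. `ProtectionSketch` and `resolve_protection` are hypothetical stand-ins, not the library's classes, and the mapping of `True`/`False` onto the `default` field is an assumption drawn from the parameter descriptions above.

```python
from dataclasses import dataclass, field


@dataclass
class ProtectionSketch:
    """Stand-in mirroring the documented Protection fields (hypothetical)."""
    detectors: list = field(default_factory=list)
    default: bool = False


def resolve_protection(protection):
    """Turn a `Protection | bool` style argument into a configuration object."""
    if isinstance(protection, bool):
        # Assumption: a boolean only toggles the default detectors.
        return ProtectionSketch(default=protection)
    # An explicit configuration object is passed through unchanged.
    return protection


print(resolve_protection(True))
print(resolve_protection(ProtectionSketch(detectors=["my_detector"])))
```

Under this reading, passing `True` is shorthand for a configuration with no custom detectors and the default detectors enabled.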

Boolean

Boolean(
    name: str | None = None,
    description: str | None = None,
    special_values: Iterable = (),
    protection: Protection | bool = Protection(),
)

A preprocessor for boolean columns.

Parameters:

Name Type Description Default
name str | None

An optional name to be used in the JSON schema.

None
description str | None

An optional description to be used in the JSON schema.

None
special_values Iterable

A sequence of values to be handled separately as categories.

()
protection Protection | bool

The protection configuration, or a boolean value indicating whether to enable the default protection.

Protection()

DateTime

DateTime(
    name: str | None = None,
    description: str | None = None,
    special_values: Iterable = (),
    protection: Protection | bool = Protection(),
    fmt: str | None = None,
)

A preprocessor for columns containing date-time data.

Parameters:

Name Type Description Default
name str | None

An optional name to be used in the JSON schema.

None
description str | None

An optional description to be used in the JSON schema.

None
special_values Iterable

A sequence of values to be handled separately as categories.

()
protection Protection | bool

The protection configuration, or a boolean value indicating whether to enable the default protection.

Protection()
fmt str | None

The date-time format of the data. If None, it will be inferred from the data.

None
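
The reference does not say which format syntax `fmt` expects. As an illustration only, and assuming it follows the `strftime`/`strptime` directives of Python's standard library, an explicit format for a column could be checked against sample values like this (the sample values are made up):

```python
from datetime import datetime

# Hypothetical raw column values and a matching format string.
raw_values = ["2024-03-01 14:30:00", "2024-03-02 09:05:10"]
fmt = "%Y-%m-%d %H:%M:%S"  # assumed strptime-style directives

# Every value must parse with the same format.
parsed = [datetime.strptime(value, fmt) for value in raw_values]

# A value that does not match the format raises ValueError.
try:
    datetime.strptime("01/03/2024", fmt)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

Checking a format this way before passing it on can catch mismatches early. Leaving `fmt=None` hands the choice to the inference step instead.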

Date

Date(
    name: str | None = None,
    description: str | None = None,
    special_values: Iterable = (),
    protection: Protection | bool = Protection(),
    fmt: str | None = None,
)

A preprocessor for columns containing dates.

Parameters:

Name Type Description Default
name str | None

An optional name to be used in the JSON schema.

None
description str | None

An optional description to be used in the JSON schema.

None
special_values Iterable

A sequence of values to be handled separately as categories.

()
protection Protection | bool

The protection configuration, or a boolean value indicating whether to enable the default protection.

Protection()
fmt str | None

The date-time format of the data. If None, it will be inferred from the data.

None

Time

Time(
    name: str | None = None,
    description: str | None = None,
    special_values: Iterable = (),
    protection: Protection | bool = Protection(),
    fmt: str | None = None,
)

A preprocessor for columns containing time data.

Parameters:

Name Type Description Default
name str | None

An optional name to be used in the JSON schema.

None
description str | None

An optional description to be used in the JSON schema.

None
special_values Iterable

A sequence of values to be handled separately as categories.

()
protection Protection | bool

The protection configuration, or a boolean value indicating whether to enable the default protection.

Protection()
fmt str | None

The date-time format of the data. If None, it will be inferred from the data.

None

LowerBound

LowerBound(
    bound: float | None = None,
    strict: bool = False,
    eps: float = 0,
    rng: NpRng = None,
)

A class describing a lower bound.

Parameters:

Name Type Description Default
bound float | None

The numeric value of the bound. If None, it will be inferred from the data.

None
strict bool

Whether the bound is strict, i.e. whether the bound value itself is excluded.

False
eps float

The amount of noise to add when the bound is inferred from the data.

0
rng NpRng

A random number generator or seed for the introduced noise.

None

UpperBound

UpperBound(
    bound: float | None = None,
    strict: bool = False,
    eps: float = 0,
    rng: NpRng = None,
)

A class describing an upper bound.

Parameters:

Name Type Description Default
bound float | None

The numeric value of the bound. If None, it will be inferred from the data.

None
strict bool

Whether the bound is strict, i.e. whether the bound value itself is excluded.

False
eps float

The amount of noise to add when the bound is inferred from the data.

0
rng NpRng

A random number generator or seed for the introduced noise.

None
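
To make the `strict` flag of `LowerBound` and `UpperBound` concrete, here is a small sketch of a bounds check. It assumes the usual mathematical meaning of a strict bound (the bound value itself is not allowed). `within_bounds` is a hypothetical helper, not part of the documented API.

```python
def within_bounds(value, lower=None, upper=None,
                  strict_lower=False, strict_upper=False):
    """Check a value against optional lower and upper bounds."""
    if lower is not None:
        # Non-strict: value >= lower. Strict: value > lower.
        if value < lower or (strict_lower and value == lower):
            return False
    if upper is not None:
        # Non-strict: value <= upper. Strict: value < upper.
        if value > upper or (strict_upper and value == upper):
            return False
    return True


print(within_bounds(0.0, lower=0.0))                     # True
print(within_bounds(0.0, lower=0.0, strict_lower=True))  # False
print(within_bounds(5.0, upper=5.0, strict_upper=True))  # False
```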

NumCfg pydantic-config

NumCfg(
    lower_bound: LowerBound | float | None = None,
    upper_bound: UpperBound | float | None = None,
    multiple_of: float | None = None,
)

A configuration for the generation of a numerical variable.

Parameters:

Name Type Description Default
lower_bound LowerBound | float | None

The lower bound. If a numeric value, it is used to build a LowerBound object. If None, no lower bound is enforced.

None
upper_bound UpperBound | float | None

The upper bound. If a numeric value, it is used to build an UpperBound object. If None, no upper bound is enforced.

None
multiple_of float | None

A value that all the numerical values must be multiples of, if any.

None
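
Two behaviours described for `NumCfg` can be sketched in a few lines: a plain number being wrapped into a bound object, and a `multiple_of` check. `LowerBoundSketch`, `as_lower_bound` and `is_multiple_of` are hypothetical stand-ins written for illustration. The tolerance-based comparison is an assumption about how floating-point values might reasonably be handled, not the library's method.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class LowerBoundSketch:
    """Stand-in for the documented LowerBound (hypothetical, reduced fields)."""
    bound: Optional[float] = None
    strict: bool = False


def as_lower_bound(value):
    """Wrap a plain number in a bound object; pass None and objects through."""
    if value is None or isinstance(value, LowerBoundSketch):
        return value
    return LowerBoundSketch(bound=float(value))


def is_multiple_of(value, step, tol=1e-9):
    """Return True if value is numerically an integer multiple of step."""
    ratio = value / step
    return math.isclose(ratio, round(ratio), abs_tol=tol)


print(as_lower_bound(0))           # LowerBoundSketch(bound=0.0, strict=False)
print(is_multiple_of(0.75, 0.25))  # True
print(is_multiple_of(0.3, 0.25))   # False
```

Comparing with a tolerance rather than using `%` directly avoids false negatives caused by binary floating-point rounding, for example when testing `0.3` against a step of `0.1`.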

Numeric

Numeric(
    name: str | None = None,
    description: str | None = None,
    special_values: Iterable = (),
    protection: Protection | bool = Protection(),
    cfg: NumCfg | None = None,
)

A preprocessor for columns containing numeric data.

Parameters:

Name Type Description Default
name str | None

An optional name to be used in the JSON schema.

None
description str | None

An optional description to be used in the JSON schema.

None
special_values Iterable

A sequence of values to be handled separately as categories.

()
protection Protection | bool

The protection configuration, or a boolean value indicating whether to enable the default protection.

Protection()
cfg NumCfg | None

The configuration for the numeric variable. By default, the lower and upper bounds are inferred from the data and enforced in generation.

None
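
The `special_values` parameter is described as a set of values "handled separately as categories". A rough sketch of that idea for a numeric column follows. The sentinel values and the `split_special` helper are made up for illustration and say nothing about how the library does this internally.

```python
def split_special(values, special_values):
    """Separate special values from regular ones, keyed by row position."""
    special = set(special_values)
    regular, flagged = {}, {}
    for index, value in enumerate(values):
        # Sentinels go to `flagged`; everything else stays numeric.
        (flagged if value in special else regular)[index] = value
    return regular, flagged


# Hypothetical column mixing measurements with two sentinel markers.
column = [12.5, -999, 3.0, "N/A", 8.25]
regular, flagged = split_special(column, special_values=(-999, "N/A"))

print(regular)  # {0: 12.5, 2: 3.0, 4: 8.25}
print(flagged)  # {1: -999, 3: 'N/A'}
```

Keeping sentinels such as `-999` out of the numeric part stops them from distorting inferred bounds, which would otherwise stretch to include them.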

Integer

Integer(
    name: str | None = None,
    description: str | None = None,
    special_values: Iterable = (),
    protection: Protection | bool = Protection(),
    cfg: NumCfg | None = None,
)

A preprocessor for columns containing integer data.

Parameters:

Name Type Description Default
name str | None

An optional name to be used in the JSON schema.

None
description str | None

An optional description to be used in the JSON schema.

None
special_values Iterable

A sequence of values to be handled separately as categories.

()
protection Protection | bool

The protection configuration, or a boolean value indicating whether to enable the default protection.

Protection()
cfg NumCfg | None

The configuration for the numeric variable. By default, the lower and upper bounds are inferred from the data and enforced in generation.

None

Coordinates

Coordinates(
    name: str | None = None,
    description: str | None = None,
    special_values: Iterable = (),
    protection: Protection | bool = Protection(),
    cfg_lat: NumCfg[float] | None = None,
    cfg_lon: NumCfg[float] | None = None,
)

A preprocessor for columns containing geographic coordinates.

Parameters:

Name Type Description Default
name str | None

An optional name to be used in the JSON schema.

None
description str | None

An optional description to be used in the JSON schema.

None
special_values Iterable

A sequence of values to be handled separately as categories.

()
protection Protection | bool

The protection configuration, or a boolean value indicating whether to enable the default protection.

Protection()
cfg_lat NumCfg[float] | None

The configuration for the numeric value of the latitude.

None
cfg_lon NumCfg[float] | None

The configuration for the numeric value of the longitude.

None

MinLength

MinLength(bound: int | None = None)

A class describing a lower bound on text length.

Parameters:

Name Type Description Default
bound int | None

The value of the bound. If None, it will be inferred from the data.

None

MaxLength

MaxLength(bound: int | None = None)

A class describing an upper bound on text length.

Parameters:

Name Type Description Default
bound int | None

The value of the bound. If None, it will be inferred from the data.

None

TextCfg pydantic-config

TextCfg(
    min_length: MinLength | int | None = None,
    max_length: MaxLength | int | None = None,
    pattern: str | None = None,
)

A configuration for the generation of free text.

Parameters:

Name Type Description Default
min_length MinLength | int | None

The minimum length. If an integer, it is used to build a MinLength object. If None, no minimum length is enforced.

None
max_length MaxLength | int | None

The maximum length. If an integer, it is used to build a MaxLength object. If None, no maximum length is enforced.

None
pattern str | None

An optional regex for the text to conform to.

None
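
The three `TextCfg` constraints can be read as a simple conformance test on a string. The sketch below uses a hypothetical `text_conforms` helper. It assumes the pattern must match the whole string (`re.fullmatch`), which the reference does not specify.

```python
import re


def text_conforms(text, min_length=None, max_length=None, pattern=None):
    """Check a string against optional length limits and a regex."""
    if min_length is not None and len(text) < min_length:
        return False
    if max_length is not None and len(text) > max_length:
        return False
    # Assumption: the whole string must match the pattern.
    if pattern is not None and re.fullmatch(pattern, text) is None:
        return False
    return True


code_pattern = r"[A-Z]{2}-\d{4}"  # hypothetical identifier format
print(text_conforms("AB-1234", 7, 7, code_pattern))   # True
print(text_conforms("AB-12345", 7, 7, code_pattern))  # False
print(text_conforms("ab-1234", 7, 7, code_pattern))   # False
```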

Text

Text(
    name: str | None = None,
    description: str | None = None,
    special_values: Iterable = (),
    protection: Protection | bool = Protection(),
    cfg: TextCfg | None = None,
)

A preprocessor for columns containing free text.

Parameters:

Name Type Description Default
name str | None

An optional name to be used in the JSON schema.

None
description str | None

An optional description to be used in the JSON schema.

None
special_values Iterable

A sequence of values to be handled separately as categories.

()
protection Protection | bool

The protection configuration, or a boolean value indicating whether to enable the default protection.

Protection()
cfg TextCfg | None

The configuration for the text. By default, the minimum and maximum lengths are inferred from the data and enforced in generation.

None