Column structures

CategoricalColumnStructure

__init__

__init__(
    categories: Sequence[str] | None = None,
    null_values: Sequence = (),
) -> None

Structure for a categorical column.

Parameters:

Name         Type                  Description                                                        Default
categories   Sequence[str] | None  Categories of the column. If None, the column needs to be fitted.  None
null_values  Sequence              Values to be considered as null.                                   ()

fit

fit(x: Series) -> None

Fit the column structure on the input data.

Parameters:

Name  Type    Description                 Default
x     Series  The column data to fit on.  required
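
As a rough illustration of what fitting a categorical column could involve, the sketch below collects the distinct non-null values of a column in first-seen order. The helper name `infer_categories` is hypothetical and not part of this library; the actual `fit` implementation may differ (for example, in ordering or in how missing values are detected).

```python
from typing import Iterable, Optional, Sequence


def infer_categories(
    values: Iterable[Optional[str]],
    null_values: Sequence = (),
) -> list:
    """Collect distinct non-null values, keeping first-seen order.

    Hypothetical helper: a stand-in for what fitting might do, not the library's code.
    """
    seen = {}
    for value in values:
        # Skip missing entries and anything listed as a null marker.
        if value is None or value in null_values:
            continue
        seen.setdefault(value, None)
    return list(seen)


column = ["red", "green", "N/A", "red", "blue", None]
categories = infer_categories(column, null_values=("N/A",))
print(categories)
```

Passing the null markers explicitly keeps placeholder strings such as "N/A" from being learned as a category.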

IntegerColumnStructure

__init__

__init__(
    min_digits: int | None = None,
    max_digits: int | None = None,
    positive: bool = False,
    null_values: Sequence = (),
) -> None

Structure for a column that contains integer values.

Parameters:

Name         Type        Description                       Default
min_digits   int | None  The minimum number of digits.     None
max_digits   int | None  The maximum number of digits.     None
positive     bool        Whether the values are positive.  False
null_values  Sequence    Values to be considered as null.  ()
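
One way to picture how these parameters constrain a column is as a regular expression over the text form of each value. The sketch below is an assumption made for illustration only: the function `integer_pattern` is hypothetical, it treats `positive=True` as "no minus sign allowed", and it treats a missing bound as "at least one digit" or "no upper limit". The library may interpret these parameters differently.

```python
import re


def integer_pattern(min_digits=None, max_digits=None, positive=False):
    """Build a regex for integers with a bounded digit count (hypothetical helper)."""
    lo = 1 if min_digits is None else min_digits      # assume at least one digit
    hi = "" if max_digits is None else max_digits     # empty upper bound means unbounded
    sign = "" if positive else "-?"                   # assume positive forbids a minus sign
    return re.compile(rf"{sign}\d{{{lo},{hi}}}")


pattern = integer_pattern(min_digits=2, max_digits=4, positive=True)
for text in ["123", "7", "-123", "12345"]:
    print(text, bool(pattern.fullmatch(text)))
```

`fullmatch` is used so that the whole string must satisfy the digit bounds, not just a prefix of it.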

FloatColumnStructure

__init__

__init__(
    min_int_digits: int | None = None,
    max_int_digits: int | None = None,
    min_decimal_digits: int | None = None,
    max_decimal_digits: int | None = None,
    positive: bool = False,
    has_ints: bool = False,
    null_values: Sequence = (),
) -> None

Structure for a column that contains float values.

Parameters:

Name                Type        Description                                        Default
min_int_digits      int | None  The minimum number of digits in the integer part.  None
max_int_digits      int | None  The maximum number of digits in the integer part.  None
min_decimal_digits  int | None  The minimum number of digits in the decimal part.  None
max_decimal_digits  int | None  The maximum number of digits in the decimal part.  None
positive            bool        Whether the values are positive.                   False
has_ints            bool        Whether the values have an integer part.           False
null_values         Sequence    Values to be considered as null.                   ()

fit

fit(x: Series) -> None

Fit the column structure on the input data.

Parameters:

Name  Type    Description                 Default
x     Series  The column data to fit on.  required

WordSequenceColumnStructure

__init__

__init__(
    min_word_len: int = 0,
    max_word_len: int = 100,
    min_n_words: int = 1,
    max_n_words: int = 15,
    allow_digits: bool = False,
    null_values: Sequence = (),
) -> None

Structure for a column that contains a sequence of words.

Parameters:

Name          Type      Description                               Default
min_word_len  int       The minimum length of a word.             0
max_word_len  int       The maximum length of a word.             100
min_n_words   int       The minimum number of words.              1
max_n_words   int       The maximum number of words.              15
allow_digits  bool      Whether digits are allowed in the words.  False
null_values   Sequence  Values to be considered as null.          ()
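
To make the interplay of the word-length and word-count bounds concrete, here is a small sketch that expresses them as a regular expression. It is illustrative only: `word_sequence_pattern` is a hypothetical helper, and it assumes that words are made of ASCII letters (plus digits when `allow_digits` is set), that words are separated by single spaces, and that `min_n_words` is at least 1. The library's own notion of a word may be broader.

```python
import re


def word_sequence_pattern(
    min_word_len=0,
    max_word_len=100,
    min_n_words=1,
    max_n_words=15,
    allow_digits=False,
):
    """Build a regex for space-separated words (hypothetical helper)."""
    chars = "[A-Za-z0-9]" if allow_digits else "[A-Za-z]"
    word = rf"{chars}{{{min_word_len},{max_word_len}}}"
    # One leading word, then between (min - 1) and (max - 1) further words.
    extra = rf"(?: {word}){{{min_n_words - 1},{max_n_words - 1}}}"
    return re.compile(word + extra)


names = word_sequence_pattern(min_word_len=2, max_word_len=5, min_n_words=2, max_n_words=3)
for text in ["hello world", "hello", "a bc", "ab cd ef gh", "ab c3"]:
    print(repr(text), bool(names.fullmatch(text)))
```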

fit

fit(x: Series) -> None

Fit the column structure on the input data.

Parameters:

Name  Type    Description                 Default
x     Series  The column data to fit on.  required

CustomRegexColumnStructure

__init__

__init__(regex: str, null_values: Sequence = ()) -> None

Structure for a column that contains values that match a custom regex.

Parameters:

Name         Type      Description                            Default
regex        str       The regex that the values must match.  required
null_values  Sequence  Values to be considered as null.       ()
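
As a sketch of how a custom regex and `null_values` might work together, the snippet below lists the values that neither match the regex nor count as null. The helper `non_matching` is hypothetical, and the choice of `re.fullmatch` (the whole value must match) is an assumption; the library may anchor the pattern differently.

```python
import re
from typing import Iterable, Sequence


def non_matching(values: Iterable[str], regex: str, null_values: Sequence = ()) -> list:
    """Return values that are not null markers and do not fully match the regex.

    Hypothetical helper for illustration only.
    """
    pattern = re.compile(regex)
    return [
        value
        for value in values
        if value not in null_values and pattern.fullmatch(value) is None
    ]


codes = ["AB-123", "CD-456", "n/a", "EF-78"]
violations = non_matching(codes, r"[A-Z]{2}-\d{3}", null_values=("n/a",))
print(violations)
```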

fit

fit(x: Series) -> None

Fit the column structure on the input data.

Parameters:

Name  Type    Description                 Default
x     Series  The column data to fit on.  required