Relational data

RelationalData

RelationalData(data: Data, schema: Schema)

Relational data structure.

Parameters:

Name Type Description Default
data Data

A dictionary with table names as keys and pandas.DataFrame objects as values.

required
schema Schema

A Schema object.

required

select

select(
    idx: dict[str, Sequence[Sequence[int]]],
    reset_index: bool = False,
) -> list[RelationalData]

Select subsets of the input data according to the given indices for each root table.

Parameters:

Name Type Description Default
idx dict[str, Sequence[Sequence[int]]]

A dictionary mapping each root table (except lookup tables, LUTs) to its row indices, with one sequence of indices per selection.

required
reset_index bool

Whether to reset the index of the resulting dataframes.

False

Returns:

Type Description
list[RelationalData]

A list containing the selections.
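
As a rough illustration of the selection semantics, the sketch below picks rows of a root table by position, once per selection, and keeps the child rows whose foreign key points at a kept parent. It is not the library's implementation: tables are modelled as lists of dicts rather than pandas.DataFrame objects, the names `select_rows`, `customers`, `orders`, and `parent_id` are hypothetical, and the idea that child rows follow their parents is an assumption.

```python
def select_rows(root, child, selections, pk="id", fk="parent_id"):
    """Return one (root_subset, child_subset) pair per selection.

    `selections` plays the role of the Sequence[Sequence[int]] given for a
    single root table: each inner sequence holds the row positions to keep.
    """
    results = []
    for positions in selections:
        kept_root = [root[i] for i in positions]
        kept_ids = {row[pk] for row in kept_root}
        # Assumption: child rows follow the parent rows they reference.
        kept_child = [row for row in child if row[fk] in kept_ids]
        results.append((kept_root, kept_child))
    return results


# Hypothetical root table and a child table referring to it.
customers = [{"id": 10}, {"id": 11}, {"id": 12}]
orders = [
    {"id": 0, "parent_id": 10},
    {"id": 1, "parent_id": 12},
    {"id": 2, "parent_id": 12},
]

# Two selections over the root table: rows [0, 1] and row [2].
first, second = select_rows(customers, orders, [[0, 1], [2]])
```

Each inner sequence yields one result, which mirrors the one-RelationalData-per-selection shape of the returned list.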

split

split(
    ratio: float | int | dict[str, float | int],
    reset_index: bool = False,
    rng: NpRng = None,
) -> tuple[RelationalData, RelationalData]

Split the input data according to the given ratios for each root table.

Parameters:

Name Type Description Default
ratio float | int | dict[str, float | int]

Split ratio. If a float, it must be between 0 and 1 and is interpreted as a fraction of the rows. If an integer greater than 1, it is interpreted as the number of rows. If a dictionary, it must contain a split ratio for each root table (except LUTs).

required
reset_index bool

Whether to reset the index of the resulting dataframes.

False
rng NpRng

Random state. If an int, it is used as the seed. If None, the seed is chosen randomly.

None

Returns:

Type Description
tuple[RelationalData, RelationalData]

A tuple with the two splits.
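
The two documented forms of ratio can be sketched as follows. This is a minimal illustration, not the library's code: `resolve_split_size` and `split_rows` are hypothetical helpers, a plain list stands in for a root table, and the choices to exclude the endpoints 0 and 1, to truncate fractional sizes, and to give the first split the requested size are all assumptions.

```python
import random


def resolve_split_size(ratio, n_rows):
    """Translate a split ratio into a number of rows."""
    if isinstance(ratio, float):
        # Assumption: the bounds 0 and 1 are excluded.
        if not 0 < ratio < 1:
            raise ValueError("a float ratio must be between 0 and 1")
        # Assumption: fractional sizes are truncated.
        return int(ratio * n_rows)
    if isinstance(ratio, int) and ratio > 1:
        # An integer greater than 1 is read as a row count.
        return min(ratio, n_rows)
    raise ValueError("unsupported ratio in this sketch")


def split_rows(rows, ratio, seed=None):
    """Shuffle row positions, then cut them into two parts."""
    rng = random.Random(seed)  # seed=None picks a random seed
    order = list(range(len(rows)))
    rng.shuffle(order)
    k = resolve_split_size(ratio, len(rows))
    # Assumption: the first split receives the requested size.
    return [rows[i] for i in order[:k]], [rows[i] for i in order[k:]]


rows = list(range(8))
by_fraction = split_rows(rows, 0.75, seed=0)
by_count = split_rows(rows, 3, seed=0)
```

Passing the same seed twice gives the same partition, which is the practical reason to fix rng when a split has to be reproducible.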

reset_keys

reset_keys(
    start: dict[str, int] | None = None,
    tables: Collection[str] | None = None,
) -> RelationalData

Reset the keys of the RelationalData object. Each resulting pd.DataFrame will have a unique, incremental integer index as its primary key. The primary keys of lookup tables (LookupTable), and the foreign keys referring to them, are left unchanged.

Parameters:

Name Type Description Default
start dict[str, int] | None

A dictionary with the starting values for the primary key of each table. If a table is not present, the default is 0.

None
tables Collection[str] | None

The tables for which to reset the primary keys. Foreign keys in other tables that refer to these primary keys are updated accordingly. By default, all keys are reset.

None

Returns:

Type Description
RelationalData

The RelationalData with the reset keys.
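
The effect of resetting keys on a parent table and on the foreign keys that refer to it can be sketched as below. The helper `reset_parent_keys`, the column names `id` and `parent_id`, and the list-of-dicts tables are hypothetical stand-ins, so this shows the idea rather than the library's behavior in detail.

```python
def reset_parent_keys(parent, child, start=0, pk="id", fk="parent_id"):
    """Replace primary keys with incremental integers and remap foreign keys."""
    # Map each old primary key to a new integer, counting up from `start`.
    mapping = {row[pk]: start + i for i, row in enumerate(parent)}
    new_parent = [{**row, pk: mapping[row[pk]]} for row in parent]
    # Foreign keys in the referring table are rewritten with the same mapping,
    # so every child row still points at the same parent row.
    new_child = [{**row, fk: mapping[row[fk]]} for row in child]
    return new_parent, new_child


# Hypothetical tables with non-integer keys.
parent = [{"id": "a7"}, {"id": "c2"}]
child = [{"parent_id": "c2"}, {"parent_id": "a7"}, {"parent_id": "c2"}]

new_parent, new_child = reset_parent_keys(parent, child, start=100)
```

Here `start=100` plays the role of one entry of the start dictionary: the new primary keys for that table begin at 100 instead of the default 0.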

concat classmethod

concat(
    *data: RelationalData,
    reset_keys: bool | Collection[str] = False,
) -> RelationalData

Concatenate several instances of RelationalData with the same Schema.

Parameters:

Name Type Description Default
*data RelationalData

The RelationalData objects to concatenate.

()
reset_keys bool | Collection[str]

Whether to reset the keys of the resulting pd.DataFrame objects. This may be necessary if the datasets to concatenate share some keys. It can be a boolean or a collection of tables for which to reset the keys.

False

Returns:

Type Description
RelationalData

The concatenated RelationalData.
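
To see why resetting keys can be necessary, consider two datasets whose primary keys overlap. The sketch below uses a single table modelled as a list of dicts. `concat_tables` and `has_duplicate_keys` are hypothetical helpers, not part of the documented API, and foreign-key remapping is left out for brevity.

```python
def concat_tables(*tables, reset_keys=False, pk="id"):
    """Concatenate row lists, optionally assigning fresh incremental keys."""
    rows = [dict(row) for table in tables for row in table]
    if reset_keys:
        for new_key, row in enumerate(rows):
            row[pk] = new_key
    return rows


def has_duplicate_keys(rows, pk="id"):
    """Report whether any primary key value appears more than once."""
    keys = [row[pk] for row in rows]
    return len(keys) != len(set(keys))


# Both datasets number their rows from 0, so their keys collide.
first = [{"id": 0, "name": "a"}, {"id": 1, "name": "b"}]
second = [{"id": 0, "name": "c"}, {"id": 1, "name": "d"}]

naive = concat_tables(first, second)
fixed = concat_tables(first, second, reset_keys=True)
```

Without the reset, the concatenated table holds two rows for each key and no longer has a valid primary key.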

to_csv

to_csv(output_dir: Path | str, **kwargs: Any) -> None

Save the RelationalData tables into a directory as CSV files.

Parameters:

Name Type Description Default
output_dir Path | str

Directory in which to save the data.

required
**kwargs Any

Keyword arguments to be passed to pd.DataFrame.to_csv.

{}
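
A dependency-free sketch of saving one file per table is shown below. It uses the standard csv module in place of pandas, and forwards extra keyword arguments to the writer in the same spirit as **kwargs. The helper `tables_to_csv` and the `<table name>.csv` file naming are assumptions made for illustration. The actual file names produced by to_csv are not stated here.

```python
import csv
import tempfile
from pathlib import Path


def tables_to_csv(tables, output_dir, **kwargs):
    """Write each non-empty table (a list of dicts) to its own CSV file."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, rows in tables.items():
        # Assumption: one file per table, named after the table.
        with open(out / f"{name}.csv", "w", newline="") as fh:
            # Extra keyword arguments (e.g. delimiter) go to the CSV writer.
            writer = csv.DictWriter(fh, fieldnames=list(rows[0]), **kwargs)
            writer.writeheader()
            writer.writerows(rows)


# Hypothetical tables.
tables = {
    "customers": [{"id": 0}, {"id": 1}],
    "orders": [{"id": 0, "parent_id": 1}],
}

with tempfile.TemporaryDirectory() as tmp:
    tables_to_csv(tables, tmp, delimiter=";")
    written = sorted(path.name for path in Path(tmp).iterdir())
    header = (Path(tmp) / "orders.csv").read_text().splitlines()[0]
```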