Relational data

RelationalData

__init__

__init__(data: Data, schema: Schema) -> None

Relational data structure.

Parameters:

data (Data, required): A dictionary with table names as keys and pandas.DataFrame objects as values.

schema (Schema, required): A Schema object.

split

split(
    ratio: float | int | dict[str, float | int],
    reset_index: bool = False,
    rng: NpRng = None,
) -> tuple[RelationalData, RelationalData]

Split the input data according to the given ratios for each root table.

Parameters:

ratio (float | int | dict[str, float | int], required): Split ratio. If a float, it must be between 0 and 1. If an integer > 1, it is interpreted as a number of rows. If a dictionary, it must contain a split ratio for each root table, except lookup tables (LUTs).

reset_index (bool, default: False): Whether to reset the index of the resulting dataframes.

rng (NpRng, default: None): Random state. If an int, it is used as the seed; if None, the seed is chosen randomly.

Returns:

tuple[RelationalData, RelationalData]: A tuple with the two splits.
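
The rules for ratio described above can be illustrated with a minimal standard-library sketch. It is not the library's implementation: split_positions is a hypothetical helper, the table names are made up, and several points are assumptions (that an integer count refers to the first split, that the bounds 0 and 1 are inclusive, and that random.Random stands in for the NumPy random state). It also handles a single flat table only and ignores how rows of child tables follow their root rows.

```python
import random


def split_positions(n_rows, ratio, seed=None):
    """Hypothetical helper: split row positions 0..n_rows-1 into two groups."""
    if isinstance(ratio, float):
        # A float is read as the fraction of rows in the first split (assumption).
        if not 0.0 <= ratio <= 1.0:
            raise ValueError("a float ratio must be between 0 and 1")
        n_first = round(n_rows * ratio)
    elif isinstance(ratio, int) and ratio > 1:
        # An integer > 1 is read as a number of rows for the first split (assumption).
        n_first = min(ratio, n_rows)
    else:
        raise ValueError("ratio must be a float in [0, 1] or an integer > 1")

    positions = list(range(n_rows))
    # The same seed always produces the same shuffle; None picks a random seed.
    random.Random(seed).shuffle(positions)
    return sorted(positions[:n_first]), sorted(positions[n_first:])


# A single ratio applied to one table of 10 rows.
first, second = split_positions(10, 0.8, seed=0)

# A dictionary gives one ratio per root table (table names are hypothetical).
per_table = {"customers": 0.8, "orders": 3}
sizes = {
    name: len(split_positions(10, table_ratio, seed=0)[0])
    for name, table_ratio in per_table.items()
}
```

Passing the same seed twice yields the same two groups, which is the usual reason to set rng to an integer when a split has to be reproducible.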

to_csv

to_csv(output_dir: Path | str, **kwargs: Any) -> None

Save the RelationalData tables to a directory as CSV files.

Parameters:

output_dir (Path | str, required): Directory in which to save the data.

**kwargs (Any, default: {}): Keyword arguments passed to pandas.DataFrame.to_csv.
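
The pattern of writing one file per table while forwarding extra keyword arguments to the writer can be sketched as follows. This is a dependency-free stand-in, not the library's code: save_tables is a hypothetical function, the standard csv module replaces pandas, the <table name>.csv file naming is an assumption, and the table contents are invented for illustration.

```python
import csv
import tempfile
from pathlib import Path


def save_tables(tables, output_dir, **kwargs):
    """Hypothetical stand-in: write each table to <output_dir>/<name>.csv."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, rows in tables.items():
        with open(out / f"{name}.csv", "w", newline="") as handle:
            # Extra keyword arguments are forwarded unchanged to the writer.
            csv.writer(handle, **kwargs).writerows(rows)


# Illustrative tables: the first row of each table is its header.
tables = {
    "customers": [["id", "name"], [1, "Ada"], [2, "Bob"]],
    "orders": [["id", "customer_id"], [10, 1]],
}

output_dir = Path(tempfile.mkdtemp()) / "export"
save_tables(tables, output_dir, delimiter=";")

written = sorted(path.name for path in output_dir.iterdir())
first_line = (output_dir / "customers.csv").read_text().splitlines()[0]
```

With the real method, the forwarded arguments would be those accepted by pandas.DataFrame.to_csv, such as sep or index, rather than the csv.writer options used here.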