
Datasets

TabularDataset

from_data classmethod

from_data(
    data: RelationalData,
    preproc: TabularPreproc,
    on_disk: bool = False,
    path: Path | str | None = None,
    max_block_size: int = 0,
) -> TabularDataset

Build a TabularDataset from the input data.

Parameters:

Name Type Description Default
data RelationalData

A RelationalData object with the data to be processed.

required
preproc TabularPreproc

A TabularPreproc to preprocess the data.

required
on_disk bool

Whether to save the processed data on disk. If True, the data is loaded one batch at a time during training, which may slightly slow down training but reduces memory consumption.

False
path Path | str | None

The path to the directory in which to save the processed data. If None, the data is saved in a temporary directory.

None
max_block_size int

Maximum sequence length used during training. It must be larger than the sequence length of every individual table, so it is only available for multi-table datasets. If 0, no maximum sequence length is imposed.

0

Returns:

Type Description
TabularDataset

A TabularDataset object.
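The `on_disk` trade-off described above can be illustrated with a minimal, stdlib-only sketch. It is not the library's implementation: the binary file layout, the fixed-width integer encoding and the helpers `save_rows` and `iter_batches` are hypothetical. The processed rows are written to disk once, and training then reads back a single batch at a time, so peak memory depends on the batch size rather than on the size of the whole dataset, at the cost of extra disk reads.

```python
import tempfile
from array import array
from pathlib import Path
from typing import Iterator, List

# Hypothetical on-disk format: a flat file of signed 64-bit integers.
ITEM_SIZE = array("q").itemsize


def save_rows(rows: List[int], path: Path) -> None:
    """Hypothetical helper: write already-processed rows to a binary file once."""
    with path.open("wb") as f:
        array("q", rows).tofile(f)


def iter_batches(path: Path, batch_size: int) -> Iterator[List[int]]:
    """Hypothetical helper: read the file back one batch at a time."""
    with path.open("rb") as f:
        while True:
            # Only batch_size items are held in memory at any moment.
            chunk = f.read(batch_size * ITEM_SIZE)
            if not chunk:
                break
            batch = array("q")
            batch.frombytes(chunk)
            yield batch.tolist()


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "processed.bin"
    save_rows(list(range(10)), data_file)
    batches = list(iter_batches(data_file, batch_size=4))

print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Keeping everything in memory (`on_disk=False`) avoids the repeated reads in `iter_batches`, which is why the in-memory mode can be slightly faster.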

from_disk classmethod

from_disk(
    preproc: TabularPreproc,
    path: Path | str,
    max_block_size: int = 0,
) -> TabularDataset

Load a TabularDataset from disk.

Parameters:

Name Type Description Default
preproc TabularPreproc

The TabularPreproc object used to preprocess the data.

required
path Path | str

The path to the directory where the processed data is stored on disk.

required
max_block_size int

Maximum sequence length used during training. It must be larger than the sequence length of every individual table, so it is only available for multi-table datasets. If 0, no maximum sequence length is imposed.

0

Returns:

Type Description
TabularDataset

A TabularDataset object.
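The `max_block_size` rule can be read as a validation check. The sketch below encodes one interpretation of it, assuming a hypothetical `validate_max_block_size` helper and a `seq_lens` mapping from each table's name to that table's sequence length; how the library actually measures a table's sequence length is not specified in this reference.

```python
from typing import Dict, Optional


def validate_max_block_size(seq_lens: Dict[str, int], max_block_size: int) -> Optional[int]:
    """Hypothetical check: return the effective limit, or None if no limit applies."""
    if max_block_size == 0:
        # 0 means that no maximum sequence length is imposed.
        return None
    if len(seq_lens) < 2:
        # The limit is only available for multi-table datasets.
        raise ValueError("max_block_size is only available for multi-table datasets")
    # Every table must be strictly shorter than the limit ("larger than").
    too_long = {name: n for name, n in seq_lens.items() if n >= max_block_size}
    if too_long:
        raise ValueError(
            f"max_block_size={max_block_size} must be larger than the "
            f"sequence length of each table; offending tables: {too_long}"
        )
    return max_block_size


lens = {"customers": 1, "orders": 12}  # hypothetical per-table sequence lengths
print(validate_max_block_size(lens, 0))   # None
print(validate_max_block_size(lens, 64))  # 64
```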

TextDataset

from_data classmethod

from_data(
    data: RelationalData,
    preproc: TextPreproc,
    on_disk: bool = False,
    path: Path | str | None = None,
) -> TextDataset

Build a TextDataset from the input data.

Parameters:

Name Type Description Default
data RelationalData

A RelationalData object with the data to be processed.

required
preproc TextPreproc

A TextPreproc to preprocess the data.

required
on_disk bool

Whether to save the processed data on disk. If True, the data is loaded one batch at a time during training, which may slightly slow down training but reduces memory consumption.

False
path Path | str | None

The path to the directory in which to save the processed data. If None, the data is saved in a temporary directory.

None

Returns:

Type Description
TextDataset

A TextDataset object.
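The documented default for `path` can be sketched with a hypothetical `resolve_output_dir` helper: when `path` is None a fresh temporary directory is created, otherwise the given `str` or `Path` is normalised and created if needed. This is an illustration only; whether the library removes its temporary directory afterwards is not stated here, so the sketch leaves cleanup to the caller.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union


def resolve_output_dir(path: Optional[Union[Path, str]] = None) -> Path:
    """Hypothetical helper mirroring the documented `path` behaviour."""
    if path is None:
        # No path given: save into a newly created temporary directory.
        return Path(tempfile.mkdtemp())
    out = Path(path)  # accept both str and Path
    out.mkdir(parents=True, exist_ok=True)
    return out


tmp_dir = resolve_output_dir()
named_dir = resolve_output_dir(tmp_dir / "text_dataset")
print(tmp_dir.is_dir(), named_dir.is_dir())  # True True
```

Passing an explicit `path` is what makes it possible to find the processed data again later, for example with `from_disk`.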

from_disk classmethod

from_disk(
    preproc: TextPreproc,
    path: Path | str,
) -> TextDataset

Load a TextDataset from disk.

Parameters:

Name Type Description Default
preproc TextPreproc

The TextPreproc object used to preprocess the data.

required
path Path | str

The path to the directory where the processed data is stored on disk.

required

Returns:

Type Description
TextDataset

A TextDataset object.
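`from_data` (with `on_disk=True` and an explicit `path`) and `from_disk` form a save/load pair, and `from_disk` expects the preprocessor that produced the stored data. The toy sketch below shows why, under the assumption that the data is stored in preprocessed (encoded) form. `ToyPreproc` and `ToyTextDataset` are hypothetical stand-ins, not the library's classes, and a tiny word vocabulary replaces a real tokenizer; reloading the same files with a different vocabulary would decode to the wrong text.

```python
import json
import tempfile
from pathlib import Path
from typing import List


class ToyPreproc:
    """Hypothetical preprocessor: maps words to integer ids and back."""

    def __init__(self, vocab: List[str]) -> None:
        self.vocab = vocab
        self.index = {word: i for i, word in enumerate(vocab)}

    def encode(self, text: str) -> List[int]:
        return [self.index[word] for word in text.split()]

    def decode(self, ids: List[int]) -> str:
        return " ".join(self.vocab[i] for i in ids)


class ToyTextDataset:
    """Hypothetical dataset mirroring the from_data / from_disk pattern."""

    def __init__(self, preproc: ToyPreproc, encoded: List[List[int]]) -> None:
        self.preproc = preproc
        self.encoded = encoded

    @classmethod
    def from_data(cls, texts: List[str], preproc: ToyPreproc, path: Path) -> "ToyTextDataset":
        # Encode once and persist only the encoded ids.
        encoded = [preproc.encode(t) for t in texts]
        path.mkdir(parents=True, exist_ok=True)
        (path / "encoded.json").write_text(json.dumps(encoded))
        return cls(preproc, encoded)

    @classmethod
    def from_disk(cls, preproc: ToyPreproc, path: Path) -> "ToyTextDataset":
        # The ids are meaningless without the preprocessor that created them.
        encoded = json.loads((path / "encoded.json").read_text())
        return cls(preproc, encoded)

    def texts(self) -> List[str]:
        return [self.preproc.decode(ids) for ids in self.encoded]


preproc = ToyPreproc(["a", "red", "car", "blue"])
with tempfile.TemporaryDirectory() as tmp:
    ToyTextDataset.from_data(["a red car", "a blue car"], preproc, Path(tmp))
    reloaded = ToyTextDataset.from_disk(preproc, Path(tmp))

print(reloaded.texts())  # ['a red car', 'a blue car']
```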