Datasets
TabularDataset
from_data
classmethod
from_data(
data: RelationalData,
preproc: TabularPreproc,
on_disk: bool = False,
path: Path | str | None = None,
max_block_size: int = 0,
) -> TabularDataset
Build a TabularDataset from the input data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | RelationalData | A | required |
preproc | TabularPreproc | A | required |
on_disk | bool | Whether to save the processed data on disk. If True, the data is loaded one batch at a time during training. This may slow training slightly but reduces memory consumption. | False |
path | Path \| str \| None | Path to the directory in which to save the processed data. If None, a temporary directory is used. | None |
max_block_size | int | Maximum sequence length used during training. It must be larger than the sequence length of each table, so it is available only for multi-table datasets. If 0, no maximum sequence length is imposed. | 0 |
Returns:
Type | Description |
---|---|
TabularDataset | A |
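
The interaction between on_disk and path described above can be sketched as follows. This is only an illustration, not the library's implementation: `resolve_storage_dir` is a hypothetical helper, and it is an assumption that path is ignored when on_disk is False.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def resolve_storage_dir(on_disk: bool, path: Path | str | None = None) -> Path | None:
    """Hypothetical helper: choose where the processed data would be written."""
    if not on_disk:
        # Assumption: nothing is written and the data stays in memory.
        return None
    if path is None:
        # No path given: fall back to a fresh temporary directory.
        return Path(tempfile.mkdtemp(prefix="dataset_"))
    target = Path(path)
    target.mkdir(parents=True, exist_ok=True)
    return target
```

Passing an explicit path is the option to prefer when the processed data should be reused later, since a temporary directory is harder to locate afterwards.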
from_disk
classmethod
from_disk(
preproc: TabularPreproc,
path: Path | str,
max_block_size: int = 0,
) -> TabularDataset
Load a TabularDataset from disk.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
preproc | TabularPreproc | The | required |
path | Path \| str | Path to the directory where the processed data is stored on disk. | required |
max_block_size | int | Maximum sequence length used during training. It must be larger than the sequence length of each table, so it is available only for multi-table datasets. If 0, no maximum sequence length is imposed. | 0 |
Returns:
Type | Description |
---|---|
TabularDataset | A |
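
The constraint on max_block_size can be stated as a small check. The sketch below is hypothetical: `check_max_block_size` and the per-table sequence lengths are illustrative names and values, and it assumes "larger than" means strictly greater.

```python
def check_max_block_size(max_block_size: int, table_seq_lens: dict[str, int]) -> bool:
    """Hypothetical check: True if `max_block_size` satisfies the documented rule."""
    if max_block_size == 0:
        # 0 means that no maximum sequence length is imposed.
        return True
    # Otherwise the limit must exceed the sequence length of every table.
    return all(max_block_size > length for length in table_seq_lens.values())


# Illustrative sequence lengths for a two-table dataset (made-up values).
seq_lens = {"parent": 12, "child": 30}
accepted = check_max_block_size(64, seq_lens)
rejected = check_max_block_size(30, seq_lens)
```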
TextDataset
from_data
classmethod
from_data(
data: RelationalData,
preproc: TextPreproc,
on_disk: bool = False,
path: Path | str | None = None,
) -> TextDataset
Build a TextDataset from the input data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | RelationalData | A | required |
preproc | TextPreproc | A | required |
on_disk | bool | Whether to save the processed data on disk. If True, the data is loaded one batch at a time during training. This may slow training slightly but reduces memory consumption. | False |
path | Path \| str \| None | Path to the directory in which to save the processed data. If None, a temporary directory is used. | None |
Returns:
Type | Description |
---|---|
TextDataset | A |
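
To illustrate the trade-off behind on_disk=True, the sketch below writes data in batches and reads it back one batch at a time, so only a single batch is held in memory. The one-JSON-file-per-batch layout and the names `write_batches` and `iter_batches` are assumptions made for the example; the library's actual on-disk format is not described here.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Iterator


def write_batches(rows: list[dict], batch_size: int, directory: Path) -> int:
    """Write `rows` as one JSON file per batch and return the number of batches."""
    directory.mkdir(parents=True, exist_ok=True)
    n_batches = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start : start + batch_size]
        # Zero-padded names keep the files in order when sorted.
        (directory / f"batch_{n_batches:05d}.json").write_text(json.dumps(batch))
        n_batches += 1
    return n_batches


def iter_batches(directory: Path) -> Iterator[list[dict]]:
    """Yield the stored batches lazily, reading one file per iteration."""
    for file in sorted(directory.glob("batch_*.json")):
        yield json.loads(file.read_text())
```

Each iteration pays for a file read, which is the kind of overhead that can slow training slightly in exchange for a smaller memory footprint.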
from_disk
classmethod
from_disk(
preproc: TextPreproc,
path: Path | str,
) -> TextDataset
Load a TextDataset from disk.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
preproc | TextPreproc | The | required |
path | Path \| str | Path to the directory where the processed data is stored on disk. | required |
Returns:
Type | Description |
---|---|
TextDataset | A |
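
The from_data and from_disk signatures suggest a save-then-reload workflow: build the dataset once with on_disk=True and an explicit path, then load it from the same directory later. The sketch below mimics that pattern with a toy class. `ToyDataset`, its `rows.json` file, and the use of a plain callable as the preprocessor are all hypothetical stand-ins, not the library's classes or storage format.

```python
from __future__ import annotations

import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class ToyDataset:
    """Hypothetical stand-in that mirrors the two classmethod signatures."""

    rows: list[str]
    path: Path | None = None

    @classmethod
    def from_data(
        cls,
        data: list[str],
        preproc: Callable[[str], str],
        on_disk: bool = False,
        path: Path | str | None = None,
    ) -> ToyDataset:
        rows = [preproc(item) for item in data]
        if not on_disk:
            return cls(rows)
        # Assumption: fall back to a temporary directory when no path is given.
        directory = Path(path) if path is not None else Path(tempfile.mkdtemp())
        directory.mkdir(parents=True, exist_ok=True)
        (directory / "rows.json").write_text(json.dumps(rows))
        return cls(rows, directory)

    @classmethod
    def from_disk(cls, preproc: Callable[[str], str], path: Path | str) -> ToyDataset:
        # `preproc` is accepted only for signature parity; the stored rows
        # are already processed, so it is not applied again in this toy.
        directory = Path(path)
        rows = json.loads((directory / "rows.json").read_text())
        return cls(rows, directory)


# Usage: process once, then reload from the same directory.
storage = Path(tempfile.mkdtemp()) / "processed"
saved = ToyDataset.from_data(["Hello", "World"], str.lower, on_disk=True, path=storage)
loaded = ToyDataset.from_disk(str.lower, storage)
```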