Model
LlmColumnCfg
dataclass
Column configuration (for both context and generated columns).
Parameters:

Name | Type | Description | Default |
---|---|---|---|
type | Column | The type of the column. | required |
description | str | The description of the column. | required |
structure | ColumnStructure \| None | The structure of the column. If None, a default column structure is assigned based on the column type defined in the type field. | None |
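For illustration, a minimal sketch of a single column configuration. The import path and the Column member are assumptions and may differ in the installed SDK version:

```python
# Illustrative only: the module path and the Column member are assumptions.
from aindo.rdml.synth import Column, LlmColumnCfg

# A generated free-text column; since structure is None, the default
# structure for the given column type is used.
review_col = LlmColumnCfg(
    type=Column.TEXT,  # hypothetical Column member
    description="A short customer review of the purchased product.",
)
```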
LlmTableCfg
dataclass
Table configuration.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the table. | required |
description | str | The description of the dataset content. | required |
columns | dict[str, LlmColumnCfg] | A dictionary with the configuration for the columns of the table. | required |
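A hedged sketch of a full table configuration composed of column configurations. Import path and Column members are assumptions:

```python
# Illustrative only: module path and Column members are assumptions.
from aindo.rdml.synth import Column, LlmColumnCfg, LlmTableCfg

cfg = LlmTableCfg(
    name="reviews",
    description="Customer reviews collected by an online electronics store.",
    columns={
        "product": LlmColumnCfg(type=Column.TEXT, description="Name of the purchased product."),
        "rating": LlmColumnCfg(type=Column.TEXT, description="Star rating from 1 to 5."),
        "review": LlmColumnCfg(type=Column.TEXT, description="Free-text review left by the customer."),
    },
)
```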
GenerationMode
Generation modes for the LLM model.
Supported modes are:
- STRUCTURED: the model will be forced to generate data that follows the desired structure. This option may affect the quality of the output data if the structure is too restrictive or incoherent.
- REJECTION: the model will generate data without enforcing the structure and will reject samples that do not follow the desired structure.
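Wherever a generation mode is expected, either the enum member or its string form can be passed; a brief sketch (the import path and the exact string spelling are assumptions):

```python
from aindo.rdml.synth import GenerationMode  # illustrative import path

mode = GenerationMode.REJECTION
# A plain string may be passed instead (spelling assumed): generation_mode="rejection"
```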
LlmTabularModel
generate
generate(
cfg: LlmTableCfg,
n_samples: int,
batch_size: int = 1,
max_tokens: int = 60,
generation_mode: GenerationMode | str = REJECTION,
retry_on_fail: int = 10,
temp: float = 1.0,
) -> RelationalData
Generate synthetic tabular data from scratch.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
cfg | LlmTableCfg | The configuration of the table to generate. | required |
n_samples | int | Desired number of samples to generate. | required |
batch_size | int | Batch size used during generation. If 0, all data is generated in a single batch. | 1 |
max_tokens | int | Maximum number of tokens to generate for each sample. | 60 |
generation_mode | GenerationMode \| str | Generation mode for the LLM model. Can be a GenerationMode member or the corresponding string. | REJECTION |
retry_on_fail | int | Number of retry attempts in case of rejected generated samples. | 10 |
temp | float | Temperature parameter for sampling. | 1.0 |
Returns:

Type | Description |
---|---|
RelationalData | A RelationalData object containing the generated synthetic data. |
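An illustrative end-to-end sketch of generate(). Import paths, Column members, and the table-name indexing of the result are assumptions, not the definitive API surface:

```python
# Illustrative usage sketch; module paths and Column members are assumptions.
from aindo.rdml.synth import Column, GenerationMode, LlmColumnCfg, LlmTableCfg, LlmTabularModel

model: LlmTabularModel = ...  # obtained via LlmTabularModel.load (see below)

cfg = LlmTableCfg(
    name="patients",
    description="Synthetic patient intake records.",
    columns={
        "age": LlmColumnCfg(type=Column.TEXT, description="Age of the patient in years."),
        "symptoms": LlmColumnCfg(type=Column.TEXT, description="Short description of the reported symptoms."),
    },
)

data = model.generate(
    cfg=cfg,
    n_samples=100,
    batch_size=8,
    max_tokens=80,
    generation_mode=GenerationMode.REJECTION,
    retry_on_fail=5,
    temp=1.0,
)
patients = data["patients"]  # assumed dict-like access by table name
```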
add_columns
add_columns(
data: RelationalData,
context_cfg: LlmTableCfg,
new_columns: dict[str, LlmColumnCfg],
batch_size: int = 1,
max_tokens: int = 60,
generation_mode: GenerationMode | str = REJECTION,
retry_on_fail: int = 10,
temp: float = 1.0,
) -> RelationalData
Add new columns to an existing table.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
data | RelationalData | The RelationalData containing the context table. | required |
context_cfg | LlmTableCfg | The configuration for the context table. | required |
new_columns | dict[str, LlmColumnCfg] | A dictionary with the configurations for the new columns to add. | required |
batch_size | int | Batch size used during generation. If 0, all data is generated in a single batch. | 1 |
max_tokens | int | Maximum number of tokens to generate for each sample. | 60 |
generation_mode | GenerationMode \| str | Generation mode for the LLM model. Can be a GenerationMode member or the corresponding string. | REJECTION |
retry_on_fail | int | Number of retries in case of rejected generated samples. | 10 |
temp | float | Temperature parameter for sampling. | 1.0 |
Returns:

Type | Description |
---|---|
RelationalData | A RelationalData object containing the original data extended with the new columns. |
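Continuing the generate() sketch above (model, cfg, and data defined there), an illustrative call that augments the generated table with a new column:

```python
# Illustrative only: Column members and import path are assumptions.
from aindo.rdml.synth import Column, GenerationMode, LlmColumnCfg

new_columns = {
    "diagnosis": LlmColumnCfg(
        type=Column.TEXT,  # hypothetical Column member
        description="A plausible diagnosis consistent with the reported symptoms.",
    ),
}

augmented = model.add_columns(
    data=data,          # RelationalData holding the context table
    context_cfg=cfg,    # LlmTableCfg describing the existing columns
    new_columns=new_columns,
    batch_size=8,
    generation_mode=GenerationMode.REJECTION,
)
```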
save
save(path: Path) -> None
Save the LlmTabularModel to a checkpoint at the given path.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
path | Path | The path where to save the checkpoint. | required |
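For example (path is illustrative, model as in the sketches above):

```python
from pathlib import Path

# Persist the model so it can be restored later with LlmTabularModel.load.
model.save(Path("checkpoints/llm_tabular"))
```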
load
classmethod
load(
ckpt_path: Path | str,
model_path: Path | str | None = None,
auth_token: str | None = None,
) -> LlmTabularModel
Load an LlmTabularModel from the checkpoint at the given path.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
ckpt_path | Path \| str | The path to the adapter checkpoint (provided by Aindo). | required |
model_path | Path \| str \| None | The path to the base LLM model. If None, it will be downloaded from Hugging Face. | None |
auth_token | str \| None | The authorization token for downloading the base model from Hugging Face. Not necessary if the corresponding environment variable is set. | None |
Returns:

Type | Description |
---|---|
LlmTabularModel | The loaded LlmTabularModel. |
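An illustrative loading sketch; paths are examples and the import path is an assumption:

```python
from pathlib import Path

from aindo.rdml.synth import LlmTabularModel  # illustrative import path

# The base LLM is downloaded from Hugging Face when model_path is None;
# pass auth_token (or set the relevant environment variable) if the base model is gated.
model = LlmTabularModel.load(
    ckpt_path=Path("checkpoints/adapter"),  # adapter checkpoint provided by Aindo
    model_path=None,
    auth_token=None,
)
```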