Model

LlmColumnCfg dataclass

Column configuration, for either a context column or a generated column.

Parameters:

  • type (Column, required): The type of the column.
  • description (str, required): The description of the column.
  • structure (ColumnStructure | None, default: None): The structure of the column. If None, a default column structure will be assigned based on the column type defined in the Schema.

LlmTableCfg dataclass

Table configuration.

Parameters:

  • name (str, required): The name of the table.
  • description (str, required): The description of the dataset content.
  • columns (dict[str, LlmColumnCfg], required): A dictionary with the configuration for the columns of the table.
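
The way the two configuration objects nest can be sketched as follows. The dataclasses below are local stand-ins that mirror only the fields documented above, so that the snippet runs on its own; in real use the library's own LlmTableCfg and LlmColumnCfg would be imported instead. The table name, the descriptions, and the plain-string column types are illustrative assumptions, since the Column and ColumnStructure types are not described in this section.

```python
from dataclasses import dataclass
from typing import Any, Optional


# Stand-ins mirroring the documented fields; these are NOT the library classes.
@dataclass
class LlmColumnCfg:
    type: Any  # the library expects a Column; a plain string is used here
    description: str
    structure: Optional[Any] = None  # None means "use the default structure"


@dataclass
class LlmTableCfg:
    name: str
    description: str
    columns: dict[str, LlmColumnCfg]


# Hypothetical table with two columns, keyed by column name.
cfg = LlmTableCfg(
    name="patients",
    description="Hospital admissions of adult patients.",
    columns={
        "age": LlmColumnCfg(type="integer", description="Age in years."),
        "ward": LlmColumnCfg(type="categorical", description="Admitting ward."),
    },
)

for col_name, col in cfg.columns.items():
    print(f"{col_name}: {col.type} - {col.description}")
```

The column descriptions are free text, so they are a natural place to state units or allowed values.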

GenerationMode

Generation modes for the LLM model.

Supported modes are:

  • STRUCTURED: the model will be forced to generate data that follows the desired structure. This option may affect the quality of the output data if the structure is too restrictive or incoherent.
  • REJECTION: the model will generate data without enforcing the structure and will reject samples that do not follow the desired structure.
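
The REJECTION mode described above can be pictured as a generate, validate, retry loop. The sketch below is a conceptual illustration only, not the library's implementation: the helper `sample_with_rejection`, the toy generator, and the choice to count retries per sample are all assumptions made for the example.

```python
import random
from typing import Callable


def sample_with_rejection(
    generate: Callable[[], str],
    is_valid: Callable[[str], bool],
    n_samples: int,
    retry_on_fail: int = 10,
) -> list[str]:
    """Collect up to n_samples valid outputs, discarding invalid ones."""
    accepted: list[str] = []
    for _ in range(n_samples):
        # One initial attempt plus up to `retry_on_fail` retries per sample
        # (per-sample counting is an assumption of this sketch).
        for _attempt in range(1 + retry_on_fail):
            candidate = generate()
            if is_valid(candidate):
                accepted.append(candidate)
                break
    return accepted


rng = random.Random(0)


def fake_generate() -> str:
    # Hypothetical unconstrained generator that sometimes breaks the structure.
    return rng.choice(["42", "17", "n/a", "63"])


# The desired structure here is "a string of digits".
samples = sample_with_rejection(fake_generate, str.isdigit, n_samples=5)
print(samples)
```

A sample that stays invalid after all retries is simply dropped in this sketch, so the result can be shorter than requested. STRUCTURED mode avoids the retries by constraining the output during generation, at the cost noted above.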

LlmTabularModel

generate

generate(
    cfg: LlmTableCfg,
    n_samples: int,
    batch_size: int = 1,
    max_tokens: int = 60,
    generation_mode: GenerationMode | str = REJECTION,
    retry_on_fail: int = 10,
    temp: float = 1.0,
) -> RelationalData

Generate synthetic tabular data from scratch.

Parameters:

  • cfg (LlmTableCfg, required): The configuration of the table to generate.
  • n_samples (int, required): Desired number of samples to generate.
  • batch_size (int, default: 1): Batch size used during generation. If 0, all data is generated in a single batch.
  • max_tokens (int, default: 60): Maximum number of tokens to generate for each sample.
  • generation_mode (GenerationMode | str, default: REJECTION): Generation mode for the LLM model. Can be a GenerationMode, or a string representation of the latter (structured, rejection).
  • retry_on_fail (int, default: 10): Number of retry attempts in case of rejected generated samples.
  • temp (float, default: 1.0): Temperature parameter for sampling.

Returns:

  • RelationalData: A RelationalData object with the generated synthetic tabular data.
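
The batch_size convention above (0 meaning a single batch) can be illustrated with a small helper that splits n_samples into batch sizes. The function `batch_sizes` is hypothetical and not part of the library, and the handling of a final partial batch is an assumption of this sketch.

```python
def batch_sizes(n_samples: int, batch_size: int = 1) -> list[int]:
    """Split n_samples into per-batch counts; batch_size=0 means one batch."""
    if n_samples < 0 or batch_size < 0:
        raise ValueError("n_samples and batch_size must be non-negative")
    if n_samples == 0:
        return []
    if batch_size == 0:
        # Documented special case: everything is generated in a single batch.
        return [n_samples]
    full, rest = divmod(n_samples, batch_size)
    # Assumption: leftover samples go into a smaller final batch.
    return [batch_size] * full + ([rest] if rest else [])


print(batch_sizes(10, 4))  # three batches: two full, one partial
print(batch_sizes(10, 0))  # one batch holding all samples
```

Larger batches generally trade memory for throughput, so a single batch is mainly practical for small n_samples.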

add_columns

add_columns(
    data: RelationalData,
    context_cfg: LlmTableCfg,
    new_columns: dict[str, LlmColumnCfg],
    batch_size: int = 1,
    max_tokens: int = 60,
    generation_mode: GenerationMode | str = REJECTION,
    retry_on_fail: int = 10,
    temp: float = 1.0,
) -> RelationalData

Add new columns to an existing table.

Parameters:

  • data (RelationalData, required): The RelationalData object containing the context data to which the new columns are added.
  • context_cfg (LlmTableCfg, required): The configuration for the context table.
  • new_columns (dict[str, LlmColumnCfg], required): A dictionary with the configurations for the new columns to add.
  • batch_size (int, default: 1): Batch size used during generation. If 0, all data is generated in a single batch.
  • max_tokens (int, default: 60): Maximum number of tokens to generate for each sample.
  • generation_mode (GenerationMode | str, default: REJECTION): Generation mode for the LLM model. Can be a GenerationMode, or a string representation of the latter (structured, rejection).
  • retry_on_fail (int, default: 10): Number of retry attempts in case of rejected generated samples.
  • temp (float, default: 1.0): Temperature parameter for sampling.

Returns:

  • RelationalData: A RelationalData object with the context data and the new columns added.
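
Both generate and add_columns accept generation_mode either as a GenerationMode or as one of the strings structured and rejection. One way such an argument could be normalized is sketched below. The enum is a local stand-in whose string values are taken from the parameter description, and the helper `to_generation_mode` is hypothetical; the library's actual coercion logic is not shown in this section.

```python
from enum import Enum
from typing import Union


class GenerationMode(Enum):
    # Stand-in enum; the values follow the documented string representations.
    STRUCTURED = "structured"
    REJECTION = "rejection"


def to_generation_mode(mode: Union[GenerationMode, str]) -> GenerationMode:
    """Return a GenerationMode, accepting either the enum or its string value."""
    if isinstance(mode, GenerationMode):
        return mode
    # Enum lookup by value raises ValueError for unknown strings.
    return GenerationMode(mode)


print(to_generation_mode("structured"))
print(to_generation_mode(GenerationMode.REJECTION))
```

Looking the string up by value means a misspelled mode fails immediately with a ValueError instead of silently falling back to a default.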

save

save(path: Path) -> None

Save the LlmTabularModel to a checkpoint at the given path.

Parameters:

  • path (Path, required): The path where the checkpoint is saved.

load classmethod

load(
    ckpt_path: Path | str,
    model_path: Path | str | None = None,
    auth_token: str | None = None,
) -> LlmTabularModel

Load a LlmTabularModel from the checkpoint at the given path.

Parameters:

  • ckpt_path (Path | str, required): The path to the adapter checkpoint (provided by Aindo).
  • model_path (Path | str | None, default: None): The path to the base LLM model. If None, it will be downloaded from Hugging Face.
  • auth_token (str | None, default: None): The authorization token for downloading the base model from Hugging Face. Not necessary if the environment variable HF_TOKEN is set, or if the base model is loaded from a local checkpoint.

Returns:

  • LlmTabularModel: The loaded LlmTabularModel object.
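
The rules for when auth_token is needed can be summarized in a small decision function. This is a sketch of the documented conditions, not the library's code: the helper `resolve_auth_token` is hypothetical, and giving an explicit token precedence over HF_TOKEN is an assumption.

```python
import os
from pathlib import Path
from typing import Optional, Union


def resolve_auth_token(
    auth_token: Optional[str] = None,
    model_path: Optional[Union[Path, str]] = None,
) -> Optional[str]:
    """Return the token to use for the download, or None if none applies."""
    if model_path is not None:
        # The base model is loaded from a local path: nothing is downloaded.
        return None
    if auth_token is not None:
        # Assumption: an explicit argument wins over the environment variable.
        return auth_token
    # Fall back to the HF_TOKEN environment variable, if it is set.
    return os.environ.get("HF_TOKEN")


print(resolve_auth_token(model_path="/models/base") is None)
```

Setting HF_TOKEN in the environment keeps the token out of source code, which is usually preferable to passing it as an argument.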