Generate

Engine

Enumeration of the available backend engines.

Attributes:

Name	Type	Description
`VLLM`		vLLM backend engine.
`SGLANG`		SGLang backend engine.
`OPENAI`		OpenAI backend engine.
`OPENAI_BATCH`		OpenAI Batch backend engine.
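
Since `Generator.from_engine` accepts `Engine | str`, it can help to see how a string might map onto a member. The following is a minimal sketch, not the library's code: the member names come from the table above, while the string values and the `resolve_engine` helper are assumptions made for illustration.

```python
from enum import Enum


class Engine(Enum):
    # Member names follow the table above; the string values are assumed.
    VLLM = "vllm"
    SGLANG = "sglang"
    OPENAI = "openai"
    OPENAI_BATCH = "openai_batch"


def resolve_engine(engine: "Engine | str") -> Engine:
    """Hypothetical helper: accept an Engine member or its name as a string."""
    if isinstance(engine, Engine):
        return engine
    # Look the member up by name, ignoring case.
    return Engine[engine.upper()]


selected = resolve_engine("openai_batch")
```

An unknown name raises `KeyError` in this sketch, which surfaces a misspelled backend early.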

BaseEngine

An abstract class for generation engines.

from_engine `classmethod`

from_engine(
    engine: Engine,
    model: str,
    cfg_filepath: str | Path | None = None,
    args: Sequence[str] = (),
    kwargs: dict[str, Any] | None = None,
) -> BaseEngine

Build the engine from an available engine enumeration.

Parameters:

Name	Type	Description	Default
`engine`	`Engine`	The engine enumeration.	required
`model`	`str`	The model name or path.	required
`cfg_filepath`	`str \| Path \| None`	If provided, the path where the generator configuration (keyword arguments) is saved.	`None`
`args`	`Sequence[str]`	Optional CLI arguments for the generator engine.	`()`
`kwargs`	`dict[str, Any] \| None`	Optional keyword arguments for the generator engine.	`None`

Returns:

Type	Description
`BaseEngine`	The initialized engine.

generate `abstractmethod`

generate(
    prompt_template: str | Callable[[GenPrompt], str],
    prompts: GenPrompt | Iterable[GenPrompt],
    n: int | Iterable[int] = 1,
    guided: bool = True,
    **kwargs: Any,
) -> list[list[str]]

Generate from the LLM.

Parameters:

Name	Type	Description	Default
`prompt_template`	`str \| Callable[[GenPrompt], str]`	The template for the prompt. If a string, it may contain as keys the fields of `GenPrompt`.	required
`prompts`	`GenPrompt \| Iterable[GenPrompt]`	The `GenPrompt` object(s) containing the prompt data.	required
`n`	`int \| Iterable[int]`	The number of samples for each prompt.	`1`
`guided`	`bool`	Whether to use guided generation.	`True`
`**kwargs`	`Any`	Keyword arguments for the sampling parameters.	`{}`

Returns:

Type	Description
`list[list[str]]`	A list of `n` generated texts for each prompt.
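
The two forms of `prompt_template` and of `n` can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `GenPrompt` fields (`topic`, `language`), the use of `str.format` for string templates, and the `render` and `expand_n` helpers are hypothetical and may not match how an engine handles them.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from typing import Callable, Iterable


@dataclass
class GenPrompt:
    # Hypothetical fields; the real GenPrompt is not shown in this section.
    topic: str
    language: str


def render(template: str | Callable[[GenPrompt], str], prompt: GenPrompt) -> str:
    """Render one prompt from a string template or a callable (assumed behavior)."""
    if callable(template):
        return template(prompt)
    # Assumption: string templates use the GenPrompt fields as format keys.
    return template.format(**asdict(prompt))


def expand_n(n: int | Iterable[int], num_prompts: int) -> list[int]:
    """Turn `n` into one sample count per prompt."""
    if isinstance(n, int):
        return [n] * num_prompts
    counts = list(n)
    if len(counts) != num_prompts:
        raise ValueError("expected one sample count per prompt")
    return counts


prompts = [GenPrompt("weather", "en"), GenPrompt("sports", "fr")]
rendered = [render("Write about {topic} in {language}.", p) for p in prompts]
counts = expand_n([2, 1], len(prompts))
```

With these inputs, the first prompt would be sampled twice and the second once.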

Generator

Generator(engine: BaseEngine)

Generate synthetic data via LLM.

Parameters:

Name	Type	Description	Default
`engine`	`BaseEngine`	The backend engine for generation.	required

from_engine `classmethod`

from_engine(
    engine: Engine | str,
    model: str,
    model_dir: Path | str | None = None,
    unsloth: bool = False,
    cfg_filepath: str | Path | None = None,
    args: Sequence[str] = (),
    kwargs: dict[str, Any] | None = None,
) -> Generator

Build the generator from an available backend engine.

Parameters:

Name	Type	Description	Default
`engine`	`Engine \| str`	The backend engine.	required
`model`	`str`	The model name or path.	required
`model_dir`	`Path \| str \| None`	The directory where the converted model is saved.	`None`
`unsloth`	`bool`	Whether to use `unsloth` for merging the model.	`False`
`cfg_filepath`	`str \| Path \| None`	If provided, the path where the generator configuration (keyword arguments) is saved.	`None`
`args`	`Sequence[str]`	Optional CLI arguments for the generator engine.	`()`
`kwargs`	`dict[str, Any] \| None`	Optional keyword arguments for the generator engine.	`None`

Returns:

Type	Description
`Generator`	The generator.

generate

generate(
    prompt_template: str | Callable[[GenPrompt], str],
    prompts: GenPrompt | Iterable[GenPrompt],
    n: int | Iterable[int] = 1,
    guided: bool = True,
    rejection_filepath: Path | str | None = None,
    retry_on_fail: int = 100,
    **kwargs: Any,
) -> tuple[list[list[dict[str, Any]]], list[list[Invalid]]]

Generate valid output, using rejection sampling.

Parameters:

Name	Type	Description	Default
`prompt_template`	`str \| Callable[[GenPrompt], str]`	The template for the prompt. If a string, it may contain as keys the fields of `GenPrompt`.	required
`prompts`	`GenPrompt \| Iterable[GenPrompt]`	The `GenPrompt` object(s) containing the prompt data.	required
`n`	`int \| Iterable[int]`	The number of samples for each prompt.	`1`
`guided`	`bool`	Whether to use guided generation.	`True`
`rejection_filepath`	`Path \| str \| None`	If provided, a few statistics are saved to this file, together with some valid and invalid (rejected) examples.	`None`
`retry_on_fail`	`int`	How many times to retry generating valid output before giving up.	`100`
`**kwargs`	`Any`	Keyword arguments for the engine sampling parameters.	`{}`

Returns:

Type	Description
`tuple[list[list[dict[str, Any]]], list[list[Invalid]]]`	A tuple of two lists: the first contains, for each prompt, the generated JSON outputs as dictionaries; the second contains, for each prompt, the `Invalid` objects holding the generated data that was rejected.
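
The rejection-sampling idea can be sketched for a single prompt as follows. This is not the library's implementation: the `sample_valid` helper, the `draw` callable standing in for the engine, the choice of JSON parsing as the validity check, and the reading of `retry_on_fail` as a cap on total failures are all assumptions.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Invalid:
    # Mirrors the documented attributes of the Invalid dataclass.
    data: str
    error: Exception


def sample_valid(
    draw: Callable[[], str],
    n: int,
    retry_on_fail: int = 100,
) -> tuple[list[dict[str, Any]], list[Invalid]]:
    """Draw until `n` outputs parse as JSON objects or too many draws fail."""
    valid: list[dict[str, Any]] = []
    rejected: list[Invalid] = []
    while len(valid) < n:
        text = draw()
        try:
            parsed = json.loads(text)
            if not isinstance(parsed, dict):
                raise TypeError("expected a JSON object")
            valid.append(parsed)
        except (json.JSONDecodeError, TypeError) as exc:
            # Keep the rejected text together with the reason it failed.
            rejected.append(Invalid(data=text, error=exc))
            if len(rejected) >= retry_on_fail:
                break  # Assumed meaning of retry_on_fail: stop after this many failures.
    return valid, rejected


# A fake engine that yields one malformed output, then two valid ones.
outputs = iter(['{"a": 1', '{"a": 1}', '{"b": 2}'])
valid, rejected = sample_valid(lambda: next(outputs), n=2)
```

Here `valid` ends up with two dictionaries and `rejected` with the single truncated string, matching the shape of the two returned lists for one prompt.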

Invalid `dataclass`

A dataclass describing some invalid (rejected) generated data.

Attributes:

Name	Type	Description
`data`	`str`	The rejected output.
`error`	`Exception`	The exception received.
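
Because each `Invalid` carries both the rejected text and the exception, the per-prompt lists returned by `Generator.generate` can be summarized after a run. The sketch below redefines a stand-in `Invalid` with the two documented attributes so that it runs on its own; the `summarize` helper and the sample data are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class Invalid:
    # Stand-in with the documented attributes.
    data: str
    error: Exception


def summarize(invalid_per_prompt: list[list[Invalid]]) -> Counter:
    """Count rejections by exception type across all prompts."""
    return Counter(
        type(item.error).__name__
        for group in invalid_per_prompt
        for item in group
    )


# Made-up rejections for three prompts; the second prompt had none.
invalid_per_prompt = [
    [Invalid("{", ValueError("bad json"))],
    [],
    [Invalid("[]", TypeError("not an object")), Invalid("x", ValueError("bad json"))],
]
summary = summarize(invalid_per_prompt)
```

A summary like this makes it easier to tell whether rejections stem from malformed output or from schema mismatches.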
