
Generate

Engine

Enumeration of the available backend engines.

Attributes:

VLLM: vLLM backend engine.
SGLANG: SGLang backend engine.
OPENAI: OpenAI backend engine.
OPENAI_BATCH: OpenAI Batch backend engine.
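For illustration, the enumeration could be mirrored with Python's standard `enum.Enum`. Only the member names come from this page; the string values (`"vllm"` and so on) are assumptions made for the sketch.

```python
from enum import Enum


class Engine(Enum):
    """Hypothetical mirror of the documented enumeration (values are assumed)."""

    VLLM = "vllm"
    SGLANG = "sglang"
    OPENAI = "openai"
    OPENAI_BATCH = "openai_batch"


# Members can be iterated, e.g. to build CLI choices or validate user input.
choices = [engine.value for engine in Engine]
```

Lookup works both by value (`Engine("vllm")`) and by name (`Engine["VLLM"]`).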

BaseEngine

An abstract class for generation engines.

from_engine classmethod

from_engine(
    engine: Engine,
    model: str,
    cfg_filepath: str | Path | None = None,
    args: Sequence[str] = (),
    kwargs: dict[str, Any] | None = None,
) -> BaseEngine

Build an engine from a member of the Engine enumeration.

Parameters:

engine (Engine): The engine enumeration. Required.
model (str): The model name or path. Required.
cfg_filepath (str | Path | None): If provided, the path where the generator configuration (keyword arguments) is saved. Defaults to None.
args (Sequence[str]): Optional CLI arguments for the generator engine. Defaults to ().
kwargs (dict[str, Any] | None): Optional keyword arguments for the generator engine. Defaults to None.

Returns:

BaseEngine: The initialized engine.

generate abstractmethod

generate(
    prompt_template: str | Callable[[GenPrompt], str],
    prompts: GenPrompt | Iterable[GenPrompt],
    n: int | Iterable[int] = 1,
    guided: bool = True,
    **kwargs: Any,
) -> list[list[str]]

Generate text from the LLM.

Parameters:

prompt_template (str | Callable[[GenPrompt], str]): The template for the prompt. If a string, it may contain the fields of GenPrompt as keys. Required.
prompts (GenPrompt | Iterable[GenPrompt]): The GenPrompt object(s) containing the prompt data. Required.
n (int | Iterable[int]): The number of samples for each prompt. Defaults to 1.
guided (bool): Whether to use guided generation. Defaults to True.
**kwargs (Any): Keyword arguments for the sampling parameters. Defaults to {}.

Returns:

list[list[str]]: A list of n generated texts for each prompt.

Generator

Generator(engine: BaseEngine)

Generate synthetic data via an LLM.

Parameters:

engine (BaseEngine): The backend engine for generation. Required.

from_engine classmethod

from_engine(
    engine: Engine | str,
    model: str,
    model_dir: Path | str | None = None,
    unsloth: bool = False,
    cfg_filepath: str | Path | None = None,
    args: Sequence[str] = (),
    kwargs: dict[str, Any] | None = None,
) -> Generator

Build the generator from an available backend engine.

Parameters:

engine (Engine | str): The backend engine. Required.
model (str): The model name or path. Required.
model_dir (Path | str | None): The directory where the converted model is saved. Defaults to None.
unsloth (bool): Whether to use Unsloth for merging the model. Defaults to False.
cfg_filepath (str | Path | None): If provided, the path where the generator configuration (keyword arguments) is saved. Defaults to None.
args (Sequence[str]): Optional CLI arguments for the generator engine. Defaults to ().
kwargs (dict[str, Any] | None): Optional keyword arguments for the generator engine. Defaults to None.

Returns:

Generator: The generator.

generate

generate(
    prompt_template: str | Callable[[GenPrompt], str],
    prompts: GenPrompt | Iterable[GenPrompt],
    n: int | Iterable[int] = 1,
    guided: bool = True,
    rejection_filepath: Path | str | None = None,
    retry_on_fail: int = 100,
    **kwargs: Any,
) -> tuple[list[list[dict[str, Any]]], list[list[Invalid]]]

Generate valid output using rejection sampling.

Parameters:

prompt_template (str | Callable[[GenPrompt], str]): The template for the prompt. If a string, it may contain the fields of GenPrompt as keys. Required.
prompts (GenPrompt | Iterable[GenPrompt]): The GenPrompt object(s) containing the prompt data. Required.
n (int | Iterable[int]): The number of samples for each prompt. Defaults to 1.
guided (bool): Whether to use guided generation. Defaults to True.
rejection_filepath (Path | str | None): If provided, a few statistics are saved to this file, together with some valid and some invalid (rejected) examples. Defaults to None.
retry_on_fail (int): How many times to retry generating valid output before stopping. Defaults to 100.
**kwargs (Any): Keyword arguments for the engine sampling parameters. Defaults to {}.

Returns:

tuple[list[list[dict[str, Any]]], list[list[Invalid]]]: A tuple of two lists. The first contains, for each prompt, the generated JSON outputs as dictionaries; the second contains, for each prompt, the Invalid objects holding the generated data that was rejected.
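The rejection-sampling idea behind `generate` can be sketched for a single prompt as follows. This is an illustration under stated assumptions, not the library's implementation: the `sample` callable stands in for the backend engine, "valid" is taken to mean "parses as a JSON object and passes a user-supplied `validate` check", and exhausting `retry_on_fail` is assumed to raise `RuntimeError`. `Invalid` is redefined locally with the two documented fields, and `rejection_sample` is a hypothetical name.

```python
import json
from collections.abc import Callable
from dataclasses import dataclass
from typing import Any


@dataclass
class Invalid:
    """Local stand-in with the documented fields."""

    data: str
    error: Exception


def rejection_sample(
    sample: Callable[[int], list[str]],
    n: int,
    validate: Callable[[dict[str, Any]], None] = lambda obj: None,
    retry_on_fail: int = 100,
) -> tuple[list[dict[str, Any]], list[Invalid]]:
    """Keep sampling until n outputs are valid JSON objects, or retries run out."""
    valid: list[dict[str, Any]] = []
    invalid: list[Invalid] = []
    retries = 0
    while len(valid) < n:
        # Only ask the engine for the samples that are still missing.
        for text in sample(n - len(valid)):
            try:
                obj = json.loads(text)
                if not isinstance(obj, dict):
                    raise TypeError("expected a JSON object")
                validate(obj)  # user check; should raise on invalid content
            except Exception as exc:  # any failure means rejection
                invalid.append(Invalid(text, exc))
            else:
                valid.append(obj)
        if len(valid) < n:
            retries += 1
            if retries > retry_on_fail:
                raise RuntimeError(
                    f"only {len(valid)}/{n} valid outputs after {retry_on_fail} retries"
                )
    return valid[:n], invalid


outputs = iter(['{"a": 1}', "not json", "[1, 2]", '{"a": 2}'])


def fake_sample(k: int) -> list[str]:
    """Hypothetical engine stub returning the next k canned outputs."""
    return [next(outputs) for _ in range(k)]


valid, rejected = rejection_sample(fake_sample, n=2)
```

Running this loop once per prompt would yield the per-prompt pair of lists documented above. Requesting only the missing samples avoids regenerating outputs that were already accepted.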

Invalid dataclass

A dataclass describing some invalid (rejected) generated data.

Attributes:

data (str): The rejected output.
error (Exception): The exception that caused the output to be rejected.
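Because `Generator.generate` returns the rejected samples as `list[list[Invalid]]` (one list per prompt), a quick way to see why outputs were rejected is to tally the exception types. The `Invalid` below is a local stand-in mirroring the two documented attributes, and `rejection_reasons` is a hypothetical helper.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Invalid:
    """Stand-in mirroring the documented attributes."""

    data: str
    error: Exception


def rejection_reasons(rejected: list[list[Invalid]]) -> Counter[str]:
    """Count rejections across all prompts by exception class name."""
    return Counter(
        type(item.error).__name__ for per_prompt in rejected for item in per_prompt
    )


rejected = [
    [Invalid("not json", ValueError("Expecting value"))],
    [],
    [Invalid("{}", KeyError("name")), Invalid("[]", TypeError("expected a JSON object"))],
]
reasons = rejection_reasons(rejected)
```

A dominant error type (for example, mostly parse errors) hints at whether guided generation or the prompt template is the better place to intervene.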