Generate
Engine
Enumeration of the available backend engines.
Attributes:
| Name | Type | Description |
|---|---|---|
| `VLLM` | | vLLM backend engine. |
| `SGLANG` | | SGLang backend engine. |
| `OPENAI` | | OpenAI backend engine. |
| `OPENAI_BATCH` | | OpenAI Batch backend engine. |
BaseEngine
An abstract class for generation engines.
from_engine
classmethod
```python
from_engine(
    engine: Engine,
    model: str,
    cfg_filepath: str | Path | None = None,
    args: Sequence[str] = (),
    kwargs: dict[str, Any] | None = None,
) -> BaseEngine
```
Build the engine from an available engine enumeration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `engine` | `Engine` | The engine enumeration. | *required* |
| `model` | `str` | The model name or path. | *required* |
| `cfg_filepath` | `str \| Path \| None` | If provided, the path where to save the generator configuration (keyword arguments). | `None` |
| `args` | `Sequence[str]` | Optional CLI arguments for the generator engine. | `()` |
| `kwargs` | `dict[str, Any] \| None` | Optional keyword arguments for the generator engine. | `None` |
Returns:
| Type | Description |
|---|---|
| `BaseEngine` | The initialized engine. |
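The enumeration-to-class dispatch that a factory like `from_engine` performs can be sketched as below. The `_ENGINE_CLASSES` mapping, the `FakeVLLMEngine` class, and the model name are illustrative stand-ins, not the library's actual internals:

```python
from enum import Enum


class Engine(Enum):
    VLLM = "vllm"
    SGLANG = "sglang"
    OPENAI = "openai"
    OPENAI_BATCH = "openai_batch"


class FakeVLLMEngine:
    """Stand-in for a concrete BaseEngine subclass."""

    def __init__(self, model: str, **kwargs):
        self.model = model
        self.kwargs = kwargs


# Hypothetical dispatch table from enum member to engine class.
_ENGINE_CLASSES = {Engine.VLLM: FakeVLLMEngine}


def from_engine(engine: Engine, model: str, **kwargs):
    """Build the concrete engine for the given enumeration member."""
    return _ENGINE_CLASSES[engine](model, **kwargs)


engine = from_engine(Engine.VLLM, model="my-org/my-model")
```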
generate
abstractmethod
```python
generate(
    prompt_template: str | Callable[[GenPrompt], str],
    prompts: GenPrompt | Iterable[GenPrompt],
    n: int | Iterable[int] = 1,
    guided: bool = True,
    **kwargs: Any,
) -> list[list[str]]
```
Generate from the LLM.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `prompt_template` | `str \| Callable[[GenPrompt], str]` | The template for the prompt. If a string, it may contain as keys the fields of `GenPrompt`. | *required* |
| `prompts` | `GenPrompt \| Iterable[GenPrompt]` | The `GenPrompt` prompt(s). | *required* |
| `n` | `int \| Iterable[int]` | The number of samples for each prompt. | `1` |
| `guided` | `bool` | Whether to use guided generation. | `True` |
| `**kwargs` | `Any` | Keyword arguments for the sampling parameters. | `{}` |
Returns:
| Type | Description |
|---|---|
| `list[list[str]]` | A list of generated samples for each prompt. |
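The two forms of `prompt_template` can be illustrated with a small sketch. The `GenPrompt` dataclass here is a stand-in with made-up fields; the point is that a string template is filled from the prompt's fields, while a callable receives the prompt object directly:

```python
from dataclasses import asdict, dataclass


@dataclass
class GenPrompt:
    """Illustrative stand-in for the library's prompt type."""

    topic: str
    style: str


def render(prompt_template, prompt: GenPrompt) -> str:
    """Render a template: apply callables to the prompt, format strings
    with the prompt's fields as keys."""
    if callable(prompt_template):
        return prompt_template(prompt)
    return prompt_template.format(**asdict(prompt))


p = GenPrompt(topic="volcanoes", style="haiku")
s1 = render("Write a {style} about {topic}.", p)
s2 = render(lambda pr: f"Write a {pr.style} about {pr.topic}.", p)
```

Both calls produce the same rendered prompt, so either form can be passed wherever a `prompt_template` is expected.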
Generator
```python
Generator(engine: BaseEngine)
```
Generate synthetic data via LLM.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `engine` | `BaseEngine` | The backend engine for generation. | *required* |
from_engine
classmethod
```python
from_engine(
    engine: Engine | str,
    model: str,
    model_dir: Path | str | None = None,
    unsloth: bool = False,
    cfg_filepath: str | Path | None = None,
    args: Sequence[str] = (),
    kwargs: dict[str, Any] | None = None,
) -> Generator
```
Build the generator from an available backend engine.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `engine` | `Engine \| str` | The backend engine. | *required* |
| `model` | `str` | The model name or path. | *required* |
| `model_dir` | `Path \| str \| None` | The directory where to save the converted model. | `None` |
| `unsloth` | `bool` | Whether to use `unsloth`. | `False` |
| `cfg_filepath` | `str \| Path \| None` | If provided, the path where to save the generator configuration (keyword arguments). | `None` |
| `args` | `Sequence[str]` | Optional CLI arguments for the generator engine. | `()` |
| `kwargs` | `dict[str, Any] \| None` | Optional keyword arguments for the generator engine. | `None` |
Returns:
| Type | Description |
|---|---|
| `Generator` | The generator. |
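Unlike `BaseEngine.from_engine`, this classmethod also accepts the engine as a string. One plausible way to coerce a string into an `Engine` member is sketched below; this is an assumption about the behavior, not the library's actual code:

```python
from enum import Enum


class Engine(Enum):
    VLLM = "vllm"
    SGLANG = "sglang"
    OPENAI = "openai"
    OPENAI_BATCH = "openai_batch"


def coerce_engine(engine) -> Engine:
    """Accept either an Engine member or its string name/value."""
    if isinstance(engine, Engine):
        return engine
    try:
        return Engine(engine.lower())  # match by value, e.g. "vllm"
    except ValueError:
        return Engine[engine.upper()]  # match by name, e.g. "VLLM"


member = coerce_engine("vllm")
```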
generate
```python
generate(
    prompt_template: str | Callable[[GenPrompt], str],
    prompts: GenPrompt | Iterable[GenPrompt],
    n: int | Iterable[int] = 1,
    guided: bool = True,
    rejection_filepath: Path | str | None = None,
    retry_on_fail: int = 100,
    **kwargs: Any,
) -> tuple[list[list[dict[str, Any]]], list[list[Invalid]]]
```
Generate valid output, using rejection sampling.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `prompt_template` | `str \| Callable[[GenPrompt], str]` | The template for the prompt. If a string, it may contain as keys the fields of `GenPrompt`. | *required* |
| `prompts` | `GenPrompt \| Iterable[GenPrompt]` | The `GenPrompt` prompt(s). | *required* |
| `n` | `int \| Iterable[int]` | The number of samples for each prompt. | `1` |
| `guided` | `bool` | Whether to use guided generation. | `True` |
| `rejection_filepath` | `Path \| str \| None` | If provided, a few statistics are saved to file, together with some valid and invalid (rejected) examples. | `None` |
| `retry_on_fail` | `int` | How many times to retry generating valid output before interrupting. | `100` |
| `**kwargs` | `Any` | Keyword arguments for the engine sampling parameters. | `{}` |
|
Returns:
| Type | Description |
|---|---|
| `tuple[list[list[dict[str, Any]]], list[list[Invalid]]]` | A tuple of two lists: the first containing, for each prompt, the generated JSON outputs as dictionaries, and the second containing, for each prompt, the list of `Invalid` (rejected) outputs. |
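The rejection-sampling loop behind this method can be sketched in a simplified, self-contained form. Here validity is reduced to "parses as JSON", the engine is mocked, and rejected outputs are kept as raw strings rather than `Invalid` objects; all names below are illustrative, not the library's:

```python
import json


def rejection_sample(generate, prompt, n, retry_on_fail=100):
    """Collect n valid JSON outputs for one prompt, retrying rejected ones."""
    valid, invalid = [], []
    attempts = 0
    while len(valid) < n and attempts < retry_on_fail:
        attempts += 1
        for raw in generate(prompt, n - len(valid)):
            try:
                valid.append(json.loads(raw))  # accept parseable JSON
            except json.JSONDecodeError:
                invalid.append(raw)  # reject and record for later inspection
    return valid, invalid


# Mock engine: yields one malformed output, then valid ones.
outputs = iter(['{"a": 1', '{"a": 1}', '{"a": 2}'])
mock = lambda prompt, k: [next(outputs) for _ in range(k)]
valid, invalid = rejection_sample(mock, "prompt", n=2)
```

The loop only regenerates the missing `n - len(valid)` samples on each retry, which is the usual way rejection sampling avoids re-paying for outputs that already passed validation.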