LLM-based models
The aindo.rdml.synth.llm module leverages the capabilities of Large Language Models (LLMs) to generate synthetic data and enhance structured datasets.
It is useful for both prototyping and production use cases, where it enables:
- Safer Testing: Generate privacy-compliant mock data resembling production datasets.
- Data Augmentation: Enrich training data with synthetic examples to improve generalization.
- Data Normalization: Clean and standardize messy datasets or extract structured information from text fields.
For example, it is possible to generate a dataset of fictional individuals with names, ages, and emails, create a synthetic customer database with realistic email formats, or enrich an existing dataset by adding a "Salary Range" column inferred from job titles.
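To make the enrichment case concrete, here is a minimal sketch of the data shapes involved: rows with a job title go in, and the same rows come out with an added "Salary Range" column. It deliberately does not call aindo.rdml.synth.llm, whose API is not shown in this section. The function `infer_salary_range`, the helper `enrich`, the column names, and the salary bands are all hypothetical placeholders standing in for the LLM-backed step.

```python
# Illustrative only: this does NOT use aindo.rdml.synth.llm.
# `infer_salary_range` is a hypothetical stand-in for an LLM-backed inference step,
# and the salary bands below are made-up placeholder values.


def infer_salary_range(job_title: str) -> str:
    """Map a job title to a salary band (placeholder for an LLM call)."""
    bands = {
        "data engineer": "60k-90k",
        "sales assistant": "25k-40k",
    }
    return bands.get(job_title.lower(), "unknown")


def enrich(rows: list[dict]) -> list[dict]:
    """Return copies of the rows with an added "Salary Range" column."""
    return [
        {**row, "Salary Range": infer_salary_range(row["Job Title"])}
        for row in rows
    ]


rows = [
    {"Name": "Ada Rossi", "Job Title": "Data Engineer"},
    {"Name": "Luca Bianchi", "Job Title": "Sales Assistant"},
]
enriched = enrich(rows)
```

The original rows are left unchanged and the new column is derived from an existing one, which is the same pattern as the enrichment use case described above; with the real module, the lookup table would be replaced by the model's inference.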
In the LLM script we provide a full end-to-end example that uses the aindo.rdml.synth.llm module to generate a new dataset from scratch and then enrich it with extra columns.