Introduction
The following sections explain in depth how to use the classes and functions of each module, so that users can adapt them to their specific needs.
We begin by demonstrating how users can define the schema of a relational data structure,
including column types, tables, and relationships between tables (such as primary and foreign keys),
and how to load data according to these defined relational structures.
This is achieved through the aindo.rdml.relational module.
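
To make the notions of primary and foreign keys concrete, here is a minimal sketch in plain Python. It does not use the aindo.rdml.relational module: the `TableSchema` class, the `find_orphans` function, and the `customers`/`orders` tables are hypothetical names chosen for illustration only. The sketch declares two related tables and checks that every foreign key value refers to an existing primary key in the parent table.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


# Illustrative stand-in only: this is NOT a class from aindo.rdml.relational.
@dataclass
class TableSchema:
    columns: Dict[str, str]                 # column name -> column type label
    primary_key: Optional[str] = None       # name of the primary key column, if any
    foreign_keys: Dict[str, str] = field(default_factory=dict)  # column -> parent table


def find_orphans(
    schema: Dict[str, TableSchema],
    data: Dict[str, List[Dict[str, Any]]],
) -> List[Tuple[str, str, Any]]:
    """Return (table, column, value) for each foreign key value with no matching parent row."""
    orphans = []
    for name, table in schema.items():
        for column, parent in table.foreign_keys.items():
            parent_key = schema[parent].primary_key
            valid = {row[parent_key] for row in data[parent]}
            for row in data[name]:
                if row[column] not in valid:
                    orphans.append((name, column, row[column]))
    return orphans


# Hypothetical two-table structure: each order refers to one customer.
schema = {
    "customers": TableSchema(
        columns={"customer_id": "integer", "country": "categorical"},
        primary_key="customer_id",
    ),
    "orders": TableSchema(
        columns={"order_id": "integer", "customer_id": "integer", "amount": "numeric"},
        primary_key="order_id",
        foreign_keys={"customer_id": "customers"},
    ),
}

data = {
    "customers": [
        {"customer_id": 1, "country": "IT"},
        {"customer_id": 2, "country": "FR"},
    ],
    "orders": [
        {"order_id": 10, "customer_id": 1, "amount": 25.0},
        {"order_id": 11, "customer_id": 3, "amount": 12.5},  # no customer with id 3
    ],
}

print(find_orphans(schema, data))
```

In this sketch the second order is reported because it refers to a customer that does not exist. A relational schema makes this kind of consistency constraint between tables explicit.
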
Next, we focus on generating synthetic data using the neural generative models
found in the aindo.rdml.synth module.
We start with a detailed explanation of how to preprocess the data,
then move on to show how to build and train the generative models.
Different models are employed to generate the structured portion of the tabular data as well as the text columns.
Additionally, we demonstrate how the tabular models can be used for predictive tasks
on relational data.
In addition to the neural generative models, the aindo.rdml.synth module also includes
a tree-based model built on XGBoost,
designed for the single-table case.
We then introduce the aindo.rdml.synth.llm module, which contains
a set of models that use Large Language Models (LLMs) to generate synthetic data.
These models complement the other neural models available in the aindo.rdml.synth module
by enabling the generation of entirely new data.
For example, they can be used to:
- Generate a dataset from scratch based on a given input structure.
- Enrich an existing dataset by adding new columns that are consistent with the existing data.
Finally, we discuss how to use the functions in the aindo.rdml.eval module
to evaluate the quality of the generated data, in terms of both its similarity to the original data and its privacy protection.