nip.code_validation.dataset_generation.generate_and_save_cv_dataset
- nip.code_validation.dataset_generation.generate_and_save_cv_dataset(config: CodeValidationDatasetConfig | dict, manager: SyncManager | None = None)
Generate a code validation dataset and save it to disk.
This function uses language models to generate buggy versions of a fraction of the solutions in a given dataset of problems, then saves the combined dataset to disk.
- Parameters:
config (CodeValidationDatasetConfig | dict) – A configuration object or dictionary for generating the dataset.
manager (SyncManager, optional) – A multiprocessing manager to handle shared memory. If None, a new manager is created.
- Raises:
ValueError – If the number of buggy solutions generated is 0.
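The following is a minimal usage sketch, not part of the library. It shows one way to pass a caller-owned manager and to handle the documented ValueError. The names try_generate and main are hypothetical helpers introduced for illustration, and the contents of config are not described in this entry, so they are left to the caller.

```python
from multiprocessing import Manager


def try_generate(generate_fn, config, manager=None):
    """Run a dataset generator and report whether it succeeded.

    `generate_fn` stands in for generate_and_save_cv_dataset so that this
    sketch runs without the `nip` package installed. This wrapper is a
    hypothetical helper, not part of the library.
    """
    try:
        generate_fn(config, manager=manager)
    except ValueError as err:
        # Documented failure mode: zero buggy solutions were generated.
        return False, str(err)
    return True, None


def main(config):
    # Requires the `nip` package. The fields expected in `config` are an
    # open assumption here and must be supplied by the caller.
    from nip.code_validation.dataset_generation import (
        generate_and_save_cv_dataset,
    )

    # Create the manager explicitly instead of letting the function
    # create a new one, then pass it through.
    with Manager() as manager:
        return try_generate(generate_and_save_cv_dataset, config, manager)
```

When running this as a script, call main from under an `if __name__ == "__main__":` guard, since creating a manager starts a separate server process. Omitting the manager argument relies on the documented default, in which the function creates its own.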