

Module for generating datasets used in code validation systems.

A code validation dataset is generated by taking the APPS dataset and using language models to modify its solutions into buggy ones.

The CodeValidationDatasetConfig class configures the generation of buggy solutions for a dataset of problems. The generate_and_save_cv_dataset function generates buggy solutions for a given dataset of problems and saves the combined dataset to disk.



Create an empty code validation dataset with the required columns.


Extract the modified solution and problematic inputs from the model output.

_generate_buggy_solutions(results, datum, ...)

Generate buggy solutions for a given datum and append them to the result list.

_get_openai_response(model, messages[, ...])

Get completions from the OpenAI API for a chat model.

_get_openrouter_response(model, messages[, ...])

Send a POST request to the OpenRouter API to get responses from a chat model.

_load_cv_dataset(config, splits)

Load an existing code validation dataset or create an empty one.

_test_buggy_solution(buggy_solution, ...[, ...])

Test a buggy solution against a correct solution using provided inputs and datum.

_try_generate_buggy_solutions(datum, model, ...)

Generate buggy solutions by modifying a fraction of the provided solutions.

generate_and_save_cv_dataset(config[, manager])

Generate a code validation dataset and save it to disk.
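
The idea behind testing a buggy solution against a correct one can be illustrated with a small differential test. The sketch below is not this module's implementation. It assumes solutions are Python scripts that read from stdin and write to stdout, and the names run_solution and is_genuinely_buggy are hypothetical helpers introduced only for illustration.

```python
import subprocess
import sys
from typing import List, Optional


def run_solution(source: str, stdin_text: str, timeout: float = 5.0) -> Optional[str]:
    """Run a Python solution as a script and return its stripped stdout.

    Returns None if the script times out or exits with a non-zero status.
    Hypothetical helper, not part of the documented module.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode != 0:
        return None
    return proc.stdout.strip()


def is_genuinely_buggy(buggy: str, correct: str, inputs: List[str]) -> bool:
    """Return True if the buggy solution disagrees with the correct one on any input.

    Assumption: a crash or timeout in the buggy solution counts as a disagreement.
    """
    for stdin_text in inputs:
        expected = run_solution(correct, stdin_text)
        if expected is None:
            # Skip inputs that the reference solution itself cannot handle.
            continue
        if run_solution(buggy, stdin_text) != expected:
            return True
    return False
```

Under these assumptions, a candidate that agrees with the reference on every supplied input would be rejected, since nothing shows that it is actually buggy.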


CodeValidationDatasetConfig(model, ...)

A configuration class for generating datasets used in code validation systems.
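
As a rough illustration of how a configuration object could drive the "modify a fraction of the provided solutions" step, here is a minimal sketch. Only the model field comes from the signature listed above. The class name SketchConfig, the fields fraction_to_modify and seed, and the function pick_solutions_to_modify are hypothetical and may not match the real CodeValidationDatasetConfig.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class SketchConfig:
    """Illustrative stand-in for a dataset generation config (hypothetical)."""

    model: str                       # name of the language model to query
    fraction_to_modify: float = 0.5  # hypothetical: share of solutions to make buggy
    seed: int = 0                    # hypothetical: makes the selection reproducible


def pick_solutions_to_modify(solutions: List[str], config: SketchConfig) -> List[str]:
    """Select a reproducible random subset of solutions to turn into buggy ones."""
    if not 0.0 <= config.fraction_to_modify <= 1.0:
        raise ValueError("fraction_to_modify must be between 0 and 1")
    rng = random.Random(config.seed)
    # Round down so the subset never exceeds the requested fraction.
    count = int(len(solutions) * config.fraction_to_modify)
    return rng.sample(solutions, count)
```

Seeding a local random.Random instance, rather than the global generator, keeps the selection repeatable across runs without affecting other code that uses randomness.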