nip.code_validation.dataset_generation

Module for generating datasets used in code validation systems.

A code validation dataset is generated by taking solutions from the APPS dataset and using language models to modify them into buggy solutions.

The CodeValidationDatasetConfig class configures the generation of buggy solutions for a dataset of problems. The generate_and_save_cv_dataset function generates buggy solutions for a given dataset of problems and saves the combined dataset to disk.
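
The overall idea can be illustrated with a minimal, self-contained sketch. It does not use this module's API: the `Row` class, `build_rows`, and `toy_model` are hypothetical names introduced for illustration only, and `toy_model` is a stand-in for a real language model call.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Row:
    """One hypothetical dataset row: a problem, a solution, and a label."""

    question: str
    solution: str
    is_correct: bool


def build_rows(
    problems: list[dict],
    make_buggy: Callable[[str], Optional[str]],
) -> list[Row]:
    """Pair each correct solution with a buggy variant, when one is produced."""
    rows: list[Row] = []
    for problem in problems:
        for solution in problem["solutions"]:
            rows.append(Row(problem["question"], solution, True))
            buggy = make_buggy(solution)
            # Skip attempts where the model failed or returned the code unchanged.
            if buggy is not None and buggy != solution:
                rows.append(Row(problem["question"], buggy, False))
    return rows


def toy_model(solution: str) -> Optional[str]:
    """Stand-in for a language model call: flips the first '+' to '-'."""
    return solution.replace("+", "-", 1) if "+" in solution else None


# A single APPS-style problem whose solution reads stdin and prints to stdout.
problems = [
    {
        "question": "Print the sum of two integers.",
        "solutions": ["a, b = map(int, input().split())\nprint(a + b)"],
    }
]
rows = build_rows(problems, toy_model)
```

Here `rows` holds the original solution labelled as correct and its modified copy labelled as incorrect, which is the "combined dataset" shape described above under these assumptions.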

Functions

_create_empty_cv_dataset()

Create an empty code validation dataset with the required columns.

_extract_code_and_input(model_output)

Extract the modified solution and problematic inputs from the model output.

_generate_buggy_solutions(results, datum, ...)

Generate buggy solutions for a given datum and append them to the result list.

_get_openai_response(model, messages[, ...])

Get completions from the OpenAI API for a chat model.

_get_openrouter_response(model, messages[, ...])

Send a POST request to the OpenRouter API to get responses from a chat model.

_load_cv_dataset(config, splits)

Load an existing code validation dataset or create an empty one.

_test_buggy_solution(buggy_solution, ...[, ...])

Test a buggy solution against a correct solution using provided inputs and datum.

_try_generate_buggy_solutions(datum, model, ...)

Generate buggy solutions by modifying a fraction of the provided solutions.

generate_and_save_cv_dataset(config[, manager])

Generate a code validation dataset and save it to disk.
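
The extraction and testing steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: the `<code>`/`<input>` reply format, the function names, and the acceptance rule (a candidate counts as buggy if its output differs from the correct solution's on at least one input) are all hypothetical.

```python
from __future__ import annotations

import re
import subprocess
import sys
from typing import Optional


def extract_code_and_input(model_output: str) -> Optional[tuple[str, str]]:
    """Pull the code and the failing input out of a reply (tag format assumed)."""
    code = re.search(r"<code>\n?(.*?)</code>", model_output, re.DOTALL)
    bad_input = re.search(r"<input>\n?(.*?)</input>", model_output, re.DOTALL)
    if code is None or bad_input is None:
        return None
    return code.group(1), bad_input.group(1)


def run(code: str, stdin: str, timeout: float = 5.0) -> Optional[str]:
    """Run a stdin/stdout program in a subprocess; None on crash or timeout."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            input=stdin,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.stdout.strip() if proc.returncode == 0 else None


def is_valid_bug(correct: str, buggy: str, inputs: list[str]) -> bool:
    """Accept the candidate if it disagrees with the correct solution on some input."""
    for stdin in inputs:
        expected = run(correct, stdin)
        # A crash or timeout in the candidate (None) also counts as a mismatch.
        if expected is not None and run(buggy, stdin) != expected:
            return True
    return False


reply = (
    "<code>\na, b = map(int, input().split())\nprint(a - b)\n</code>\n"
    "<input>\n2 3\n</input>"
)
correct = "a, b = map(int, input().split())\nprint(a + b)"
extracted = extract_code_and_input(reply)
```

Running each program in a separate subprocess with a timeout keeps a crashing or non-terminating candidate from taking down the generation loop, at the cost of process start-up time per test.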

Classes

CodeValidationDatasetConfig(model, ...)

A configuration class for generating datasets used in code validation systems.