create_cv_dataset.py
Generate a Buggy APPS dataset.
Our Buggy APPS dataset is based on the APPS dataset [HBK+21], which consists of problem statements and code solutions. The Buggy APPS dataset is augmented with buggy code solutions, generated by asking a large language model to introduce a subtle bug into each correct solution.
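The augmentation step described above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the actual script: the record fields (`question`, `solution`, `buggy_solution`), the helper `add_buggy_solutions`, and the `introduce_bug` callable, which stands in for the language model call.

```python
from typing import Callable, Dict, List

# Hypothetical record layout: the real dataset's field names may differ.
Record = Dict[str, str]


def add_buggy_solutions(
    problems: List[Record],
    introduce_bug: Callable[[str, str], str],
) -> List[Record]:
    """Return copies of the records with an extra 'buggy_solution' field.

    `introduce_bug` is a placeholder for the language model call: it receives
    the problem statement and the correct solution, and returns altered code.
    """
    augmented = []
    for problem in problems:
        buggy = introduce_bug(problem["question"], problem["solution"])
        # Build a new dict so the input records are left unmodified.
        augmented.append({**problem, "buggy_solution": buggy})
    return augmented


def toy_introduce_bug(question: str, solution: str) -> str:
    # Toy stand-in for the model: turn the first "<=" into "<".
    return solution.replace("<=", "<", 1)


example = [
    {
        "question": "Print YES if a <= b, otherwise NO.",
        "solution": "print('YES' if a <= b else 'NO')",
    }
]
result = add_buggy_solutions(example, toy_introduce_bug)
```

Passing the bug-introducing step as a callable keeps the sketch runnable without a model; in practice that callable would wrap a prompt to the language model.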
scripts/create_cv_dataset.py
usage: scripts/create_cv_dataset.py [-h] [--split SPLIT] [--num_data NUM_DATA] [--save_after SAVE_AFTER]
- -h, --help
Show this help message and exit
- --split <split>
Whether to draw problems from the train or test split of the APPS dataset
- --num_data <num_data>
How many problems the dataset should contain (per split per difficulty level)
- --save_after <save_after>
The number of added problems after which the dataset is saved (and possibly pushed)
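For readers who want to see how such an interface is typically declared, here is a minimal `argparse` sketch that reproduces the options listed above. The argument types are assumptions (a string for `--split`, integers for the two counts), and no defaults are set because the reference does not document any; the real script may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (types are assumed)."""
    parser = argparse.ArgumentParser(
        prog="scripts/create_cv_dataset.py",
        description="Generate a Buggy APPS dataset",
    )
    parser.add_argument(
        "--split",
        type=str,  # assumed: "train" or "test"
        help="Whether to draw problems from the train or test split "
        "of the APPS dataset",
    )
    parser.add_argument(
        "--num_data",
        type=int,  # assumed integer count
        help="How many problems the dataset should contain "
        "(per split per difficulty level)",
    )
    parser.add_argument(
        "--save_after",
        type=int,  # assumed integer count
        help="The number of problems added after which to save "
        "(and possibly push) the dataset",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--split", "test", "--num_data", "20", "--save_after", "10"]
)
```

The argument list passed to `parse_args` corresponds to running the script with `--split test --num_data 20 --save_after 10`; the specific values are only examples.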