nip.code_validation.data.BuggyAppsCodeValidationDataset
- class nip.code_validation.data.BuggyAppsCodeValidationDataset(hyper_params: HyperParameters, settings: ExperimentSettings, protocol_handler: ProtocolHandler, split: Literal['train', 'test', 'validation'] = 'train')
An extension of the APPS [HBK+21] dataset with buggy solutions.
Buggy solutions were generated by asking GPT-4o to introduce bugs into the non-buggy solutions from the APPS dataset.
Methods Summary
__getitem__(index)
__init__(hyper_params, settings, ...[, split])
__len__()
__repr__()
Return repr(self).
_load_raw_dataset()
Load the dataset.
_process_data(raw_dataset)
Process the dataset.
_reduce_dataset_size(dataset)
Reduce the size of a dataset if necessary.
Attributes
dataset_filepath_name
The name of the dataset file.
instance_keys
The keys specifying the input instance.
keys
The keys (field names) in the dataset.
max_test_size
The maximum size of the test set.
max_train_size
The maximum size of the training set.
processed_dir
The path to the directory containing the processed data.
raw_dir
The path to the directory containing the raw data.
reduce_shuffle_seed
The seed used to shuffle the dataset before reducing its size.
split_dir
The name of the folder containing the split data.
validation_proportion
The proportion of the training set to use for validation.
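The attributes max_train_size, reduce_shuffle_seed and validation_proportion suggest a "shuffle with a fixed seed, truncate, then split off a validation set" pattern. The following is a minimal sketch of that general pattern using plain Python lists. It is not the library's implementation: the function name reduce_and_split and its parameters are hypothetical, and the actual order of operations inside BuggyAppsCodeValidationDataset may differ.

```python
import random
from typing import List, Optional, Sequence, Tuple


def reduce_and_split(
    items: Sequence[int],
    max_size: Optional[int],
    shuffle_seed: int,
    validation_proportion: float,
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: shuffle deterministically, cap the size, then split.

    This only illustrates the pattern; it does not reproduce the library's code.
    """
    shuffled = list(items)
    # A dedicated Random instance keeps the shuffle reproducible
    # without touching the global random state.
    random.Random(shuffle_seed).shuffle(shuffled)

    # Truncate only if a maximum size is set and the data exceeds it.
    if max_size is not None and len(shuffled) > max_size:
        shuffled = shuffled[:max_size]

    # Carve a validation slice off the front of the reduced data.
    num_validation = int(len(shuffled) * validation_proportion)
    validation = shuffled[:num_validation]
    train = shuffled[num_validation:]
    return train, validation


train, validation = reduce_and_split(
    range(100), max_size=50, shuffle_seed=0, validation_proportion=0.2
)
```

Using a fixed seed means that repeated runs select the same subset, so reduced training and validation sets stay comparable across experiments.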
Methods
- __getitem__(index: Any) -> NestedArrayDict
- __init__(hyper_params: HyperParameters, settings: ExperimentSettings, protocol_handler: ProtocolHandler, split: Literal['train', 'test', 'validation'] = 'train')
- _load_raw_dataset() -> Dataset
Load the dataset.
- Returns:
raw_data (HuggingFaceDataset) – The unprocessed dataset.