nip.code_validation.data.BuggyAppsCodeValidationDataset

class nip.code_validation.data.BuggyAppsCodeValidationDataset(hyper_params: HyperParameters, settings: ExperimentSettings, protocol_handler: ProtocolHandler, split: Literal['train', 'test', 'validation'] = 'train')

An extension of the APPS [HBK+21] dataset with buggy solutions.

Buggy solutions were generated by asking GPT-4o to introduce bugs into the non-buggy solutions from the APPS dataset.

Methods Summary

`__getitem__`(index)
`__init__`(hyper_params, settings, ...[, split])
`__len__`()
`__repr__`()	Return repr(self).
`_load_raw_dataset`()	Load the dataset.
`_process_data`(raw_dataset)	Process the dataset.
`_reduce_dataset_size`(dataset)	Reduce the size of a dataset if necessary.

Attributes

`dataset_filepath_name`	The name of the dataset file.
`instance_keys`	The keys specifying the input instance.
`keys`	The keys (field names) in the dataset.
`max_test_size`	The maximum size of the test set.
`max_train_size`	The maximum size of the training set.
`processed_dir`	The path to the directory containing the processed data.
`raw_dir`	The path to the directory containing the raw data.
`reduce_shuffle_seed`	The seed used to shuffle the dataset before reducing its size.
`split_dir`	The name of the folder containing the split data.
`validation_proportion`	The proportion of the training set to use for validation.
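
As an illustration of how `validation_proportion` might be applied, the following minimal sketch holds out a fraction of the training items for validation. The helper name `split_train_validation` and the choice to take the held-out items from the end of the sequence are assumptions made for this example. They are not taken from the `nip` source, which may select validation items differently.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_validation(
    train_items: Sequence[T], validation_proportion: float
) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: hold out a fraction of the training items."""
    if not 0.0 <= validation_proportion <= 1.0:
        raise ValueError("validation_proportion must be between 0 and 1")
    items = list(train_items)
    # Round down so the validation set never exceeds the requested share.
    num_validation = int(len(items) * validation_proportion)
    split_point = len(items) - num_validation
    # Assumption: the held-out items are taken from the end of the sequence.
    return items[:split_point], items[split_point:]


train, validation = split_train_validation(list(range(10)), 0.2)
```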

Methods

__init__(hyper_params: HyperParameters, settings: ExperimentSettings, protocol_handler: ProtocolHandler, split: Literal['train', 'test', 'validation'] = 'train')

_load_raw_dataset() → Dataset

Load the dataset.

_process_data(raw_dataset: Dataset) → Dataset

Process the dataset.

Parameters: raw_dataset (HuggingFaceDataset) – The unprocessed dataset.
Returns: processed_dataset (HuggingFaceDataset) – The processed dataset.

_reduce_dataset_size(dataset: Dataset) → Dataset

Reduce the size of a dataset if necessary.
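
The attributes `max_train_size`, `max_test_size` and `reduce_shuffle_seed` suggest that reduction works by shuffling with a fixed seed and then truncating. The sketch below shows that general technique on a plain Python list. The function `reduce_dataset_size` and its parameters are hypothetical stand-ins written for this example, not the method's actual implementation, which operates on a `Dataset` object.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def reduce_dataset_size(
    dataset: Sequence[T], max_size: Optional[int], shuffle_seed: int
) -> List[T]:
    """Hypothetical helper: keep at most ``max_size`` items via a seeded shuffle."""
    items = list(dataset)
    # No limit set, or the dataset is already small enough: keep everything.
    if max_size is None or len(items) <= max_size:
        return items
    # A dedicated Random instance makes the selection reproducible
    # without touching the global random state.
    rng = random.Random(shuffle_seed)
    rng.shuffle(items)
    return items[:max_size]


rows = [{"id": i} for i in range(10)]
subset = reduce_dataset_size(rows, max_size=4, shuffle_seed=0)
```

Shuffling before truncating avoids always keeping the first items of the dataset, and fixing the seed means repeated runs select the same subset.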
