nip.code_validation.data.BuggyAppsCodeValidationDataset

class nip.code_validation.data.BuggyAppsCodeValidationDataset(hyper_params: HyperParameters, settings: ExperimentSettings, protocol_handler: ProtocolHandler, split: Literal['train', 'test', 'validation'] = 'train')

An extension of the APPS [HBK+21] dataset with buggy solutions.

Buggy solutions were generated by asking GPT-4o to introduce bugs into the non-buggy solutions from the APPS dataset.

Methods Summary

__getitem__(index)

__init__(hyper_params, settings, ...[, split])

__len__()

__repr__()

Return repr(self).

_load_raw_dataset()

Load the dataset.

_process_data(raw_dataset)

Process the dataset.

_reduce_dataset_size(dataset)

Reduce the size of a dataset if necessary.

Attributes

dataset_filepath_name

The name of the dataset file.

instance_keys

The keys specifying the input instance.

keys

The keys (field names) in the dataset.

max_test_size

The maximum size of the test set.

max_train_size

The maximum size of the training set.

processed_dir

The path to the directory containing the processed data.

raw_dir

The path to the directory containing the raw data.

reduce_shuffle_seed

The seed used to shuffle the dataset before reducing its size.

split_dir

The name of the folder containing the split data.

validation_proportion

The proportion of the training set to use for validation.
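As a rough illustration of how a setting like `validation_proportion` could be applied, the sketch below carves a validation subset out of a range of training indices. The helper name `split_train_validation`, the choice to take the validation items from the end, and the truncating rounding rule are all assumptions made for illustration; they are not taken from the library's implementation.

```python
def split_train_validation(num_train_items, validation_proportion):
    """Hypothetical helper: split range(num_train_items) into two index lists.

    Returns (train_indices, validation_indices). The validation items are
    taken from the end of the range, which is an assumption of this sketch.
    """
    if not 0.0 <= validation_proportion <= 1.0:
        raise ValueError("validation_proportion must lie in [0, 1]")
    # Truncate towards zero so the validation set never exceeds the proportion.
    num_validation = int(num_train_items * validation_proportion)
    split_point = num_train_items - num_validation
    indices = list(range(num_train_items))
    return indices[:split_point], indices[split_point:]


train_idx, validation_idx = split_train_validation(10, 0.2)
```

With 10 items and a proportion of 0.2, this sketch keeps 8 indices for training and 2 for validation.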

Methods

__getitem__(index: Any) → NestedArrayDict

__init__(hyper_params: HyperParameters, settings: ExperimentSettings, protocol_handler: ProtocolHandler, split: Literal['train', 'test', 'validation'] = 'train')

__len__() → int

__repr__() → str

Return repr(self).

_load_raw_dataset() → Dataset

Load the dataset.

Returns:

raw_data (HuggingFaceDataset) – The unprocessed dataset.

_process_data(raw_dataset: Dataset) → Dataset

Process the dataset.

Parameters:

raw_dataset (HuggingFaceDataset) – The unprocessed dataset.

Returns:

processed_dataset (HuggingFaceDataset) – The processed dataset.

_reduce_dataset_size(dataset: Dataset) → Dataset

Reduce the size of a dataset if necessary.

Parameters:

dataset (HuggingFaceDataset) – The dataset to reduce.

Returns:

reduced_dataset (HuggingFaceDataset) – The reduced dataset.
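The interplay between a size cap (such as `max_train_size` or `max_test_size`) and `reduce_shuffle_seed` can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the method's actual code: the function `reduce_dataset_size_sketch` and its parameters are hypothetical, it operates on a list rather than a Hugging Face dataset, and it assumes the data is left untouched when no reduction is needed.

```python
import random


def reduce_dataset_size_sketch(items, max_size, shuffle_seed):
    """Hypothetical sketch: shuffle with a fixed seed, then keep at most max_size items."""
    # Assumption: a missing cap or an already-small dataset means no change.
    if max_size is None or len(items) <= max_size:
        return list(items)
    shuffled = list(items)
    # A dedicated Random instance keeps the shuffle reproducible and
    # independent of the global random state.
    random.Random(shuffle_seed).shuffle(shuffled)
    return shuffled[:max_size]


reduced = reduce_dataset_size_sketch(list(range(100)), max_size=10, shuffle_seed=0)
```

Seeding the shuffle means repeated runs select the same subset, so experiments on a reduced dataset stay comparable.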