nip.code_validation.data.CodeValidationDataset

class nip.code_validation.data.CodeValidationDataset(hyper_params: HyperParameters, settings: ExperimentSettings, protocol_handler: ProtocolHandler, train: bool = True)

Base class for the code validation datasets.

Works with HuggingFace datasets.

The dataset should have the following columns:

  • “question”: The question text.

  • “solution”: The solution text.

  • “y”: The label, 1 for correct solutions and 0 for buggy solutions.

In addition, each datapoint should be assigned a “prover_stance”: the verdict that the prover should argue for. It applies in single-prover settings under the appropriate hyper-parameters, and can be computed from the (hash of the) solution text.
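As an illustration of the column layout and the hash-based stance described above, here is a minimal sketch in plain Python. The helper names (`prover_stance_from_solution`, `add_prover_stance`), the use of SHA-256, and the choice of the lowest bit of the first digest byte are all assumptions made for this example; the library's actual hashing scheme is not documented here and may differ.

```python
import hashlib

# Columns the documentation says every datapoint should have.
REQUIRED_COLUMNS = ("question", "solution", "y")


def prover_stance_from_solution(solution: str) -> int:
    """Derive a deterministic 0/1 stance from a hash of the solution text.

    Assumption: SHA-256 and the lowest bit of the first byte are illustrative
    choices only, not the scheme used by the library.
    """
    # hashlib is used instead of the built-in hash(), which is salted per
    # process for strings and so would not be reproducible across runs.
    digest = hashlib.sha256(solution.encode("utf-8")).digest()
    return digest[0] & 1


def add_prover_stance(row: dict) -> dict:
    """Return a copy of `row` with a "prover_stance" field added."""
    missing = [column for column in REQUIRED_COLUMNS if column not in row]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    return {**row, "prover_stance": prover_stance_from_solution(row["solution"])}


# Toy rows in the documented layout: y is 1 for correct, 0 for buggy.
rows = [
    {"question": "Return the sum of a and b.",
     "solution": "def add(a, b):\n    return a + b",
     "y": 1},
    {"question": "Return the sum of a and b.",
     "solution": "def add(a, b):\n    return a - b",
     "y": 0},
]

processed = [add_prover_stance(row) for row in rows]
```

Because the stance depends only on the solution text, the same datapoint gets the same stance on every run, independently of its label `y`.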

Methods Summary

__getitem__(index)

__init__(hyper_params, settings, ...[, train])

__len__()

__repr__()

Return repr(self).

_load_raw_dataset()

Load the dataset.

_process_data(raw_dataset)

Process the dataset.

Attributes

dataset_filepath_name

The name of the dataset file.

instance_keys

The keys specifying the input instance.

keys

The keys (field names) in the dataset.

processed_dir

The path to the directory containing the processed data.

raw_dir

The path to the directory containing the raw data.

Methods

__getitem__(index: Any) → NestedArrayDict

__init__(hyper_params: HyperParameters, settings: ExperimentSettings, protocol_handler: ProtocolHandler, train: bool = True)

__len__() → int

__repr__() → str

Return repr(self).

abstract _load_raw_dataset() → Dataset

Load the dataset.

Returns:

raw_data (HuggingFaceDataset) – The unprocessed dataset.

_process_data(raw_dataset: Dataset) → Dataset

Process the dataset.

Parameters:

raw_dataset (HuggingFaceDataset) – The unprocessed dataset.

Returns:

processed_dataset (HuggingFaceDataset) – The processed dataset.
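The split between an abstract `_load_raw_dataset` and an overridable `_process_data` suggests a load-then-process pattern for subclasses. The sketch below mimics that pattern with a hypothetical stand-in base class (`SketchDataset`) and plain lists of dicts in place of HuggingFace datasets. It is not the library's implementation: when and how the real class calls these methods, and what `_process_data` actually does, are assumptions.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

Row = Dict[str, object]  # stand-in for one row of a HuggingFace dataset


class SketchDataset(ABC):
    """Hypothetical stand-in base class, NOT nip's CodeValidationDataset."""

    def __init__(self) -> None:
        # Assumption: the raw data is loaded, then processed, at construction.
        raw_dataset = self._load_raw_dataset()
        self._data = self._process_data(raw_dataset)

    @abstractmethod
    def _load_raw_dataset(self) -> List[Row]:
        """Load the unprocessed dataset; subclasses must implement this."""

    def _process_data(self, raw_dataset: List[Row]) -> List[Row]:
        """Assumed processing step: keep only the documented columns."""
        columns = ("question", "solution", "y")
        return [{key: row[key] for key in columns} for row in raw_dataset]

    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, index: int) -> Row:
        return self._data[index]


class InMemoryDataset(SketchDataset):
    """Toy subclass that supplies its raw data from a literal list."""

    def _load_raw_dataset(self) -> List[Row]:
        return [
            {"question": "Double x.", "solution": "def f(x): return 2 * x",
             "y": 1, "source": "toy"},
            {"question": "Double x.", "solution": "def f(x): return x * x",
             "y": 0, "source": "toy"},
        ]


dataset = InMemoryDataset()
```

In this sketch a concrete dataset only has to say where its raw rows come from; the shared processing lives in the base class and can still be overridden when a dataset needs something different.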