nip.code_validation.dataset_generation.CodeValidationDatasetConfig#

class nip.code_validation.dataset_generation.CodeValidationDatasetConfig(model: str = 'openai/gpt-4o-mini', difficulties: list[str] = <factory>, split: typing.Literal['train', 'test'] | None = None, fraction_to_modify: float = 0.5, max_modifications: int = 1, num_data: int | None = 10000, num_problematic_inputs: int = 0, system_prompt: str | None = None, max_attempts: int = 10, local_dir: str = '/home/runner/work/neural-interactive-proofs/neural-interactive-proofs/data/code_validation', pull_repo: str | None = 'lrhammond/buggy-apps', push_repo: str | None = 'lrhammond/buggy-apps', save_after: int | None = 10)[source]#

A configuration class for generating datasets used in code validation systems.

model#

The model to be used, default is “openai/gpt-4o-mini”.

Type:

str

difficulties#

List of difficulty levels, default is [“interview”, “competition”, “introductory”].

Type:

list of str
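
The `<factory>` shown in the signature is how a dataclass displays a default produced by a `default_factory`, which gives each instance its own list. A minimal sketch of that pattern follows, using a hypothetical stand-in class (`SketchConfig` is not part of nip; the default list is copied from the description above):

```python
from dataclasses import dataclass, field


@dataclass
class SketchConfig:
    """Hypothetical stand-in for illustration; not the nip class."""

    # A list default must come from a factory so instances never share it.
    difficulties: list[str] = field(
        default_factory=lambda: ["interview", "competition", "introductory"]
    )


a = SketchConfig()
b = SketchConfig()
a.difficulties.append("extra")  # mutates only a's list; b keeps the default
```

Passing `difficulties=[...]` explicitly bypasses the factory, so the caller's list is used as given.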

split#

The data split, default is None.

Type:

{‘train’, ‘test’}, optional

fraction_to_modify#

Fraction of data to modify, default is 0.5.

Type:

float

max_modifications#

Maximum number of modifications, default is 1.

Type:

int

num_data#

Number of data points per split per difficulty level, default is 10000.

Type:

int, optional

num_problematic_inputs#

Number of problematic inputs to request, default is 0.

Type:

int

system_prompt#

System prompt for generating incorrect solutions, default is None.

Type:

str, optional

max_attempts#

Maximum number of attempts to generate a valid buggy solution, default is 10.

Type:

int

local_dir#

Local directory for data storage.

Type:

str

pull_repo#

Repository to pull data from, default is the value of constants.HF_BUGGY_APPS_REPO (‘lrhammond/buggy-apps’ in the class signature).

Type:

str, optional

push_repo#

Repository to push data to, default is the value of constants.HF_BUGGY_APPS_REPO (‘lrhammond/buggy-apps’ in the class signature).

Type:

str, optional

save_after#

Number of operations after which to save data, default is 10.

Type:

int, optional

__post_init__():

Initializes the system prompt based on the number of problematic inputs.

Methods Summary

__eq__(other)

Return self==value.

__init__([model, difficulties, split, ...])

__post_init__()

Set the system prompt based on the number of problematic inputs.

__repr__()

Return repr(self).

Methods

__eq__(other)#

Return self==value.

__init__(model: str = 'openai/gpt-4o-mini', difficulties: list[str] = <factory>, split: typing.Literal['train', 'test'] | None = None, fraction_to_modify: float = 0.5, max_modifications: int = 1, num_data: int | None = 10000, num_problematic_inputs: int = 0, system_prompt: str | None = None, max_attempts: int = 10, local_dir: str = '/home/runner/work/neural-interactive-proofs/neural-interactive-proofs/data/code_validation', pull_repo: str | None = 'lrhammond/buggy-apps', push_repo: str | None = 'lrhammond/buggy-apps', save_after: int | None = 10) → None#

__post_init__()[source]#

Set the system prompt based on the number of problematic inputs.

This method sets the system_prompt attribute based on the value of num_problematic_inputs. If system_prompt is already provided, it is left unchanged. Otherwise, the method builds a prompt for generating incorrect solutions and, where requested, problematic inputs for a code validation system.

  • If num_problematic_inputs is less than 0 or greater than 10, it raises a ValueError.

  • If num_problematic_inputs is 0, it generates a prompt for modifying a solution without providing problematic inputs.

  • If num_problematic_inputs is between 1 and 10, it generates a prompt for modifying a solution and provides placeholders for the specified number of problematic inputs.

Raises:

ValueError – If num_problematic_inputs is not between 0 and 10 inclusive.
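
The rules above can be sketched with a stand-in dataclass. This is an illustration under stated assumptions, not the nip implementation: `SketchConfig`, `_build_prompt`, and the prompt wording are all hypothetical, and the sketch returns before range validation when a `system_prompt` is supplied, which is one reading of "it does nothing".

```python
from dataclasses import dataclass
from typing import Optional


def _build_prompt(num_inputs: int) -> str:
    """Hypothetical prompt builder; the real prompt text is not documented here."""
    prompt = "Modify the solution so that it is subtly incorrect."
    # One placeholder line per requested problematic input (none when 0).
    for i in range(1, num_inputs + 1):
        prompt += f"\nProblematic input {i}: <input_{i}>"
    return prompt


@dataclass
class SketchConfig:
    """Hypothetical stand-in carrying only the two fields involved."""

    num_problematic_inputs: int = 0
    system_prompt: Optional[str] = None

    def __post_init__(self) -> None:
        # A caller-supplied prompt is kept as-is.
        if self.system_prompt is not None:
            return
        # Values outside 0..10 (inclusive) are rejected.
        if not 0 <= self.num_problematic_inputs <= 10:
            raise ValueError("num_problematic_inputs must be between 0 and 10")
        self.system_prompt = _build_prompt(self.num_problematic_inputs)
```

With this sketch, `SketchConfig(num_problematic_inputs=3)` ends up with a prompt containing three placeholder lines, while `SketchConfig(num_problematic_inputs=11)` raises `ValueError`.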

__repr__()#

Return repr(self).