nip.code_validation.dataset_generation.CodeValidationDatasetConfig
- class nip.code_validation.dataset_generation.CodeValidationDatasetConfig(model: str = 'openai/gpt-4o-mini', difficulties: list[str] = <factory>, split: Literal['train', 'test'] | None = None, fraction_to_modify: float = 0.5, max_modifications: int = 1, num_data: int | None = 10000, num_problematic_inputs: int = 0, system_prompt: str | None = None, max_attempts: int = 10, local_dir: str = '/home/runner/work/neural-interactive-proofs/neural-interactive-proofs/data/code_validation', pull_repo: str | None = 'lrhammond/buggy-apps', push_repo: str | None = 'lrhammond/buggy-apps', save_after: int | None = 10)
A configuration class for generating datasets used in code validation systems.
- difficulties
List of difficulty levels; default is ["interview", "competition", "introductory"].
- Type:
list[str], optional
- split
The data split, default is None.
- Type:
{‘train’, ‘test’}, optional
- num_data
Number of data points per split per difficulty level, default is 10000.
- Type:
int, optional
- system_prompt
System prompt for generating incorrect solutions, default is None.
- Type:
str, optional
- max_attempts
Maximum number of attempts to generate a valid buggy solution; default is 10.
- Type:
int, optional
- pull_repo
Repository to pull data from; default is the value of constants.HF_BUGGY_APPS_REPO ('lrhammond/buggy-apps').
- Type:
str | None
- push_repo
Repository to push data to; default is the value of constants.HF_BUGGY_APPS_REPO ('lrhammond/buggy-apps').
- Type:
str | None
- __post_init__():
Initializes the system prompt based on the number of problematic inputs.
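The "<factory>" shown for difficulties in the signature is how dataclasses render a default supplied through field(default_factory=...). The sketch below uses a hypothetical stand-in, SketchDatasetConfig, that mirrors only a few documented fields (it is not the nip class, and the exact factory nip uses is an assumption) to show what this implies: every instance gets its own difficulties list, the dataclass-generated __eq__ compares field values, and, reading num_data as a per-split, per-difficulty target, one split of the default configuration holds at most 10000 × 3 data points.

```python
from dataclasses import dataclass, field
from typing import Literal, Optional


# Hypothetical stand-in that mirrors a few documented fields of
# CodeValidationDatasetConfig; it is NOT the nip class itself.
@dataclass
class SketchDatasetConfig:
    model: str = "openai/gpt-4o-mini"
    # A mutable list default has to come from a factory; dataclasses render
    # such a default as "<factory>" in the generated signature.
    difficulties: list[str] = field(
        default_factory=lambda: ["interview", "competition", "introductory"]
    )
    split: Optional[Literal["train", "test"]] = None
    num_data: Optional[int] = 10000


# The dataclass-generated __eq__ compares field values.
print(SketchDatasetConfig() == SketchDatasetConfig())  # True

a = SketchDatasetConfig()
b = SketchDatasetConfig()
# Each instance gets its own list, so mutating one leaves the other intact.
a.difficulties.append("extra")
print(b.difficulties)  # ['interview', 'competition', 'introductory']

# Upper bound on data points in one split if every difficulty reaches num_data.
c = SketchDatasetConfig(split="train")
print(c.num_data * len(c.difficulties))  # 30000
```

Using a factory rather than a shared list literal is what keeps configurations independent when one of them is modified after construction.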
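Methods Summary
- __eq__(other): Return self==value.
- __init__([model, difficulties, split, ...]): Initialize the fields; the generated __init__ then calls __post_init__, which sets the system prompt based on the number of problematic inputs.
- __post_init__(): Set the system prompt based on the number of problematic inputs.
- __repr__(): Return repr(self).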
Methods
- __eq__(other)
Return self==value.
- __init__(model: str = 'openai/gpt-4o-mini', difficulties: list[str] = <factory>, split: Literal['train', 'test'] | None = None, fraction_to_modify: float = 0.5, max_modifications: int = 1, num_data: int | None = 10000, num_problematic_inputs: int = 0, system_prompt: str | None = None, max_attempts: int = 10, local_dir: str = '/home/runner/work/neural-interactive-proofs/neural-interactive-proofs/data/code_validation', pull_repo: str | None = 'lrhammond/buggy-apps', push_repo: str | None = 'lrhammond/buggy-apps', save_after: int | None = 10) -> None
- __post_init__()
Set the system prompt based on the number of problematic inputs.
This method sets the system_prompt attribute based on the value of num_problematic_inputs. If system_prompt is already provided, it does nothing. Otherwise, it builds a prompt for generating incorrect solutions and problematic inputs for a code validation system.
If num_problematic_inputs is less than 0 or greater than 10, it raises a ValueError.
If num_problematic_inputs is 0, it builds a prompt for modifying a solution without providing problematic inputs.
If num_problematic_inputs is between 1 and 10 (inclusive), it builds a prompt for modifying a solution that includes placeholders for the specified number of problematic inputs.
- Raises:
ValueError: If num_problematic_inputs is not between 0 and 10 (inclusive).
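The branching described for __post_init__ can be sketched with a hypothetical stand-in dataclass, SketchPromptConfig. The prompt wording and placeholder format are illustrative assumptions, not the text nip generates. Following the order given in the description, the sketch returns early when a prompt is supplied, so validation is skipped in that case; whether the real method also validates then is not stated.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in; the prompt text below is illustrative only and is
# not the prompt that nip actually builds.
@dataclass
class SketchPromptConfig:
    num_problematic_inputs: int = 0
    system_prompt: Optional[str] = None

    def __post_init__(self) -> None:
        # An explicitly supplied prompt is left untouched.
        if self.system_prompt is not None:
            return
        # Only 0 to 10 problematic inputs (inclusive) are accepted.
        if not 0 <= self.num_problematic_inputs <= 10:
            raise ValueError(
                "num_problematic_inputs must be between 0 and 10, "
                f"got {self.num_problematic_inputs}"
            )
        prompt = "Modify the given solution so that it becomes subtly incorrect."
        if self.num_problematic_inputs > 0:
            # One placeholder per requested problematic input.
            placeholders = "\n".join(
                f"Input {i}: <problematic input {i}>"
                for i in range(1, self.num_problematic_inputs + 1)
            )
            prompt += "\nAlso provide inputs on which it fails:\n" + placeholders
        self.system_prompt = prompt


cfg = SketchPromptConfig(num_problematic_inputs=2)
print(cfg.system_prompt.count("<problematic input"))  # 2

try:
    SketchPromptConfig(num_problematic_inputs=11)
except ValueError as err:
    print(err)  # num_problematic_inputs must be between 0 and 10, got 11
```

Doing this work in __post_init__ lets the generated __init__ stay untouched while still deriving system_prompt from the other fields at construction time.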
- __repr__()
Return repr(self).