nip.code_validation.dataset_generation.CodeValidationDatasetConfig


class nip.code_validation.dataset_generation.CodeValidationDatasetConfig(model: str = 'openai/gpt-4o-mini', difficulties: list[str] = <factory>, split: typing.Literal['train', 'test'] | None = None, fraction_to_modify: float = 0.5, max_modifications: int = 1, num_data: int | None = 10000, num_problematic_inputs: int = 0, system_prompt: str | None = None, max_attempts: int = 10, local_dir: str = PosixPath('/home/runner/work/neural-interactive-proofs/neural-interactive-proofs/data/code_validation'), pull_repo: str | None = 'lrhammond/buggy-apps', push_repo: str | None = 'lrhammond/buggy-apps', save_after: int | None = 10)

A configuration class for generating datasets used in code validation systems.

model

The model to use. Default is “openai/gpt-4o-mini”.

Type: str

difficulties

List of difficulty levels. Default is [“interview”, “competition”, “introductory”].

Type: list of str

split

The data split. Default is None.

Type: {‘train’, ‘test’}, optional

fraction_to_modify

Fraction of data to modify. Default is 0.5.

Type: float

max_modifications

Maximum number of modifications. Default is 1.

Type: int

num_data

Number of data points per split per difficulty level. Default is 10000.

Type: int, optional

num_problematic_inputs

Number of problematic inputs to request. Must be between 0 and 10. Default is 0.

Type: int

system_prompt

System prompt for generating incorrect solutions. Default is None, in which case __post_init__() sets it.

Type: str, optional

max_attempts

Maximum number of attempts to generate a valid buggy solution. Default is 10.

Type: int

local_dir

Local directory for data storage. The default shown in the signature above is a PosixPath pointing to the data/code_validation directory, rendered as an absolute path on the machine that built this page.

Type: str

pull_repo

Repository to pull data from. Default is the value of constants.HF_BUGGY_APPS_REPO (‘lrhammond/buggy-apps’ in the signature above).

Type: str, optional

push_repo

Repository to push data to. Default is the value of constants.HF_BUGGY_APPS_REPO (‘lrhammond/buggy-apps’ in the signature above).

Type: str, optional

save_after

Number of operations after which to save data. Default is 10.

Type: int, optional
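The `<factory>` default for difficulties, together with the generated `__eq__`, `__repr__`, and `__post_init__` methods, suggests that this class is a dataclass. The following is a minimal sketch under that assumption. `DatasetConfigSketch` is a hypothetical stand-in that mirrors only a few of the documented fields; it is not the library class, and the real definition may differ.

```python
from dataclasses import dataclass, field, replace
from typing import Literal, Optional


@dataclass
class DatasetConfigSketch:
    """Hypothetical stand-in mirroring a few of the documented fields."""

    model: str = "openai/gpt-4o-mini"
    # A mutable default must come from a factory so that instances do not
    # share one list; this is what "<factory>" in the signature stands for.
    difficulties: list[str] = field(
        default_factory=lambda: ["interview", "competition", "introductory"]
    )
    split: Optional[Literal["train", "test"]] = None
    fraction_to_modify: float = 0.5
    num_data: Optional[int] = 10000


# All defaults, as listed in the attribute descriptions.
default_cfg = DatasetConfigSketch()

# A smaller variant: copy the defaults and override selected fields.
small_cfg = replace(
    default_cfg, difficulties=["introductory"], split="train", num_data=100
)
```

Because the default list comes from a factory, changing `small_cfg.difficulties` leaves `default_cfg.difficulties` untouched.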

__post_init__(): Initializes the system prompt based on the number of problematic inputs.

Methods Summary

`__eq__`(other)	Return self==value.
`__init__`([model, difficulties, split, ...])
`__post_init__`()	Set the system prompt based on the number of problematic inputs.
`__repr__`()	Return repr(self).

Attributes

`fraction_to_modify`
`local_dir`
`max_attempts`
`max_modifications`
`model`
`num_data`
`num_problematic_inputs`
`pull_repo`
`push_repo`
`save_after`
`split`
`system_prompt`
`difficulties`

Methods

__eq__(other): Return self==value.

__init__(model: str = 'openai/gpt-4o-mini', difficulties: list[str] = <factory>, split: typing.Literal['train', 'test'] | None = None, fraction_to_modify: float = 0.5, max_modifications: int = 1, num_data: int | None = 10000, num_problematic_inputs: int = 0, system_prompt: str | None = None, max_attempts: int = 10, local_dir: str = PosixPath('/home/runner/work/neural-interactive-proofs/neural-interactive-proofs/data/code_validation'), pull_repo: str | None = 'lrhammond/buggy-apps', push_repo: str | None = 'lrhammond/buggy-apps', save_after: int | None = 10) → None

__post_init__()

Set the system prompt based on the number of problematic inputs.

This method sets the system_prompt attribute based on the value of num_problematic_inputs. If system_prompt is already provided, it does nothing. Otherwise, it builds a prompt that requests an incorrect solution and, when num_problematic_inputs is greater than 0, problematic inputs for a code validation system.

If num_problematic_inputs is less than 0 or greater than 10, it raises a ValueError.
If num_problematic_inputs is 0, it generates a prompt for modifying a solution without providing problematic inputs.
If num_problematic_inputs is between 1 and 10, it generates a prompt for modifying a solution and provides placeholders for the specified number of problematic inputs.

Raises: ValueError – If num_problematic_inputs is not between 0 and 10.
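The behaviour described above can be sketched roughly as follows. This is an illustration, not the library's implementation: `PromptConfigSketch`, the prompt wording, and the `<problematic_input_i>` placeholder format are all hypothetical. The sketch also assumes that a user-supplied system_prompt is returned before the range check runs, which follows the order of the description but is not confirmed by it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptConfigSketch:
    """Hypothetical stand-in for the two fields that __post_init__ touches."""

    num_problematic_inputs: int = 0
    system_prompt: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumption: a user-supplied prompt is kept untouched.
        if self.system_prompt is not None:
            return
        if not 0 <= self.num_problematic_inputs <= 10:
            raise ValueError("num_problematic_inputs must be between 0 and 10")
        # Placeholder wording; the real prompt text is not shown on this page.
        prompt = "PLACEHOLDER: modify the solution so that it is incorrect."
        if self.num_problematic_inputs > 0:
            # One placeholder slot per requested problematic input.
            slots = "\n".join(
                f"<problematic_input_{i}>"
                for i in range(1, self.num_problematic_inputs + 1)
            )
            prompt += "\nPLACEHOLDER: list inputs that expose the bug:\n" + slots
        self.system_prompt = prompt
```

With this sketch, `PromptConfigSketch(num_problematic_inputs=2)` yields a prompt containing two placeholder slots, while `PromptConfigSketch(num_problematic_inputs=11)` raises a ValueError.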

__repr__(): Return repr(self).
