nip.code_validation.dataset_generation._test_buggy_solution
- nip.code_validation.dataset_generation._test_buggy_solution(buggy_solution: str, solution: str, problematic_inputs: list[str], datum: dict, ignore_invalid_outputs: bool = False) → tuple[bool, list, list[dict[Literal['input', 'output', 'buggy_output'], str]] | None]
Test a buggy solution against a correct solution using provided inputs and datum.
- Parameters:
buggy_solution (str) – The buggy solution code to be tested.
solution (str) – The correct solution code for comparison.
problematic_inputs (list of str) – A list (possibly empty) of inputs that are predicted to cause issues.
datum (dict) – The corresponding datum from the original APPS dataset.
ignore_invalid_outputs (bool, optional) – If True, ignores invalid outputs when checking correctness. Defaults to False.
- Returns:
incorrect (bool) – A boolean indicating if the buggy solution is incorrect.
results (list) – A list of results from the correctness checks.
mismatched (list[dict[Literal["input", "output", "buggy_output"], str]] | None) – A list of mismatched input-output pairs for the problematic_inputs if produced, otherwise None.
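
The shape of the mismatched return value can be illustrated with a minimal sketch. This is not the library's implementation. The helper find_mismatches and the toy solutions correct and buggy are hypothetical names introduced here. The sketch assumes solutions are plain Python callables rather than code strings, that outputs are compared after stripping whitespace, and that None is returned when no problematic inputs are given; the actual function may differ on all three points.

```python
from __future__ import annotations

from typing import Callable, Literal

# Same element type as the documented `mismatched` return value.
Mismatch = dict[Literal["input", "output", "buggy_output"], str]


def find_mismatches(
    run_buggy: Callable[[str], str],
    run_correct: Callable[[str], str],
    problematic_inputs: list[str],
) -> list[Mismatch] | None:
    """Hypothetical helper: collect inputs on which the two solutions disagree."""
    if not problematic_inputs:
        # Assumption: with nothing to compare, no mismatch list is produced.
        return None
    mismatched: list[Mismatch] = []
    for text in problematic_inputs:
        expected = run_correct(text)
        actual = run_buggy(text)
        # Assumption: outputs are compared ignoring surrounding whitespace.
        if expected.strip() != actual.strip():
            mismatched.append(
                {"input": text, "output": expected, "buggy_output": actual}
            )
    return mismatched


# Toy stand-ins for a correct and a buggy solution (absolute value of an integer).
def correct(text: str) -> str:
    return str(abs(int(text)))


def buggy(text: str) -> str:
    # Bug: echoes the input, so negative numbers keep their sign.
    return text.strip()


print(find_mismatches(buggy, correct, ["3", "-4"]))
# [{'input': '-4', 'output': '4', 'buggy_output': '-4'}]
```

Each dictionary records the input that exposed the bug, the output of the correct solution, and the output of the buggy one, which matches the three Literal keys in the documented return type.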