nip.code_validation.agents.OpenAiSharedModelGroup#

class nip.code_validation.agents.OpenAiSharedModelGroup(hyper_params: HyperParameters, settings: ExperimentSettings, protocol_handler: ProtocolHandler | CodeValidationProtocolHandler, agent_wholes: dict[str, OpenAiWholeAgent], group_name: str)[source]#

A group of code validation OpenAI SDK agents sharing a model.

The OpenAI SDK can be used to interact with OpenAI’s API, as well as other OpenAI-compatible APIs such as OpenRouter.

Methods Summary

__init__(hyper_params, settings, ...)

_get_lm_server_fine_tune_job()

Get the fine-tune job from the language model server API.

_get_openai_fine_tune_job()

Get the fine-tune job from the OpenAI API.

_make_fine_tune_api_call(fine_tune_dataset, ...)

Make the API call to fine-tune the model.

_make_fine_tune_api_call_lm_server(...[, ...])

Make an API call to fine-tune the model using the language model server.

_make_fine_tune_api_call_openai(...[, job_name])

Make the OpenAI API call to fine-tune the model.

agent_ids_and_names()

Get an iterable of agent IDs and names.

create_dpo_fine_tune_job(...[, job_name])

Create a DPO fine-tune job for the agent group given sampled timesteps.

create_supervised_fine_tune_job(...[, ...])

Create a supervised fine-tune job for the agent.

eval()

Set the agent group to evaluation mode.

fine_tune_job_failed()

Check if the fine-tune job has failed.

get_fine_tune_job_error_repr()

Get a string representation of the error for the fine-tune job.

get_fine_tune_job_status()

Get the status of the fine-tune job.

get_state()

Get the state of the shared model group.

get_state_dict()

Get the state dictionary of the agent.

set_state(checkpoint)

Set the state of the shared model group from a checkpoint.

switch_to_next_model()

Switch to the next model after fine-tuning.

train()

Set the agent group to training mode.

wait_for_ready([timeout])

Wait for the agent group to be ready.

Attributes

is_trainable

language_model_client

The language model client for controlling the vLLM server.

lora_alpha

The computed LoRA alpha value for the group.

max_message_rounds

The maximum number of message rounds in the protocol.

model_name

The current model name, which may be the base model or a fine-tuned model.

num_epochs

The number of epochs to train the model for.

openai_client

The OpenAI client to use for interacting with the OpenAI API.

rl_learning_rate

The learning rate for this group when using reinforcement learning.

training_client_type

The type of client to use for training jobs.

agent_wholes

Methods

__init__(hyper_params: HyperParameters, settings: ExperimentSettings, protocol_handler: ProtocolHandler | CodeValidationProtocolHandler, agent_wholes: dict[str, OpenAiWholeAgent], group_name: str)[source]#
async _get_lm_server_fine_tune_job() TrainingJobInfo[source]#

Get the fine-tune job from the language model server API.

Returns:

job (LmTrainingJobInfo) – The fine-tune job information.

Raises:

TrainingJobNotFoundClientError – If the fine-tune job ID is not set or the job does not exist.

async _get_openai_fine_tune_job() FineTuningJob[source]#

Get the fine-tune job from the OpenAI API.

async _make_fine_tune_api_call(fine_tune_dataset: list[SupervisedDatasetItem] | list[DpoDatasetItem], method: Literal['supervised', 'dpo'], job_name: str | None = None)[source]#

Make the API call to fine-tune the model.

Parameters:
  • fine_tune_dataset (list[SupervisedDatasetItem] | list[DpoDatasetItem]) – The dataset of examples to fine-tune the model with.

  • method (Literal["supervised", "dpo"]) – The fine-tuning method to use.

  • job_name (str, optional) – A name for the job, to make it more easily identifiable.

async _make_fine_tune_api_call_lm_server(fine_tune_dataset: list[SupervisedDatasetItem] | list[DpoDatasetItem], method: Literal['dpo'], model_name: str, job_name: str | None = None)[source]#

Make an API call to fine-tune the model using the language model server.

Parameters:
  • fine_tune_dataset (list[SupervisedDatasetItem] | list[DpoDatasetItem]) – The dataset of examples to fine-tune the model with.

  • method (Literal["supervised", "dpo"]) – The fine-tuning method to use.

  • model_name (str) – The name of the model to fine-tune.

  • job_name (str, optional) – A name for the job, to make it more easily identifiable.

async _make_fine_tune_api_call_openai(fine_tune_dataset: list[SupervisedDatasetItem] | list[DpoDatasetItem], method: Literal['supervised', 'dpo'], model_name: str, job_name: str | None = None)[source]#

Make the OpenAI API call to fine-tune the model.

Parameters:
  • fine_tune_dataset (list[SupervisedDatasetItem] | list[DpoDatasetItem]) – The dataset of examples to fine-tune the model with.

  • method (Literal["supervised", "dpo"]) – The fine-tuning method to use.

  • model_name (str) – The name of the model to fine-tune.

  • job_name (str, optional) – A name for the job, to make it more easily identifiable.

agent_ids_and_names() Iterable[tuple[int, str]][source]#

Get an iterable of agent IDs and names.

Yields:
  • agent_id (int) – The ID of the agent.

  • agent_name (str) – The name of the agent.

async create_dpo_fine_tune_job(positive_examples_per_agent: dict[str, NestedArrayDict], negative_examples_per_agent: dict[str, NestedArrayDict], job_name: str | None = None)[source]#

Create a DPO fine-tune job for the agent group given sampled timesteps.

This method generates a dataset of examples ready to pass to the fine-tune API.

Parameters:
  • positive_examples_per_agent (dict[str, NestedArrayDict]) – The next timestep in the preferred response for each of the timesteps in timesteps_per_agent. Each is a nested array dict with batch size (timestep,) rather than the usual (batch, round), because we have selected timesteps from the first two dimensions of the batch.

  • negative_examples_per_agent (dict[str, NestedArrayDict]) – The next timestep in the non-preferred response for each of the timesteps in timesteps_per_agent. Each is a nested array dict with batch size (timestep,) rather than the usual (batch, round), because we have selected timesteps from the first two dimensions of the batch.

  • job_name (str, optional) – A name for the job, to make it more easily identifiable.
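To illustrate the shape of a DPO fine-tuning dataset, here is a minimal sketch of how preferred and non-preferred responses for the same timestep can be zipped into preference pairs. The field names (`prompt`, `chosen`, `rejected`) and the `build_dpo_pairs` helper are illustrative assumptions, not the actual `DpoDatasetItem` schema.

```python
# Hypothetical sketch: pairing preferred and non-preferred responses into
# DPO-style preference examples. Field names are illustrative only.

def build_dpo_pairs(positive_examples, negative_examples):
    """Zip per-timestep positive/negative responses into preference pairs."""
    pairs = []
    for pos, neg in zip(positive_examples, negative_examples):
        pairs.append({
            "prompt": pos["prompt"],    # shared context up to this timestep
            "chosen": pos["response"],  # preferred next message
            "rejected": neg["response"],  # non-preferred next message
        })
    return pairs

positive = [{"prompt": "Is this solution correct?", "response": "Decision: accept"}]
negative = [{"prompt": "Is this solution correct?", "response": "Decision: reject"}]
print(build_dpo_pairs(positive, negative))
```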

async create_supervised_fine_tune_job(rollouts_per_agent: dict[str, NestedArrayDict], guess_replaced_rollouts: dict[str, NestedArrayDict] = {}, job_name: str | None = None)[source]#

Create a supervised fine-tune job for the agent.

This method generates a dataset of examples ready to pass to the fine-tune API.

Parameters:
  • rollouts_per_agent (dict[str, NestedArrayDict]) –

    The sampled rollouts for each agent. Each is a nested dictionary of arrays with keys:

    • "round" (batch round): The current round number.

    • "message_history" (batch round round channel): The history of messages exchanged between the agents in each channel.

    • "message_agent_id" (batch round round channel): The ID of the agent who messaged at a round-channel pair.

    • "raw_message_history" (batch round round agent): The raw message generated by each model in each timestep.

    • "question" (batch round): The problem text.

    • "solution" (batch round): The proposed solution text.

    • "y" (batch round): The true label (0 for incorrect, 1 for correct).

    • "prover_stance" (batch round): When randomizing the prover stance, the verdict that the prover is arguing for, where 0 means "reject" and 1 means "accept".

  • guess_replaced_rollouts (dict[str, NestedArrayDict], default={}) – Additional rollouts for the verifier agents in which the verifier's guess is replaced with either "Decision: accept" or "Decision: reject" according to the true label.

  • job_name (str, optional) – A name for the job, to make it more easily identifiable.
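The guess-replacement step for `guess_replaced_rollouts` can be sketched as a small function that overwrites the verifier's decision with the one implied by the true label. The function name and the single-line "Decision:" message format are illustrative assumptions; only the forced decision strings come from the documentation above.

```python
# Hypothetical sketch of the guess-replacement step: the verifier's final
# decision is overwritten with the decision implied by the true label.
# The message format is an illustrative assumption.

def replace_guess_with_label(message: str, y: int) -> str:
    """Force the verifier's decision to match the true label.

    y == 1 means the solution is correct, so the forced decision is
    "Decision: accept"; y == 0 forces "Decision: reject".
    """
    forced = "Decision: accept" if y == 1 else "Decision: reject"
    # Drop any existing decision line and append the forced one.
    lines = [ln for ln in message.splitlines() if not ln.startswith("Decision:")]
    return "\n".join(lines + [forced])

print(replace_guess_with_label("I think this is wrong.\nDecision: reject", 1))
```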

async eval()[source]#

Set the agent group to evaluation mode.

async fine_tune_job_failed() bool[source]#

Check if the fine-tune job has failed.

Returns:

failed (bool) – True if the fine-tune job has failed, False otherwise.

async get_fine_tune_job_error_repr() str[source]#

Get a string representation of the error for the fine-tune job.

async get_fine_tune_job_status() Literal['pending', 'running', 'succeeded', 'failed', 'cancelled', 'not_found'][source]#

Get the status of the fine-tune job.
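A caller will typically poll this status until it reaches a terminal value. Below is a minimal, self-contained sketch of such a loop over the documented status literals; the `get_status` callable and the scripted stand-in progression are illustrative assumptions, not part of this class.

```python
import asyncio

# Hypothetical polling loop over the documented status values. The real
# method is async and returns one of the literals listed above; the
# stand-in status getter below is illustrative only.

TERMINAL = {"succeeded", "failed", "cancelled", "not_found"}

async def poll_until_done(get_status, interval: float = 0.0) -> str:
    """Poll an async status getter until it reaches a terminal state."""
    while True:
        status = await get_status()
        if status in TERMINAL:
            return status
        await asyncio.sleep(interval)

# Stand-in for get_fine_tune_job_status with a scripted progression.
_statuses = iter(["pending", "running", "running", "succeeded"])

async def _fake_status():
    return next(_statuses)

print(asyncio.run(poll_until_done(_fake_status)))
```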

get_state() PureTextSharedModelGroupState[source]#

Get the state of the shared model group.

get_state_dict() dict[source]#

Get the state dictionary of the agent.

Returns:

state_dict (dict) – The state dictionary of the agent.

set_state(checkpoint: OpenAiSharedModelGroupState | dict[str, Any])[source]#

Set the state of the shared model group from a checkpoint.

Parameters:

checkpoint (OpenAiSharedModelGroupState | dict[str, Any]) – The checkpoint to restore the state from.

async switch_to_next_model()[source]#

Switch to the next model after fine-tuning.

async train()[source]#

Set the agent group to training mode.

This method stops the vLLM server if it is running, as it is not needed during training and takes up resources.

async wait_for_ready(timeout: float = 300.0)[source]#

Wait for the agent group to be ready.

When using the language model server, this method waits for it to start accepting requests, logging a message if it is not already running. It then validates the server version to ensure it is compatible with the package version.

Parameters:

timeout (float, default=300.0) – The maximum time to wait for the agent group to be ready, in seconds.

Raises:

TimeoutError – If the agent group is not ready within the timeout period.