nip.parameters.agents.CodeValidationAgentParameters

class nip.parameters.agents.CodeValidationAgentParameters(
    agent_lr_factor: nip.parameters.agents.LrFactors | dict | None = None,
    body_lr_factor: nip.parameters.agents.LrFactors | dict | None = None,
    update_schedule: nip.parameters.update_schedule.AgentUpdateSchedule = ConstantUpdateSchedule(),
    use_manual_architecture: bool = False,
    normalize_message_history: bool = False,
    load_checkpoint_and_parameters: bool = False,
    checkpoint_entity: str = <factory>,
    checkpoint_project: str = <factory>,
    checkpoint_run_id: str | None = None,
    checkpoint_version: str = 'latest',
    use_orthogonal_initialisation: bool = True,
    orthogonal_initialisation_gain: float = 1.0,
    model_provider: typing.Literal['OpenAI', 'SelfHosted', 'OpenRouter'] = 'OpenAI',
    model_name: str = 'gpt-4o-mini-2024-07-18',
    language_model_server_scheme_host: str = 'http://localhost',
    language_model_server_port: int = 5000,
    vllm_server_port: int = 8000,
    use_dummy_api: bool = False,
    shared_model_group: str | None = None,
    temperature: float | None = None,
    top_p: float | None = None,
    repetition_penalty: float | None = None,
    fine_tune_from_scratch: bool = True,
    freeze_agent: bool = False,
    quantization: typing.Literal['bitsandbytes', 'none'] = 'none',
    num_epochs: int | None = None,
    dpo_beta: float | None = None,
    use_lora: bool = True,
    lora_rank: int = 64,
    lora_alpha: int | None = None,
    lora_alpha_scale: float | None = 1.0,
    lora_dropout: float = 0.05,
    stack_lora_adapters: bool = False,
    per_device_train_batch_size: int = 2,
    system_prompt_template_path: str | None = None,
    use_supervisor_message: typing.Literal['none', 'all', 'first', 'all_but_first'] = 'none',
    supervisor_name: str = 'Supervisor',
    max_response_words: int = 150,
    max_tokens_per_message: int | None = None,
    num_invalid_generation_retries: int = 20
)

Additional parameters for agents in the code validation experiment.

Parameters:

model_provider (Literal["OpenAI", "SelfHosted", "OpenRouter"]) – The provider of the model and API to use.
model_name (str) – The name of the model to use.
vllm_openai_base_url (str) – The URL of the server when using vLLM’s OpenAI-compatible server.
use_dummy_api (bool) – Whether to use a dummy API instead of the real API. This is useful for testing the agent without making real API requests.
shared_model_group (str | None) – The group of agents which share the same model. When two agents share this value, they will use the same model inference. For fine-tuning, this model is trained on a copy of the rollouts and rewards for each agent in the group. When this is None, the agent is in a group whose name is the same as the agent’s name.
temperature (float | None) – The temperature to use when sampling from the model. If None, the model uses the default temperature. Only one of temperature and top_p should be set.
top_p (float | None) – The top-p (nucleus sampling) value to use when sampling from the model. A value of 0.1 means only the tokens comprising the top 10% of probability mass are considered when sampling. If None, the model uses the default top-p value. Only one of temperature and top_p should be set.
repetition_penalty (float | None) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens. Not all models support this parameter.
fine_tune_from_scratch (bool) – Whether to fine-tune the model from scratch each iteration, or continue fine-tuning from the previous iteration.
freeze_agent (bool) – Whether to freeze the agent (i.e. not fine-tune it).
dpo_beta (float | None) – The beta parameter to use when training the model with DPO. This is a float between 0 and 2, which controls how strictly the new model will adhere to its previous behaviour. If None, the value is configured by the model provider.
system_prompt_template_path (str | None) – This option allows specifying a custom system prompt template. If not provided, the default system prompt template is used.
use_supervisor_message (UseSupervisorType) – Whether and when to use a ‘supervisor’ message when generating responses. This is a message which is appended to the chat history, with instructions for the model. These instructions are already included in the system prompt, but this can help improve the quality of the generated responses. Some models also require at least one user message to be able to generate a response, and this can be used to work around that. The options are listed in the UseSupervisorType enum, and specify when to use the supervisor message.
supervisor_name (str) – The name of the user who sends the supervisor message.
max_response_words (int) – The word limit stated to the agent in the system prompt: the agent is told to respond with a message of at most this many words.
max_tokens_per_message (int | None) – The maximum number of tokens which the model is allowed to generate in a single message. If None, this is calculated from max_response_words.
num_invalid_generation_retries (int) – The number of times to retry generating a message if the model returns an invalid response.
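
The rules described above for temperature, top_p and shared_model_group can be sketched in a few lines. This is a minimal, self-contained illustration, not the library's own code: ExampleTextParams, check_sampling and resolve_model_group are hypothetical names, and the source does not say whether the library itself rejects a configuration that sets both temperature and top_p.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExampleTextParams:
    """Hypothetical stand-in holding three of the documented fields."""

    temperature: Optional[float] = None
    top_p: Optional[float] = None
    shared_model_group: Optional[str] = None


def check_sampling(params: ExampleTextParams) -> None:
    """Raise if both temperature and top_p are set (an assumed policy)."""
    if params.temperature is not None and params.top_p is not None:
        raise ValueError("Set only one of temperature and top_p.")


def resolve_model_group(agent_name: str, params: ExampleTextParams) -> str:
    """Return the group name, falling back to the agent's own name."""
    if params.shared_model_group is None:
        return agent_name
    return params.shared_model_group
```

Under this sketch, two agents configured with the same shared_model_group resolve to the same group name, while an agent that leaves it as None ends up in a group of its own.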

Methods Summary

`__eq__`(other)	Return self==value.
`__init__`([agent_lr_factor, body_lr_factor, ...])
`__post_init__`()
`__repr__`()	Return repr(self).
`_get_param_class_from_dict`(param_dict)	Try to get the parameter class from a dictionary of serialised parameters.
`construct_test_params`()	Construct test parameters for the agent.
`from_dict`(params_dict[, ignore_extra_keys])	Create a parameters object from a dictionary.
`get`(address)	Get a value from the parameters object using a dot-separated address.
`load_from_wandb_config`(wandb_config)	Load the parameters from a W&B config dictionary.
`to_dict`()	Convert the parameters object to a dictionary.

Attributes

`LOAD_PRESERVED_PARAMETERS`
`agent_lr_factor`
`body_lr_factor`
`checkpoint_run_id`
`checkpoint_version`
`dpo_beta`
`fine_tune_from_scratch`
`freeze_agent`
`is_random`
`language_model_server_port`
`language_model_server_scheme_host`
`load_checkpoint_and_parameters`
`lora_alpha`
`lora_alpha_scale`
`lora_dropout`
`lora_rank`
`max_response_words`
`max_tokens_per_message`
`model_name`
`model_provider`
`normalize_message_history`
`num_epochs`
`num_invalid_generation_retries`
`orthogonal_initialisation_gain`
`per_device_train_batch_size`
`quantization`
`repetition_penalty`
`shared_model_group`
`stack_lora_adapters`
`supervisor_name`
`system_prompt_template_path`
`temperature`
`top_p`
`update_schedule`
`use_dummy_api`
`use_lora`
`use_manual_architecture`
`use_orthogonal_initialisation`
`use_supervisor_message`
`vllm_server_port`
`checkpoint_entity`
`checkpoint_project`

Methods

__eq__(other)

Return self==value.

__init__(agent_lr_factor, body_lr_factor, ...) → None

Accepts the same arguments, with the same defaults, as the class signature above.

__post_init__()

__repr__()

Return repr(self).

classmethod _get_param_class_from_dict(param_dict: dict) → type[ParameterValue] | None

Try to get the parameter class from a dictionary of serialised parameters.

Parameters: param_dict (dict) – A dictionary of parameters, which may have come from a to_dict method. This dictionary may contain a _type key, which is used to determine the class of the parameter.
Returns: param_class (type[ParameterValue] | None) – The class of the parameter, if it can be determined.
Raises: ValueError – If the class specified in the dictionary is not a valid parameter class.
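
One way to picture a _type lookup like this is a registry keyed by class name. The sketch below is an assumption about the general technique, not the library's implementation: PARAM_CLASSES, register, ExampleParams and get_param_class are hypothetical names, and the real method may resolve classes differently.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical registry mapping serialised type names to classes.
PARAM_CLASSES: Dict[str, type] = {}


def register(cls: type) -> type:
    """Class decorator that records the class under its own name."""
    PARAM_CLASSES[cls.__name__] = cls
    return cls


@register
@dataclass
class ExampleParams:
    lora_rank: int = 64


def get_param_class(param_dict: dict) -> Optional[type]:
    """Return the class named by `_type`, or None when the key is absent."""
    type_name = param_dict.get("_type")
    if type_name is None:
        return None
    if type_name not in PARAM_CLASSES:
        raise ValueError(f"Unknown parameter class: {type_name!r}")
    return PARAM_CLASSES[type_name]
```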

classmethod construct_test_params() → PureTextAgentParameters

Construct test parameters for the agent.

For this agent, we use the dummy API, so that we don’t need to make real API requests.

Returns: test_params (PureTextAgentParameters) – The test parameters.

classmethod from_dict(params_dict: dict, ignore_extra_keys: bool = False) → AgentsParameters

Create a parameters object from a dictionary.

Parameters:

params_dict (dict) – A dictionary of the parameters.
ignore_extra_keys (bool, default=False) – If True, ignore keys in the dictionary that do not correspond to fields in the parameters object.

Returns:

hyper_params (AgentsParameters) – The parameters object.
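
The effect of ignore_extra_keys can be illustrated with a stand-in dataclass. This is a sketch under stated assumptions: ExampleAgentParams is hypothetical, and raising ValueError on unexpected keys when ignore_extra_keys is False is a guess, since the source does not say what the library does in that case.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ExampleAgentParams:
    """Hypothetical stand-in with two of the documented fields."""

    model_name: str = "gpt-4o-mini-2024-07-18"
    temperature: Optional[float] = None

    @classmethod
    def from_dict(cls, params_dict: dict, ignore_extra_keys: bool = False):
        field_names = {f.name for f in fields(cls)}
        extra = set(params_dict) - field_names
        if extra and not ignore_extra_keys:
            # Assumed behaviour: reject keys that match no field.
            raise ValueError(f"Unexpected keys: {sorted(extra)}")
        # Keep only the keys that correspond to dataclass fields.
        kept = {k: v for k, v in params_dict.items() if k in field_names}
        return cls(**kept)
```

Calling ExampleAgentParams.from_dict({"temperature": 0.5, "foo": 1}, ignore_extra_keys=True) drops "foo" and leaves model_name at its default.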

get(address: str) → Any

Get a value from the parameters object using a dot-separated address.

Parameters: address (str) – The path to the value in the parameters object, separated by dots.
Returns: value (Any) – The value at the address.
Raises: KeyError – If the address does not exist.
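
A dot-separated lookup of this kind can be sketched by walking one attribute per address component. The classes below (ExampleSchedule, ExampleNestedParams) and the period field are hypothetical stand-ins; the library's own traversal may differ, for example in how it handles nested dictionaries.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ExampleSchedule:
    period: int = 1


@dataclass
class ExampleNestedParams:
    lora_rank: int = 64
    update_schedule: ExampleSchedule = field(default_factory=ExampleSchedule)

    def get(self, address: str) -> Any:
        """Follow one attribute per dot-separated part of the address."""
        value: Any = self
        for part in address.split("."):
            if not hasattr(value, part):
                raise KeyError(address)
            value = getattr(value, part)
        return value
```

With these stand-ins, ExampleNestedParams().get("update_schedule.period") reaches into the nested object, and an address with an unknown component raises KeyError.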

load_from_wandb_config(wandb_config: dict)

Load the parameters from a W&B config dictionary.

Parameters: wandb_config (dict) – The W&B config dictionary for this agent (e.g. wandb_run.config["agents"][agent_name]).

to_dict() → dict

Convert the parameters object to a dictionary.

Adds the is_random parameter to the dictionary. This is not a field of the parameters object, but we want to include it in the dictionary for logging.

Returns: params_dict (dict) – A dictionary of the parameters.
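
The idea of adding a non-field value to the serialised dictionary can be sketched as follows. ExampleLoggedParams is a hypothetical stand-in, and the constant False used for is_random is only a placeholder, since this section does not say how the library computes it.

```python
from dataclasses import asdict, dataclass
from typing import ClassVar


@dataclass
class ExampleLoggedParams:
    lora_rank: int = 64
    # A ClassVar is not a dataclass field, so asdict() skips it.
    # The value here is a placeholder, not the library's definition.
    is_random: ClassVar[bool] = False

    def to_dict(self) -> dict:
        params_dict = asdict(self)
        # Add the non-field value explicitly so that it gets logged.
        params_dict["is_random"] = self.is_random
        return params_dict
```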
