nip.parameters.agents.CodeValidationAgentParameters
- class nip.parameters.agents.CodeValidationAgentParameters(agent_lr_factor: ~nip.parameters.agents.LrFactors | dict | None = None, body_lr_factor: ~nip.parameters.agents.LrFactors | dict | None = None, update_schedule: ~nip.parameters.update_schedule.AgentUpdateSchedule = ConstantUpdateSchedule(), use_manual_architecture: bool = False, normalize_message_history: bool = False, load_checkpoint_and_parameters: bool = False, checkpoint_entity: str = <factory>, checkpoint_project: str = <factory>, checkpoint_run_id: str | None = None, checkpoint_version: str = 'latest', use_orthogonal_initialisation: bool = True, orthogonal_initialisation_gain: float = 1.0, model_provider: ~typing.Literal['OpenAI', 'SelfHosted', 'OpenRouter'] = 'OpenAI', model_name: str = 'gpt-4o-mini-2024-07-18', language_model_server_scheme_host: str = 'http://localhost', language_model_server_port: int = 5000, vllm_server_port: int = 8000, use_dummy_api: bool = False, shared_model_group: str | None = None, temperature: float | None = None, top_p: float | None = None, repetition_penalty: float | None = None, fine_tune_from_scratch: bool = True, freeze_agent: bool = False, quantization: ~typing.Literal['bitsandbytes', 'none'] = 'none', num_epochs: int | None = None, dpo_beta: float | None = None, use_lora: bool = True, lora_rank: int = 64, lora_alpha: int | None = None, lora_alpha_scale: float | None = 1.0, lora_dropout: float = 0.05, stack_lora_adapters: bool = False, per_device_train_batch_size: int = 2, system_prompt_template_path: str | None = None, use_supervisor_message: ~typing.Literal['none', 'all', 'first', 'all_but_first'] = 'none', supervisor_name: str = 'Supervisor', max_response_words: int = 150, max_tokens_per_message: int | None = None, num_invalid_generation_retries: int = 20)
Additional parameters for agents in the code validation experiment.
- Parameters:
model_provider (Literal["OpenAI", "SelfHosted", "OpenRouter"]) – The provider of the model and API to use.
model_name (str) – The name of the model to use.
vllm_openai_base_url (str) – When using vLLM’s OpenAI-compatible server, this is the URL of the server.
use_dummy_api (bool) – Whether to use a dummy API instead of the real API. This is useful for testing the agent without making real API requests.
shared_model_group (str | None) – The group of agents which share the same model. When two agents share this value, they use the same model for inference. For fine-tuning, this model is trained on a copy of the rollouts and rewards for each agent in the group. When this is None, the agent is in a group whose name is the same as the agent’s name.
temperature (float | None) – The temperature to use when sampling from the model. If None, the model uses the default temperature. Only one of temperature and top_p should be set.
top_p (float | None) – The top-p value to use when sampling from the model. A value of 0.1 means only the tokens comprising the top 10% of probability mass are considered when sampling. If None, the model uses the default top-p value. Only one of temperature and top_p should be set.
repetition_penalty (float | None) – A float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens. Not all models support this parameter.
fine_tune_from_scratch (bool) – Whether to fine-tune the model from scratch each iteration, or continue fine-tuning from the previous iteration.
freeze_agent (bool) – Whether to freeze the agent (i.e. not fine-tune it).
dpo_beta (float | None) – The beta parameter to use when training the model with DPO. This is a float between 0 and 2, which controls how strictly the new model will adhere to its previous behaviour. If None, the value is configured by the model provider.
system_prompt_template_path (str | None) – The path to a custom system prompt template. If not provided, the default system prompt template is used.
use_supervisor_message (UseSupervisorType) – Whether and when to use a ‘supervisor’ message when generating responses. This is a message which is appended to the chat history, with instructions for the model. These instructions are already included in the system prompt, but repeating them can help improve the quality of the generated responses. Some models also require at least one user message to be able to generate a response, and this can be used to work around that. The options are listed in the UseSupervisorType enum, and specify when to use the supervisor message.
supervisor_name (str) – The name of the user who sends the supervisor message.
max_response_words (int) – In the system prompt, we say that the agent should respond with a message of at most this many words.
max_tokens_per_message (int | None) – The maximum number of tokens which the model is allowed to generate in a single message. If None, this is calculated based on max_response_words.
num_invalid_generation_retries (int) – The number of times to retry generating a message if the model returns an invalid response.
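The two rules stated above for temperature/top_p and for shared_model_group can be illustrated with a minimal sketch. It does not import nip: SamplingSketch, check_sampling and resolve_model_group are hypothetical stand-ins, and raising a ValueError when both sampling values are set is an assumption, since the documentation only says that one of them should be set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SamplingSketch:
    """Hypothetical stand-in mirroring three documented fields and their defaults."""

    temperature: Optional[float] = None
    top_p: Optional[float] = None
    shared_model_group: Optional[str] = None


def check_sampling(params: SamplingSketch) -> None:
    """Reject configurations that set both temperature and top_p (assumed policy)."""
    if params.temperature is not None and params.top_p is not None:
        raise ValueError("Set only one of temperature and top_p.")


def resolve_model_group(params: SamplingSketch, agent_name: str) -> str:
    """Return the agent's group, falling back to the agent's own name when None."""
    if params.shared_model_group is None:
        return agent_name
    return params.shared_model_group
```

With this sketch, two agents configured with the same shared_model_group resolve to the same group name, while an agent that leaves it as None ends up in a group of its own.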
Methods Summary
__eq__(other) – Return self==value.
__init__([agent_lr_factor, body_lr_factor, ...])
__repr__() – Return repr(self).
_get_param_class_from_dict(param_dict) – Try to get the parameter class from a dictionary of serialised parameters.
construct_test_params() – Construct test parameters for the agent.
from_dict(params_dict[, ignore_extra_keys]) – Create a parameters object from a dictionary.
get(address) – Get a value from the parameters object using a dot-separated address.
load_from_wandb_config(wandb_config) – Load the parameters from a W&B config dictionary.
to_dict() – Convert the parameters object to a dictionary.
Attributes
LOAD_PRESERVED_PARAMETERS
agent_lr_factor
body_lr_factor
checkpoint_run_id
checkpoint_version
dpo_beta
fine_tune_from_scratch
freeze_agent
is_random
language_model_server_port
language_model_server_scheme_host
load_checkpoint_and_parameters
lora_alpha
lora_alpha_scale
lora_dropout
lora_rank
max_response_words
max_tokens_per_message
model_name
model_provider
normalize_message_history
num_epochs
num_invalid_generation_retries
orthogonal_initialisation_gain
per_device_train_batch_size
quantization
repetition_penalty
shared_model_group
stack_lora_adapters
supervisor_name
system_prompt_template_path
temperature
top_p
update_schedule
use_dummy_api
use_lora
use_manual_architecture
use_orthogonal_initialisation
use_supervisor_message
vllm_server_port
checkpoint_entity
checkpoint_project
Methods
- __eq__(other)
Return self==value.
- __init__(agent_lr_factor: ~nip.parameters.agents.LrFactors | dict | None = None, body_lr_factor: ~nip.parameters.agents.LrFactors | dict | None = None, update_schedule: ~nip.parameters.update_schedule.AgentUpdateSchedule = ConstantUpdateSchedule(), use_manual_architecture: bool = False, normalize_message_history: bool = False, load_checkpoint_and_parameters: bool = False, checkpoint_entity: str = <factory>, checkpoint_project: str = <factory>, checkpoint_run_id: str | None = None, checkpoint_version: str = 'latest', use_orthogonal_initialisation: bool = True, orthogonal_initialisation_gain: float = 1.0, model_provider: ~typing.Literal['OpenAI', 'SelfHosted', 'OpenRouter'] = 'OpenAI', model_name: str = 'gpt-4o-mini-2024-07-18', language_model_server_scheme_host: str = 'http://localhost', language_model_server_port: int = 5000, vllm_server_port: int = 8000, use_dummy_api: bool = False, shared_model_group: str | None = None, temperature: float | None = None, top_p: float | None = None, repetition_penalty: float | None = None, fine_tune_from_scratch: bool = True, freeze_agent: bool = False, quantization: ~typing.Literal['bitsandbytes', 'none'] = 'none', num_epochs: int | None = None, dpo_beta: float | None = None, use_lora: bool = True, lora_rank: int = 64, lora_alpha: int | None = None, lora_alpha_scale: float | None = 1.0, lora_dropout: float = 0.05, stack_lora_adapters: bool = False, per_device_train_batch_size: int = 2, system_prompt_template_path: str | None = None, use_supervisor_message: ~typing.Literal['none', 'all', 'first', 'all_but_first'] = 'none', supervisor_name: str = 'Supervisor', max_response_words: int = 150, max_tokens_per_message: int | None = None, num_invalid_generation_retries: int = 20) → None
- __repr__()
Return repr(self).
- classmethod _get_param_class_from_dict(param_dict: dict) → type[ParameterValue] | None
Try to get the parameter class from a dictionary of serialised parameters.
- Parameters:
param_dict (dict) – A dictionary of parameters, which may have come from a to_dict method. This dictionary may contain a _type key, which is used to determine the class of the parameter.
- Returns:
param_class (type[ParameterValue] | None) – The class of the parameter, if it can be determined.
- Raises:
ValueError – If the class specified in the dictionary is not a valid parameter class.
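The lookup described above can be sketched as follows. This is an illustration only, not the library's implementation: the PARAM_CLASSES registry, register_param_class, ExampleParameters and get_param_class_sketch are hypothetical names, and returning None when the _type key is absent is an assumption based on the documented return type.

```python
from typing import Dict, Optional

# Hypothetical registry mapping serialised type names to parameter classes.
PARAM_CLASSES: Dict[str, type] = {}


def register_param_class(cls: type) -> type:
    """Record a class under its own name so it can be looked up later."""
    PARAM_CLASSES[cls.__name__] = cls
    return cls


@register_param_class
class ExampleParameters:
    """Hypothetical parameter class used only for this illustration."""


def get_param_class_sketch(param_dict: dict) -> Optional[type]:
    """Resolve the class named by the optional "_type" key."""
    type_name = param_dict.get("_type")
    if type_name is None:
        # No type information: the class cannot be determined.
        return None
    if type_name not in PARAM_CLASSES:
        raise ValueError(f"{type_name!r} is not a valid parameter class.")
    return PARAM_CLASSES[type_name]
```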
- classmethod construct_test_params() → PureTextAgentParameters
Construct test parameters for the agent.
For this agent, we use the dummy API, so that we don’t need to make real API requests.
- Returns:
test_params (PureTextAgentParameters) – The test parameters.
- classmethod from_dict(params_dict: dict, ignore_extra_keys: bool = False) → AgentsParameters
Create a parameters object from a dictionary.
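One plausible reading of ignore_extra_keys is sketched below with a plain dataclass. ParamsSketch and from_dict_sketch are hypothetical, and the behaviour shown (raising on unknown keys unless ignore_extra_keys is True) is an assumption, because this page does not describe how extra keys are handled.

```python
from dataclasses import dataclass, fields


@dataclass
class ParamsSketch:
    """Hypothetical stand-in with two fields taken from the signature above."""

    model_name: str = "gpt-4o-mini-2024-07-18"
    lora_rank: int = 64


def from_dict_sketch(params_dict: dict, ignore_extra_keys: bool = False) -> ParamsSketch:
    """Build a ParamsSketch, optionally dropping keys that are not fields."""
    known = {f.name for f in fields(ParamsSketch)}
    extra = set(params_dict) - known
    if extra and not ignore_extra_keys:
        # Assumed behaviour: unknown keys are an error unless explicitly ignored.
        raise ValueError(f"Unknown keys: {sorted(extra)}")
    return ParamsSketch(**{k: v for k, v in params_dict.items() if k in known})
```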
- get(address: str) → Any
Get a value from the parameters object using a dot-separated address.
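A dot-separated address lookup of this kind can be sketched with nested dataclasses and repeated attribute access. AgentSketch, ExperimentSketch and get_by_address are hypothetical names, and the address "verifier.max_response_words" is an assumed example rather than one taken from the library.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import Any


@dataclass
class AgentSketch:
    """Hypothetical agent parameters with defaults from the signature above."""

    model_name: str = "gpt-4o-mini-2024-07-18"
    max_response_words: int = 150


@dataclass
class ExperimentSketch:
    """Hypothetical container holding one nested parameters object."""

    verifier: AgentSketch = field(default_factory=AgentSketch)


def get_by_address(params: Any, address: str) -> Any:
    """Follow each dot-separated component with getattr, left to right."""
    return reduce(getattr, address.split("."), params)
```

A missing component raises AttributeError in this sketch; how the real get method reports an invalid address is not stated here.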