nip.parameters.agents.PureTextAgentParameters#

class nip.parameters.agents.PureTextAgentParameters(agent_lr_factor: ~nip.parameters.agents.LrFactors | dict | None = None, body_lr_factor: ~nip.parameters.agents.LrFactors | dict | None = None, update_schedule: ~nip.parameters.update_schedule.AgentUpdateSchedule = ConstantUpdateSchedule(), use_manual_architecture: bool = False, normalize_message_history: bool = False, load_checkpoint_and_parameters: bool = False, checkpoint_entity: str = <factory>, checkpoint_project: str = <factory>, checkpoint_run_id: str | None = None, checkpoint_version: str = 'latest', use_orthogonal_initialisation: bool = True, orthogonal_initialisation_gain: float = 1.0, model_provider: ~typing.Literal['OpenAI', 'SelfHosted', 'OpenRouter'] = 'OpenAI', model_name: str = 'gpt-4o-mini-2024-07-18', language_model_server_scheme_host: str = 'http://localhost', language_model_server_port: int = 5000, vllm_server_port: int = 8000, use_dummy_api: bool = False, shared_model_group: str | None = None, temperature: float | None = None, top_p: float | None = None, repetition_penalty: float | None = None, fine_tune_from_scratch: bool = True, freeze_agent: bool = False, quantization: ~typing.Literal['bitsandbytes', 'none'] = 'none', num_epochs: int | None = None, dpo_beta: float | None = None, use_lora: bool = True, lora_rank: int = 64, lora_alpha: int | None = None, lora_alpha_scale: float | None = 1.0, lora_dropout: float = 0.05, stack_lora_adapters: bool = False, per_device_train_batch_size: int = 2, system_prompt_template_path: str | None = None, use_supervisor_message: ~typing.Literal['none', 'all', 'first', 'all_but_first'] = 'none', supervisor_name: str = 'Supervisor', max_response_words: int = 150, max_tokens_per_message: int | None = None, num_invalid_generation_retries: int = 20)[source]#

Additional parameters for text-based agents that use APIs to generate responses.

Parameters:
  • model_provider (Literal["OpenAI", "SelfHosted", "OpenRouter"]) – The provider of the model and API to use.

  • model_name (str) – The name of the model to use.

  • language_model_server_scheme_host (str) – The scheme and host of the language model server. When the model provider is “SelfHosted”, this server manages the vLLM server and open-weight fine-tuning.

  • language_model_server_port (int) – The port of the language model server. When the model provider is “SelfHosted”, this server manages the vLLM server and open-weight fine-tuning.

  • vllm_server_port (int) – The port of the vLLM server, used when the model provider is “SelfHosted”. Models are served by vLLM at the scheme and host from language_model_server_scheme_host, together with this port.

  • use_dummy_api (bool) – Whether to use a dummy API instead of the real API. This is useful for testing the agent without making real API requests.

  • shared_model_group (str | None) – The group of agents which share the same model. When two agents share this value, they will use the same model inference. For fine-tuning, this model is trained on a copy of the rollouts and rewards for each agent in the group. When this is None, the agent is in a group whose name is the same as the agent’s name.

  • temperature (float | None) – The temperature to use when sampling from the model. If None, the model uses the default temperature. Only one of temperature and top_p should be set.

  • top_p (float | None) – The top-p value to use when sampling from the model. A value of 0.1 means that only the tokens comprising the top 10% of probability mass are considered when sampling. If None, the model uses the default top-p value. Only one of temperature and top_p should be set.

  • repetition_penalty (float | None) – Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens. Not all models support this parameter.

  • fine_tune_from_scratch (bool) – Whether to fine-tune the model from scratch each iteration, or continue fine-tuning from the previous iteration.

  • freeze_agent (bool) – Whether to freeze the agent (i.e. not fine-tune it).

  • quantization (VllmQuantization) – The quantization method to use for model inference. This is only relevant when using a self-hosted model. It controls how the model weights are quantized to reduce memory usage, at the cost of some accuracy.

  • num_epochs (int | None) – The number of epochs to train the model for. If None, we use the global rl.num_epochs parameter, which is set in the RL trainer parameters.

  • dpo_beta (float | None) – The beta parameter to use when training the model with DPO. This is a float between 0 and 2, which controls how strictly the new model adheres to its previous behaviour. If None, the value is configured by the model provider.

  • use_lora (bool) – Whether to use a LoRA adapter when training the model [YYK+23]. A LoRA adapter adds extra trainable parameters to the model, which are trained separately from the base model, allowing faster training and smaller checkpoints. Only relevant when using a self-hosted model.

  • lora_rank (int) – The rank of the LoRA adapter, controlling the number of trainable parameters. Usually a power of 2 between 4 and 256. This is a key hyperparameter to tune when using LoRA. A higher rank means more capacity to learn new skills, but requires higher quality data and more training time.

  • lora_alpha (int | None) – The scaling factor for the LoRA adapter, which controls the strength of the adapter. Typically this is either equal to the LoRA rank or twice the LoRA rank. If None, the value is computed as lora_rank * lora_alpha_scale. Exactly one of this and lora_alpha_scale must be set (see the example after this list).

  • lora_alpha_scale (float | None) – Used to compute the LoRA alpha value: if lora_alpha is not set, this is multiplied by the LoRA rank to give the alpha value. Exactly one of this and lora_alpha must be set.

  • lora_dropout (float) – The dropout rate for the LoRA layers, used to help prevent overfitting.

  • stack_lora_adapters (bool) – When training a model multiple times with LoRA, whether to stack the LoRA adapters on top of each other, or to reuse the existing LoRA adapter.

  • per_device_train_batch_size (int) – The batch size per device (GPU) for training.

  • system_prompt_template_path (str | None) – This option allows specifying a custom system prompt template. If not provided, the default system prompt template is used.

  • use_supervisor_message (UseSupervisorType) – Whether and when to use a ‘supervisor’ message when generating responses. This is a message which is appended to the chat history, with instructions for the model. These instructions are already included in the system prompt, but this can help improve the quality of the generated responses. Some models also require at least one user message to be able to generate a response, and this can be used to work around that. The options are listed in the UseSupervisorType enum, and specify when to use the supervisor message.

  • supervisor_name (str) – The name of the user who sends the supervisor message.

  • max_response_words (int) – In the system prompt, we say that the agent should respond with a message of at most this many words.

  • max_tokens_per_message (int | None) – The maximum number of tokens which the model is allowed to generate in a single message. If None, this is calculated based on the max_response_words.

  • num_invalid_generation_retries (int) – The number of times to retry generating a message if the model returns an invalid response.
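
Below is a minimal sketch of constructing these parameters. The import path follows the class path above, and the model name and all values are illustrative choices rather than recommendations:

    from nip.parameters.agents import PureTextAgentParameters

    agent_params = PureTextAgentParameters(
        model_provider="SelfHosted",
        model_name="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model name
        temperature=0.7,  # top_p is left unset: only one of the two should be set
        use_lora=True,
        lora_rank=64,
        # lora_alpha is left as None, so it is computed from the scale:
        # lora_alpha = lora_rank * lora_alpha_scale = 64 * 1.0 = 64
    )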

Methods Summary

__eq__(other)

Return self==value.

__init__([agent_lr_factor, body_lr_factor, ...])

__post_init__()

__repr__()

Return repr(self).

_get_param_class_from_dict(param_dict)

Try to get the parameter class from a dictionary of serialised parameters.

construct_test_params()

Construct test parameters for the agent.

from_dict(params_dict[, ignore_extra_keys])

Create a parameters object from a dictionary.

get(address)

Get a value from the parameters object using a dot-separated address.

load_from_wandb_config(wandb_config)

Load the parameters from a W&B config dictionary.

to_dict()

Convert the parameters object to a dictionary.

Attributes

LOAD_PRESERVED_PARAMETERS

agent_lr_factor

body_lr_factor

checkpoint_run_id

checkpoint_version

dpo_beta

fine_tune_from_scratch

freeze_agent

is_random

language_model_server_port

language_model_server_scheme_host

load_checkpoint_and_parameters

lora_alpha

lora_alpha_scale

lora_dropout

lora_rank

max_response_words

max_tokens_per_message

model_name

model_provider

normalize_message_history

num_epochs

num_invalid_generation_retries

orthogonal_initialisation_gain

per_device_train_batch_size

quantization

repetition_penalty

shared_model_group

stack_lora_adapters

supervisor_name

system_prompt_template_path

temperature

top_p

update_schedule

use_dummy_api

use_lora

use_manual_architecture

use_orthogonal_initialisation

use_supervisor_message

vllm_server_port

checkpoint_entity

checkpoint_project

Methods

__eq__(other)#

Return self==value.

__init__(agent_lr_factor: ~nip.parameters.agents.LrFactors | dict | None = None, body_lr_factor: ~nip.parameters.agents.LrFactors | dict | None = None, update_schedule: ~nip.parameters.update_schedule.AgentUpdateSchedule = ConstantUpdateSchedule(), use_manual_architecture: bool = False, normalize_message_history: bool = False, load_checkpoint_and_parameters: bool = False, checkpoint_entity: str = <factory>, checkpoint_project: str = <factory>, checkpoint_run_id: str | None = None, checkpoint_version: str = 'latest', use_orthogonal_initialisation: bool = True, orthogonal_initialisation_gain: float = 1.0, model_provider: ~typing.Literal['OpenAI', 'SelfHosted', 'OpenRouter'] = 'OpenAI', model_name: str = 'gpt-4o-mini-2024-07-18', language_model_server_scheme_host: str = 'http://localhost', language_model_server_port: int = 5000, vllm_server_port: int = 8000, use_dummy_api: bool = False, shared_model_group: str | None = None, temperature: float | None = None, top_p: float | None = None, repetition_penalty: float | None = None, fine_tune_from_scratch: bool = True, freeze_agent: bool = False, quantization: ~typing.Literal['bitsandbytes', 'none'] = 'none', num_epochs: int | None = None, dpo_beta: float | None = None, use_lora: bool = True, lora_rank: int = 64, lora_alpha: int | None = None, lora_alpha_scale: float | None = 1.0, lora_dropout: float = 0.05, stack_lora_adapters: bool = False, per_device_train_batch_size: int = 2, system_prompt_template_path: str | None = None, use_supervisor_message: ~typing.Literal['none', 'all', 'first', 'all_but_first'] = 'none', supervisor_name: str = 'Supervisor', max_response_words: int = 150, max_tokens_per_message: int | None = None, num_invalid_generation_retries: int = 20) → None#
__post_init__()[source]#
__repr__()#

Return repr(self).

classmethod _get_param_class_from_dict(param_dict: dict) → type[ParameterValue] | None[source]#

Try to get the parameter class from a dictionary of serialised parameters.

Parameters:

param_dict (dict) – A dictionary of parameters, which may have come from a to_dict method. This dictionary may contain a _type key, which is used to determine the class of the parameter.

Returns:

param_class (type[ParameterValue] | None) – The class of the parameter, if it can be determined.

Raises:

ValueError – If the class specified in the dictionary is not a valid parameter class.

classmethod construct_test_params() → PureTextAgentParameters[source]#

Construct test parameters for the agent.

For this agent, we use the dummy API, so that we don’t need to make real API requests.

Returns:

test_params (PureTextAgentParameters) – The test parameters.
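
A sketch of typical usage; that the result uses the dummy API follows from the description above:

    test_params = PureTextAgentParameters.construct_test_params()
    assert test_params.use_dummy_api  # no real API requests will be made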

classmethod from_dict(params_dict: dict, ignore_extra_keys: bool = False) → AgentsParameters[source]#

Create a parameters object from a dictionary.

Parameters:
  • params_dict (dict) – A dictionary of the parameters.

  • ignore_extra_keys (bool, default=False) – If True, ignore keys in the dictionary that do not correspond to fields in the parameters object.

Returns:

hyper_params (AgentsParameters) – The parameters object.
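
A minimal sketch, assuming from_dict accepts a dictionary of this class’s fields and that omitted keys take their default values:

    params = PureTextAgentParameters.from_dict(
        {"model_provider": "OpenAI", "temperature": 0.5}
    )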

get(address: str) → Any[source]#

Get a value from the parameters object using a dot-separated address.

Parameters:

address (str) – The path to the value in the parameters object, separated by dots.

Returns:

value (Any) – The value at the address.

Raises:

KeyError – If the address does not exist.
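
A minimal sketch (the dot-separated form reaches into nested parameter objects; the nested address in the comment is hypothetical):

    model_name = params.get("model_name")  # equivalent to params.model_name
    # e.g. params.get("agent_lr_factor.actor") for a nested value, if such a field exists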

load_from_wandb_config(wandb_config: dict)[source]#

Load the parameters from a W&B config dictionary.

Parameters:

wandb_config (dict) – The W&B config dictionary for this agent (e.g. wandb_run.config["agents"][agent_name]).
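
A sketch, assuming an active W&B run whose config stores per-agent parameters under "agents"; the agent name "prover" is hypothetical:

    params.load_from_wandb_config(wandb_run.config["agents"]["prover"])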

to_dict() → dict[source]#

Convert the parameters object to a dictionary.

Adds the is_random parameter to the dictionary. This is not a field of the parameters object, but we want to include it in the dictionary for logging.

Returns:

params_dict (dict) – A dictionary of the parameters.
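
A sketch of a round trip through the dictionary form. Since to_dict adds the extra is_random key, which is not a constructor field, ignore_extra_keys=True is passed when loading (whether from_dict already strips this key is not specified here):

    params_dict = agent_params.to_dict()
    restored = PureTextAgentParameters.from_dict(params_dict, ignore_extra_keys=True)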