nip.parameters.agents.PureTextAgentParameters#
- class nip.parameters.agents.PureTextAgentParameters(agent_lr_factor: ~nip.parameters.agents.LrFactors | dict | None = None, body_lr_factor: ~nip.parameters.agents.LrFactors | dict | None = None, update_schedule: ~nip.parameters.update_schedule.AgentUpdateSchedule = ConstantUpdateSchedule(), use_manual_architecture: bool = False, normalize_message_history: bool = False, load_checkpoint_and_parameters: bool = False, checkpoint_entity: str = <factory>, checkpoint_project: str = <factory>, checkpoint_run_id: str | None = None, checkpoint_version: str = 'latest', use_orthogonal_initialisation: bool = True, orthogonal_initialisation_gain: float = 1.0, model_provider: ~typing.Literal['OpenAI', 'SelfHosted', 'OpenRouter'] = 'OpenAI', model_name: str = 'gpt-4o-mini-2024-07-18', language_model_server_scheme_host: str = 'http://localhost', language_model_server_port: int = 5000, vllm_server_port: int = 8000, use_dummy_api: bool = False, shared_model_group: str | None = None, temperature: float | None = None, top_p: float | None = None, repetition_penalty: float | None = None, fine_tune_from_scratch: bool = True, freeze_agent: bool = False, quantization: ~typing.Literal['bitsandbytes', 'none'] = 'none', num_epochs: int | None = None, dpo_beta: float | None = None, use_lora: bool = True, lora_rank: int = 64, lora_alpha: int | None = None, lora_alpha_scale: float | None = 1.0, lora_dropout: float = 0.05, stack_lora_adapters: bool = False, per_device_train_batch_size: int = 2, system_prompt_template_path: str | None = None, use_supervisor_message: ~typing.Literal['none', 'all', 'first', 'all_but_first'] = 'none', supervisor_name: str = 'Supervisor', max_response_words: int = 150, max_tokens_per_message: int | None = None, num_invalid_generation_retries: int = 20)[source]#
Additional parameters for text-based agents that use APIs to generate responses.
- Parameters:
model_provider (Literal["OpenAI", "SelfHosted", "OpenRouter"]) – The provider of the model and API to use.
model_name (str) – The name of the model to use.
language_model_server_scheme_host (str) – The scheme and host of the language model server. If the model provider is “SelfHosted”, this controls the vLLM server and open-weight fine-tuning.
language_model_server_port (int) – The port of the language model server. If the model provider is “SelfHosted”, this controls the vLLM server and open-weight fine-tuning.
vllm_server_port (int) – The port of the vLLM server. This is used when the model provider is “SelfHosted”. Models are served by vLLM, which uses the language_model_server_scheme_host scheme and host, and this port. (See the configuration sketch after this parameter list.)
use_dummy_api (bool) – Whether to use a dummy API instead of the real API. This is useful for testing the agent without making real API requests.
shared_model_group (str | None) – The group of agents which share the same model. When two agents share this value, they will use the same model inference. For fine-tuning, this model is trained on a copy of the rollouts and rewards for each agent in the group. When this is None, the agent is in a group whose name is the same as the agent’s name.
temperature (float | None) – The temperature to use when sampling from the model. If None, the model uses the default temperature. Only one of temperature and top_p should be set.
top_p (float | None) – The top-p value to use when sampling from the model. A value of 0.1 means only the top 10% of tokens are considered when sampling. If None, the model uses the default top-p value. Only one of temperature and top_p should be set. (See the sampling sketch after this parameter list.)
repetition_penalty (float | None) – A float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage it to repeat tokens. Not all models support this parameter.
fine_tune_from_scratch (bool) – Whether to fine-tune the model from scratch each iteration, or continue fine-tuning from the previous iteration.
freeze_agent (bool) – Whether to freeze the agent (i.e. not fine-tune it).
quantization (VllmQuantization) – The quantization method to use for model inference. This is only relevant when using a self-hosted model. It controls how the model weights are quantized to reduce memory usage, at the cost of some accuracy.
num_epochs (int | None) – The number of epochs to train the model for. If None, we use the global rl.num_epochs parameter, which is set in the RL trainer parameters.
dpo_beta (float | None) – The beta parameter to use when training the model with DPO. This is a float between 0 and 2, which controls how strictly the new model will adhere to its previous behaviour. If None, the value is configured by the model provider.
use_lora (bool) – Whether to use a LoRA adapter when training the model [YYK+23]. A LoRA adapter adds extra trainable parameters to the model, which are trained separately from the base model. This allows faster training and smaller checkpoints. Only relevant when using a self-hosted model.
lora_rank (int) – The rank of the LoRA adapter, controlling the number of trainable parameters. Usually a power of 2 between 4 and 256. This is a key hyperparameter to tune when using LoRA. A higher rank means more capacity to learn new skills, but requires higher quality data and more training time.
lora_alpha (int | None) – The scaling factor for the LoRA adapter, controlling the strength of the adapter. Typically either the same as the LoRA rank, or two times the LoRA rank. If None, the value is computed as lora_rank * lora_alpha_scale. One of this and lora_alpha_scale must be set, but not both. (A worked example follows this parameter list.)
lora_alpha_scale (float | None) – Used to compute the LoRA alpha value. If lora_alpha is not set, this is multiplied by the LoRA rank to compute the LoRA alpha value. One of this and lora_alpha must be set, but not both.
lora_dropout (float) – The dropout rate for the LoRA layers. This is applied to the LoRA layers in the model, and is used to prevent overfitting.
stack_lora_adapters (bool) – When training a model multiple times with LoRA, whether to stack the LoRA adapters on top of each other, or to reuse the existing LoRA adapter.
per_device_train_batch_size (int) – The batch size per device (GPU) for training.
system_prompt_template_path (str | None) – This option allows specifying a custom system prompt template. If not provided, the default system prompt template is used.
use_supervisor_message (UseSupervisorType) – Whether and when to use a ‘supervisor’ message when generating responses. This is a message appended to the chat history, containing instructions for the model. These instructions are already included in the system prompt, but repeating them can help improve the quality of the generated responses. Some models also require at least one user message to be able to generate a response, and this can be used to work around that. The options are listed in the UseSupervisorType enum, and specify when to use the supervisor message.
supervisor_name (str) – The name of the user who sends the supervisor message.
max_response_words (int) – In the system prompt, we say that the agent should respond with a message of at most this many words.
max_tokens_per_message (int | None) – The maximum number of tokens which the model is allowed to generate in a single message. If None, this is calculated based on max_response_words.
num_invalid_generation_retries (int) – The number of times to retry generating a message if the model returns an invalid response.
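As an illustration of the self-hosted options above, here is a minimal construction sketch. It assumes nip is importable; the model name is a hypothetical placeholder, and the keyword names and defaults come from the signature at the top of this page.

```python
from nip.parameters.agents import PureTextAgentParameters

# Hypothetical self-hosted setup: the language model server (which also
# handles open-weight fine-tuning) listens on http://localhost:5000,
# while vLLM serves the model for inference on port 8000.
self_hosted = PureTextAgentParameters(
    model_provider="SelfHosted",
    model_name="meta-llama/Llama-3.1-8B-Instruct",  # hypothetical model name
    language_model_server_scheme_host="http://localhost",
    language_model_server_port=5000,
    vllm_server_port=8000,
    use_dummy_api=False,
)
```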
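The temperature/top_p constraint can be sketched the same way. Continuing the snippet above, this sets top_p only and leaves temperature as None; the values are illustrative, not recommendations.

```python
# Set only one of temperature and top_p. Here we use top-p sampling:
# only the most probable tokens whose cumulative probability reaches
# 0.9 are considered at each step.
sampling = PureTextAgentParameters(
    model_provider="OpenAI",
    model_name="gpt-4o-mini-2024-07-18",
    top_p=0.9,  # temperature stays None
)
```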
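Finally, the lora_alpha computation described above, as a worked example using the defaults from the signature (lora_rank=64, lora_alpha=None, lora_alpha_scale=1.0):

```python
# When lora_alpha is None, the effective alpha is
# lora_rank * lora_alpha_scale.
lora_rank = 64
lora_alpha_scale = 1.0
effective_alpha = lora_rank * lora_alpha_scale  # 64.0, i.e. alpha == rank
```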
Methods Summary
- __eq__(other) – Return self==value.
- __init__([agent_lr_factor, body_lr_factor, ...])
- __repr__() – Return repr(self).
- _get_param_class_from_dict(param_dict) – Try to get the parameter class from a dictionary of serialised parameters.
- construct_test_params() – Construct test parameters for the agent.
- from_dict(params_dict[, ignore_extra_keys]) – Create a parameters object from a dictionary.
- get(address) – Get a value from the parameters object using a dot-separated address.
- load_from_wandb_config(wandb_config) – Load the parameters from a W&B config dictionary.
- to_dict() – Convert the parameters object to a dictionary.
Attributes
LOAD_PRESERVED_PARAMETERS
agent_lr_factor
body_lr_factor
checkpoint_run_id
checkpoint_version
dpo_beta
fine_tune_from_scratch
freeze_agent
is_random
language_model_server_port
language_model_server_scheme_host
load_checkpoint_and_parameters
lora_alpha
lora_alpha_scale
lora_dropout
lora_rank
max_response_words
max_tokens_per_message
model_name
model_provider
normalize_message_history
num_epochs
num_invalid_generation_retries
orthogonal_initialisation_gain
per_device_train_batch_size
quantization
repetition_penalty
shared_model_group
stack_lora_adapters
supervisor_name
system_prompt_template_path
temperature
top_p
update_schedule
use_dummy_api
use_lora
use_manual_architecture
use_orthogonal_initialisation
use_supervisor_message
vllm_server_port
checkpoint_entity
checkpoint_project
Methods
- __eq__(other)#
Return self==value.
- __init__(agent_lr_factor: ~nip.parameters.agents.LrFactors | dict | None = None, body_lr_factor: ~nip.parameters.agents.LrFactors | dict | None = None, update_schedule: ~nip.parameters.update_schedule.AgentUpdateSchedule = ConstantUpdateSchedule(), use_manual_architecture: bool = False, normalize_message_history: bool = False, load_checkpoint_and_parameters: bool = False, checkpoint_entity: str = <factory>, checkpoint_project: str = <factory>, checkpoint_run_id: str | None = None, checkpoint_version: str = 'latest', use_orthogonal_initialisation: bool = True, orthogonal_initialisation_gain: float = 1.0, model_provider: ~typing.Literal['OpenAI', 'SelfHosted', 'OpenRouter'] = 'OpenAI', model_name: str = 'gpt-4o-mini-2024-07-18', language_model_server_scheme_host: str = 'http://localhost', language_model_server_port: int = 5000, vllm_server_port: int = 8000, use_dummy_api: bool = False, shared_model_group: str | None = None, temperature: float | None = None, top_p: float | None = None, repetition_penalty: float | None = None, fine_tune_from_scratch: bool = True, freeze_agent: bool = False, quantization: ~typing.Literal['bitsandbytes', 'none'] = 'none', num_epochs: int | None = None, dpo_beta: float | None = None, use_lora: bool = True, lora_rank: int = 64, lora_alpha: int | None = None, lora_alpha_scale: float | None = 1.0, lora_dropout: float = 0.05, stack_lora_adapters: bool = False, per_device_train_batch_size: int = 2, system_prompt_template_path: str | None = None, use_supervisor_message: ~typing.Literal['none', 'all', 'first', 'all_but_first'] = 'none', supervisor_name: str = 'Supervisor', max_response_words: int = 150, max_tokens_per_message: int | None = None, num_invalid_generation_retries: int = 20) → None#
- __repr__()#
Return repr(self).
- classmethod _get_param_class_from_dict(param_dict: dict) → type[ParameterValue] | None [source]#
Try to get the parameter class from a dictionary of serialised parameters.
- Parameters:
param_dict (dict) – A dictionary of parameters, which may have come from a to_dict method. This dictionary may contain a _type key, which is used to determine the class of the parameter.
- Returns:
param_class (type[ParameterValue] | None) – The class of the parameter, if it can be determined.
- Raises:
ValueError – If the class specified in the dictionary is not a valid parameter class.
- classmethod construct_test_params() → PureTextAgentParameters [source]#
Construct test parameters for the agent.
For this agent, we use the dummy API, so that we don’t need to make real API requests.
- Returns:
test_params (PureTextAgentParameters) – The test parameters.
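A usage sketch (assuming nip is importable): per the description above, the returned parameters use the dummy API, so no real API requests are made.

```python
from nip.parameters.agents import PureTextAgentParameters

test_params = PureTextAgentParameters.construct_test_params()
assert test_params.use_dummy_api  # the dummy API avoids real API requests
```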
- classmethod from_dict(params_dict: dict, ignore_extra_keys: bool = False) → AgentsParameters [source]#
Create a parameters object from a dictionary.
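A round-trip sketch, assuming the to_dict method listed in the summary above produces a dictionary that from_dict accepts; note the annotated return type is AgentsParameters.

```python
params = PureTextAgentParameters(top_p=0.9)
params_dict = params.to_dict()                              # serialise
restored = PureTextAgentParameters.from_dict(params_dict)   # deserialise
```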
- get(address: str) → Any [source]#
Get a value from the parameters object using a dot-separated address.
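A sketch of get; the addresses below are assumptions about the dot-separated format, with a top-level field as the simplest case.

```python
params = PureTextAgentParameters()
model = params.get("model_name")  # top-level field
# Nested values would use dots, e.g. (hypothetical address):
# lr = params.get("agent_lr_factor.actor")
```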