nip.parameters.trainers.PureTextMaltParameters
- class nip.parameters.trainers.PureTextMaltParameters(num_responses_per_timestep: int = 2, frozen_agents_generate_one_response: bool = True, pair_selection_method: Literal['positive_negative', 'interval'] = 'positive_negative', interval_threshold_proportion: float = 0.1, num_initial_ei_iterations: int = 0)
Additional parameters for Multi-Agent LLM Training (MALT) [MSD+24].
- Parameters:
num_responses_per_timestep (int) – The number of responses to sample from the agents at each timestep. This yields a tree of size at most num_responses_per_timestep ** max_message_rounds.
frozen_agents_generate_one_response (bool) – If False, we sample multiple responses from the frozen agents when generating the tree of rollouts. Because frozen agents are not trained, sampling multiple responses from them is not strictly necessary. However, doing so enriches the tree, so the preference pairs are likely to be of higher quality. This option currently does not apply to a round in which both a frozen and a non-frozen agent take actions. In that case, the frozen agent automatically samples multiple responses.
pair_selection_method (Literal["positive_negative", "interval"]) – The method to use for selecting the pairs of responses for DPO training. Possible values are:
- "positive_negative": Selects a response where the agent's expected reward is above a certain threshold (by default the reward mid-point) and a response where the agent's expected reward is below this threshold.
- "interval": Selects a pair of responses where the difference in expected reward is above a certain threshold. This threshold is computed as interval_threshold_proportion times the difference between the maximum and minimum possible reward for the agent.
interval_threshold_proportion (float) – When pair_selection_method is "interval", this value is used to compute the threshold for the difference in expected reward. The threshold is computed as interval_threshold_proportion times the difference between the maximum and minimum possible reward for the agent.
num_initial_ei_iterations (int) – The number of iterations to run the EI trainer for before switching to MALT. This is used to warm-start the training process. These iterations count towards the total number of iterations, so the number of iterations for MALT is num_iterations - num_initial_ei_iterations.
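The arithmetic and the two pair selection rules described above can be illustrated with a minimal, self-contained sketch. It does not use the nip library and is not its implementation: the function names are hypothetical, and returning every qualifying (chosen, rejected) index pair is an assumption made for illustration, since the parameter descriptions do not say how a single pair is picked among the candidates.

```python
from itertools import combinations


def max_tree_size(num_responses_per_timestep, max_message_rounds):
    """Upper bound on the rollout tree size stated in the parameter docs."""
    return num_responses_per_timestep ** max_message_rounds


def num_malt_iterations(num_iterations, num_initial_ei_iterations):
    """Iterations left for MALT after the EI warm-start iterations."""
    return num_iterations - num_initial_ei_iterations


def interval_threshold(proportion, min_reward, max_reward):
    """Reward-gap threshold: proportion times the possible reward range."""
    return proportion * (max_reward - min_reward)


def select_positive_negative(expected_rewards, threshold):
    """Pair each response above the threshold with each response below it.

    Hypothetical helper: returns (chosen, rejected) index pairs.
    """
    positive = [i for i, r in enumerate(expected_rewards) if r > threshold]
    negative = [i for i, r in enumerate(expected_rewards) if r < threshold]
    return [(p, n) for p in positive for n in negative]


def select_interval(expected_rewards, threshold):
    """Keep pairs whose expected-reward gap exceeds the threshold.

    Hypothetical helper: the higher-reward response is listed first.
    """
    pairs = []
    for i, j in combinations(range(len(expected_rewards)), 2):
        gap = expected_rewards[i] - expected_rewards[j]
        if abs(gap) > threshold:
            pairs.append((i, j) if gap > 0 else (j, i))
    return pairs


# Expected rewards for four sibling responses, with rewards assumed to lie in [0, 1].
rewards = [1.0, 0.9375, 0.25, 0.0]
mid_point = (0.0 + 1.0) / 2

print(max_tree_size(2, 3))          # 8
print(num_malt_iterations(10, 2))   # 8
print(select_positive_negative(rewards, mid_point))
# [(0, 2), (0, 3), (1, 2), (1, 3)]
print(select_interval(rewards, interval_threshold(0.1, 0.0, 1.0)))
# [(0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

In this toy example the "interval" rule also admits the pair (2, 3), in which both responses fall below the mid-point, while it rejects (0, 1) because the gap of 0.0625 is under the threshold of 0.1.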
Methods Summary
__eq__(other) – Return self==value.
__init__([num_responses_per_timestep, ...])
__repr__() – Return repr(self).
_get_param_class_from_dict(param_dict) – Try to get the parameter class from a dictionary of serialised parameters.
construct_test_params() – Construct a set of basic parameters for testing.
from_dict(params_dict[, ignore_extra_keys]) – Create a parameters object from a dictionary.
get(address) – Get a value from the parameters object using a dot-separated address.
to_dict() – Convert the parameters object to a dictionary.
Attributes
frozen_agents_generate_one_response
interval_threshold_proportion
num_initial_ei_iterations
num_responses_per_timestep
pair_selection_method
Methods
- __eq__(other)
Return self==value.
- __init__(num_responses_per_timestep: int = 2, frozen_agents_generate_one_response: bool = True, pair_selection_method: Literal['positive_negative', 'interval'] = 'positive_negative', interval_threshold_proportion: float = 0.1, num_initial_ei_iterations: int = 0) → None
- __repr__()
Return repr(self).
- classmethod _get_param_class_from_dict(param_dict: dict) → type[ParameterValue] | None
Try to get the parameter class from a dictionary of serialised parameters.
- Parameters:
param_dict (dict) – A dictionary of parameters, which may have come from a to_dict method. This dictionary may contain a _type key, which is used to determine the class of the parameter.
- Returns:
param_class (type[ParameterValue] | None) – The class of the parameter, if it can be determined.
- Raises:
ValueError – If the class specified in the dictionary is not a valid parameter class.
- classmethod construct_test_params() → BaseHyperParameters
Construct a set of basic parameters for testing.
- classmethod from_dict(params_dict: dict, ignore_extra_keys: bool = False) → BaseHyperParameters
Create a parameters object from a dictionary.