nip.parameters.trainers.PureTextMaltParameters

class nip.parameters.trainers.PureTextMaltParameters(num_responses_per_timestep: int = 2, frozen_agents_generate_one_response: bool = True, pair_selection_method: Literal['positive_negative', 'interval'] = 'positive_negative', interval_threshold_proportion: float = 0.1, num_initial_ei_iterations: int = 0)

Additional parameters for Multi-Agent LLM Training (MALT) [MSD+24].

Parameters:
  • num_responses_per_timestep (int) – The number of responses to sample from the agents at each timestep. This yields a tree of size at most num_responses_per_timestep ** max_message_rounds.

  • frozen_agents_generate_one_response (bool) – If True, frozen agents generate a single response per timestep when the tree of rollouts is built. If False, multiple responses are sampled from the frozen agents as well. Frozen agents are not trained, so sampling multiple responses from them is not strictly necessary, but doing so enriches the tree, which is likely to yield higher-quality preference pairs. This option currently does not apply in a round where both a frozen and a non-frozen agent take actions: in that case the frozen agent always samples multiple responses.

  • pair_selection_method (Literal["positive_negative", "interval"]) –

    The method to use for selecting the pairs of responses for DPO training. Possible values are:

    • "positive_negative": Selects a response where the agent’s expected reward is above a certain threshold (by default the reward mid-point) and a response where the agent’s expected reward is below this threshold.

    • "interval": Selects a pair of responses where the difference in expected reward is above a certain threshold. This threshold is computed as interval_threshold_proportion times the difference between the maximum and minimum possible reward for the agent.

  • interval_threshold_proportion (float) – When pair_selection_method is “interval”, this value is used to compute the threshold for the difference in expected reward. The threshold is computed as interval_threshold_proportion times the difference between the maximum and minimum possible reward for the agent.

  • num_initial_ei_iterations (int) – The number of iterations to run the EI trainer for before switching to MALT. This is used to warm-start the training process. These iterations count towards the total number of iterations, so the number of iterations for MALT is num_iterations - num_initial_ei_iterations.
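
The arithmetic described in the parameters above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: the helper names (max_tree_size, is_valid_pair, num_malt_iterations) are hypothetical, the reward mid-point is assumed to be the mean of the minimum and maximum possible reward, and "above" is read as a strict inequality.

```python
def max_tree_size(num_responses_per_timestep: int, max_message_rounds: int) -> int:
    """Upper bound on the rollout tree size quoted in the description."""
    return num_responses_per_timestep ** max_message_rounds


def is_valid_pair(
    method: str,
    higher: float,
    lower: float,
    min_reward: float,
    max_reward: float,
    interval_threshold_proportion: float = 0.1,
) -> bool:
    """Check whether two expected rewards form a preference pair (hypothetical helper)."""
    if method == "positive_negative":
        # Assumption: the default threshold is the mid-point of the reward range.
        threshold = (min_reward + max_reward) / 2
        return higher > threshold > lower
    if method == "interval":
        # Threshold is a proportion of the full reward range.
        threshold = interval_threshold_proportion * (max_reward - min_reward)
        return higher - lower > threshold
    raise ValueError(f"Unknown pair selection method: {method!r}")


def num_malt_iterations(num_iterations: int, num_initial_ei_iterations: int) -> int:
    """EI warm-start iterations count towards the total number of iterations."""
    return num_iterations - num_initial_ei_iterations
```

With rewards in [0, 1], the pair (0.8, 0.6) is rejected by "positive_negative" because both values lie above the mid-point, but accepted by "interval" with the default proportion of 0.1 because the gap exceeds 0.1.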

Methods Summary

__eq__(other)

Return self==value.

__init__([num_responses_per_timestep, ...])

__post_init__()

__repr__()

Return repr(self).

_get_param_class_from_dict(param_dict)

Try to get the parameter class from a dictionary of serialised parameters.

construct_test_params()

Construct a set of basic parameters for testing.

from_dict(params_dict[, ignore_extra_keys])

Create a parameters object from a dictionary.

get(address)

Get a value from the parameters object using a dot-separated address.

to_dict()

Convert the parameters object to a dictionary.

Attributes

frozen_agents_generate_one_response

interval_threshold_proportion

num_initial_ei_iterations

num_responses_per_timestep

pair_selection_method

Methods

__eq__(other)

Return self==value.

__init__(num_responses_per_timestep: int = 2, frozen_agents_generate_one_response: bool = True, pair_selection_method: Literal['positive_negative', 'interval'] = 'positive_negative', interval_threshold_proportion: float = 0.1, num_initial_ei_iterations: int = 0) → None

__post_init__()

__repr__()

Return repr(self).

classmethod _get_param_class_from_dict(param_dict: dict) → type[ParameterValue] | None

Try to get the parameter class from a dictionary of serialised parameters.

Parameters:

param_dict (dict) – A dictionary of parameters, which may have come from a to_dict method. This dictionary may contain a _type key, which is used to determine the class of the parameter.

Returns:

param_class (type[ParameterValue] | None) – The class of the parameter, if it can be determined.

Raises:

ValueError – If the class specified in the dictionary is not a valid parameter class.
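
The lookup described above could work along the following lines. This is a minimal sketch under assumptions: ParameterValueSketch, MaltSketch and REGISTRY are hypothetical stand-ins, and the way the library actually maps the _type key to a class is not documented here.

```python
from typing import Optional


class ParameterValueSketch:
    """Hypothetical stand-in for the library's ParameterValue base class."""


class MaltSketch(ParameterValueSketch):
    """Hypothetical parameter class used only for this illustration."""


# Hypothetical registry mapping serialised type names to classes.
REGISTRY = {"MaltSketch": MaltSketch, "NotAParameter": dict}


def get_param_class_from_dict(param_dict: dict) -> Optional[type]:
    """Return the class named by the "_type" key, or None if the key is absent."""
    type_name = param_dict.get("_type")
    if type_name is None:
        # No type information was serialised, so the class cannot be determined.
        return None
    param_class = REGISTRY.get(type_name)
    is_valid = isinstance(param_class, type) and issubclass(
        param_class, ParameterValueSketch
    )
    if not is_valid:
        raise ValueError(f"{type_name!r} is not a valid parameter class")
    return param_class
```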

classmethod construct_test_params() → BaseHyperParameters

Construct a set of basic parameters for testing.

classmethod from_dict(params_dict: dict, ignore_extra_keys: bool = False) → BaseHyperParameters

Create a parameters object from a dictionary.

Parameters:
  • params_dict (dict) – A dictionary of the parameters.

  • ignore_extra_keys (bool, default=False) – If True, ignore keys in the dictionary that do not correspond to fields in the parameters object.

Returns:

hyper_params (BaseParameters) – The parameters object.
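
The effect of ignore_extra_keys can be illustrated with a stand-in dataclass. MaltParamsSketch below is hypothetical and only mirrors the field names and defaults from the signature at the top of this page. In this sketch, unknown keys reach the constructor and raise a TypeError when ignore_extra_keys is False; how the library itself reacts to extra keys is an assumption.

```python
from dataclasses import dataclass, fields
from typing import Literal


@dataclass
class MaltParamsSketch:
    """Hypothetical stand-in mirroring the documented fields and defaults."""

    num_responses_per_timestep: int = 2
    frozen_agents_generate_one_response: bool = True
    pair_selection_method: Literal["positive_negative", "interval"] = "positive_negative"
    interval_threshold_proportion: float = 0.1
    num_initial_ei_iterations: int = 0

    @classmethod
    def from_dict(cls, params_dict: dict, ignore_extra_keys: bool = False):
        if ignore_extra_keys:
            # Keep only keys that correspond to fields of the parameters object.
            known = {f.name for f in fields(cls)}
            params_dict = {k: v for k, v in params_dict.items() if k in known}
        return cls(**params_dict)


params = MaltParamsSketch.from_dict(
    {"pair_selection_method": "interval", "not_a_field": 1},
    ignore_extra_keys=True,
)
```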

get(address: str) → Any

Get a value from the parameters object using a dot-separated address.

Parameters:

address (str) – The path to the value in the parameters object, separated by dots.

Returns:

value (Any) – The value at the address.

Raises:

KeyError – If the address does not exist.
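
A dot-separated lookup of this kind can be sketched as an attribute walk. The classes and the get_by_address helper below are hypothetical, and the nesting shown (a pure_text_malt sub-parameter) is an assumption made for illustration rather than the library's documented layout.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class InnerSketch:
    """Hypothetical sub-parameters object."""

    interval_threshold_proportion: float = 0.1


@dataclass
class OuterSketch:
    """Hypothetical top-level parameters object holding sub-parameters."""

    pure_text_malt: InnerSketch = field(default_factory=InnerSketch)


def get_by_address(params: Any, address: str) -> Any:
    """Follow each dot-separated part of the address as an attribute name."""
    value = params
    for part in address.split("."):
        if not hasattr(value, part):
            raise KeyError(address)
        value = getattr(value, part)
    return value


proportion = get_by_address(OuterSketch(), "pure_text_malt.interval_threshold_proportion")
```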

to_dict() → dict

Convert the parameters object to a dictionary.

Turns enums into strings, and sub-parameters into dictionaries. Includes the is_random parameter if it exists.

Returns:

params_dict (dict) – A dictionary of the parameters.
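
The conversion described above might look roughly like the following recursive sketch. All names here are hypothetical, the choice to serialise an enum by its name is an assumption, and the is_random handling is not modelled.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from enum import Enum


class ModeSketch(Enum):
    """Hypothetical enum-valued setting."""

    FAST = 1
    SLOW = 2


@dataclass
class ChildSketch:
    """Hypothetical sub-parameters object."""

    threshold: float = 0.1


@dataclass
class ParentSketch:
    """Hypothetical parameters object with an enum and a sub-parameter."""

    mode: ModeSketch = ModeSketch.FAST
    child: ChildSketch = field(default_factory=ChildSketch)


def to_dict_sketch(params) -> dict:
    """Turn enums into strings and sub-parameters into nested dictionaries."""
    out = {}
    for f in fields(params):
        value = getattr(params, f.name)
        if isinstance(value, Enum):
            # Assumption: enums are serialised by member name.
            value = value.name
        elif is_dataclass(value) and not isinstance(value, type):
            value = to_dict_sketch(value)
        out[f.name] = value
    return out


serialised = to_dict_sketch(ParentSketch())
```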