nip.parameters.trainers.RlTrainerParameters#

class nip.parameters.trainers.RlTrainerParameters(frames_per_batch: int | None = 1000, rollouts_per_iteration: int | None = None, steps_per_env_per_iteration: int | None = None, num_iterations: int = 1000, num_epochs: int = 4, minibatch_size: int = 64, lr: float = 0.003, anneal_lr: bool = False, max_grad_norm: float = 1.0, loss_critic_type: str = 'smooth_l1', clip_value: float | bool | None = False, normalize_observations: bool = True, num_normalization_steps: int = 1000, gamma: float = 0.9, lmbda: float = 0.95, body_lr_factor: ~nip.parameters.agents.LrFactors | dict | None = None, use_shared_body: bool = True, num_test_iterations: ~typing.Annotated[int, BaseRunPreserve()] = 10)[source]#

Additional parameters common to all RL trainers.

Parameters:
  • frames_per_batch (int | None) – The number of frames to sample per training iteration. If None, the number of frames is set so that rollouts_per_iteration rollouts are sampled per iteration.

  • rollouts_per_iteration (int | None) – If frames_per_batch is None, we use this parameter to determine the number of rollouts to sample per iteration. frames_per_batch is then set to rollouts_per_iteration * steps_per_env_per_iteration. If None, this defaults to the dataset size, so that every training datapoint appears exactly once in each iteration.

  • steps_per_env_per_iteration (int | None) – Each batch is divided into a number of environments which run trajectories for this many steps. Note that when a trajectory ends, a new one is started immediately. This must be a factor of frames_per_batch, since the number of environments is frames_per_batch / steps_per_env_per_iteration. If None, this defaults to max_message_rounds.

  • num_iterations (int) – The number of sampling and training iterations. num_iterations * frames_per_batch is the total number of frames sampled during training.

  • num_epochs (int) – The number of epochs per training iteration.

  • minibatch_size (int) – The size of the minibatches in each optimization step.

  • lr (float) – The learning rate.

  • anneal_lr (bool) – Whether to (linearly) anneal the learning rate over time. Defaults to False.

  • max_grad_norm (float) – The maximum norm of the gradients during optimization.

  • loss_critic_type (str) – Can be one of “l1”, “l2” or “smooth_l1”. Defaults to "smooth_l1".

  • clip_value (float or bool, optional) – If a float is provided, it is used to compute a clipped version of the value prediction with respect to the value estimate in the input tensordict, and this clipped value is used to calculate the value loss. Clipping limits the impact of extreme value predictions, helping to stabilise training and prevent large updates. It has no effect, however, if the value estimate was computed by the current version of the value estimator. If True is provided, the clip_epsilon parameter is used as the clipping threshold. If not provided or False, no clipping is performed. Defaults to False.

  • normalize_observations (bool) – Whether to normalise the observations in the environment.

  • num_normalization_steps (int) – The number of steps to use to calculate the mean and standard deviation of the observations for normalisation. The environment is run for this many steps in total with random actions.

  • gamma (float) – The discount factor.

  • lmbda (float) – The GAE lambda parameter.

  • use_shared_body (bool) – Whether the actor and critic share the same body, when using a critic.

  • num_test_iterations (int) – The number of iterations to run the test for. In each iteration we sample frames_per_batch frames, as in training.

Methods Summary

__eq__(other)

Return self==value.

__init__([frames_per_batch, ...])

__post_init__()

__repr__()

Return repr(self).

_get_param_class_from_dict(param_dict)

Try to get the parameter class from a dictionary of serialised parameters.

construct_test_params()

Construct a set of basic parameters for testing.

from_dict(params_dict[, ignore_extra_keys])

Create a parameters object from a dictionary.

get(address)

Get a value from the parameters object using a dot-separated address.

to_dict()

Convert the parameters object to a dictionary.

Attributes

anneal_lr

body_lr_factor

clip_value

frames_per_batch

gamma

lmbda

loss_critic_type

lr

max_grad_norm

minibatch_size

normalize_observations

num_epochs

num_iterations

num_normalization_steps

num_test_iterations

rollouts_per_iteration

steps_per_env_per_iteration

use_shared_body

Methods

__eq__(other)#

Return self==value.

__init__(frames_per_batch: int | None = 1000, rollouts_per_iteration: int | None = None, steps_per_env_per_iteration: int | None = None, num_iterations: int = 1000, num_epochs: int = 4, minibatch_size: int = 64, lr: float = 0.003, anneal_lr: bool = False, max_grad_norm: float = 1.0, loss_critic_type: str = 'smooth_l1', clip_value: float | bool | None = False, normalize_observations: bool = True, num_normalization_steps: int = 1000, gamma: float = 0.9, lmbda: float = 0.95, body_lr_factor: ~nip.parameters.agents.LrFactors | dict | None = None, use_shared_body: bool = True, num_test_iterations: ~typing.Annotated[int, BaseRunPreserve()] = 10) → None#

__post_init__()[source]#

__repr__()#

Return repr(self).

classmethod _get_param_class_from_dict(param_dict: dict) → type[ParameterValue] | None[source]#

Try to get the parameter class from a dictionary of serialised parameters.

Parameters:

param_dict (dict) – A dictionary of parameters, which may have come from a to_dict method. This dictionary may contain a _type key, which is used to determine the class of the parameter.

Returns:

param_class (type[ParameterValue] | None) – The class of the parameter, if it can be determined.

Raises:

ValueError – If the class specified in the dictionary is not a valid parameter class.

classmethod construct_test_params() → BaseHyperParameters[source]#

Construct a set of basic parameters for testing.

classmethod from_dict(params_dict: dict, ignore_extra_keys: bool = False) → BaseHyperParameters[source]#

Create a parameters object from a dictionary.

Parameters:
  • params_dict (dict) – A dictionary of the parameters.

  • ignore_extra_keys (bool, default=False) – If True, ignore keys in the dictionary that do not correspond to fields in the parameters object.

Returns:

hyper_params (BaseHyperParameters) – The parameters object.

get(address: str) → Any[source]#

Get a value from the parameters object using a dot-separated address.

Parameters:

address (str) – The path to the value in the parameters object, separated by dots.

Returns:

value (Any) – The value at the address.

Raises:

KeyError – If the address does not exist.

to_dict() → dict[source]#

Convert the parameters object to a dictionary.

Turns enums into strings, and sub-parameters into dictionaries. Includes the is_random parameter if it exists.

Returns:

params_dict (dict) – A dictionary of the parameters.
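The to_dict / from_dict round trip (enums serialised to strings, extra keys optionally ignored) can be sketched with a minimal stand-in dataclass. MiniParams and LossCriticType below are hypothetical illustrations, not classes from the library:

```python
from dataclasses import dataclass, fields
from enum import Enum


class LossCriticType(Enum):  # hypothetical enum, for illustration only
    SMOOTH_L1 = "smooth_l1"
    L1 = "l1"


@dataclass
class MiniParams:
    lr: float = 0.003
    loss_critic_type: LossCriticType = LossCriticType.SMOOTH_L1

    def to_dict(self) -> dict:
        """Serialise, turning enum values into their string representations."""
        out = {}
        for f in fields(self):
            value = getattr(self, f.name)
            out[f.name] = value.value if isinstance(value, Enum) else value
        return out

    @classmethod
    def from_dict(cls, params_dict: dict, ignore_extra_keys: bool = False):
        """Deserialise, optionally ignoring keys that are not fields."""
        names = {f.name for f in fields(cls)}
        extra = set(params_dict) - names
        if extra and not ignore_extra_keys:
            raise ValueError(f"Unknown keys: {extra}")
        kwargs = {k: v for k, v in params_dict.items() if k in names}
        # Convert serialised strings back into enum members
        if isinstance(kwargs.get("loss_critic_type"), str):
            kwargs["loss_critic_type"] = LossCriticType(kwargs["loss_critic_type"])
        return cls(**kwargs)
```

A round trip then reproduces the original object, and ignore_extra_keys=True lets a dictionary with stale keys still load.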