- class nip.parameters.trainers.RlTrainerParameters(frames_per_batch: int | None = 1000, rollouts_per_iteration: int | None = None, steps_per_env_per_iteration: int | None = None, num_iterations: int = 1000, num_epochs: int = 4, minibatch_size: int = 64, lr: float = 0.003, anneal_lr: bool = False, max_grad_norm: float = 1.0, loss_critic_type: str = 'smooth_l1', clip_value: float | bool | None = False, normalize_observations: bool = True, num_normalization_steps: int = 1000, gamma: float = 0.9, lmbda: float = 0.95, body_lr_factor: ~nip.parameters.agents.LrFactors | dict | None = None, use_shared_body: bool = True, num_test_iterations: ~typing.Annotated[int, <nip.parameters.base_run.BaseRunPreserve object>] = 10)
Additional parameters common to all RL trainers.
- Parameters:
frames_per_batch (int | None) – The number of frames to sample per training iteration. If None, we set the number of frames so that rollouts_per_iteration rollouts are sampled per iteration.
rollouts_per_iteration (int | None) – If frames_per_batch is None, we use this parameter to determine the number of rollouts to sample per iteration. frames_per_batch is then set to rollouts_per_iteration * steps_per_env_per_iteration. If None, this defaults to the dataset size, so that every training datapoint appears exactly once in each iteration.
steps_per_env_per_iteration (int | None) – Each batch is divided into a number of environments, each of which runs trajectories for this many steps. Note that when a trajectory ends, a new one is started immediately. This must be a factor of frames_per_batch, since the number of environments is frames_per_batch / steps_per_env_per_iteration. If None, this defaults to max_message_rounds.
num_iterations (int) – The number of sampling and training iterations. num_iterations * frames_per_batch is the total number of frames sampled during training.
num_epochs (int) – The number of epochs per training iteration.
minibatch_size (int) – The size of the minibatches in each optimization step.
lr (float) – The learning rate.
anneal_lr (bool) – Whether to (linearly) anneal the learning rate over time. Defaults to False.
max_grad_norm (float) – The maximum norm of the gradients during optimization.
loss_critic_type (str) – Can be one of “l1”, “l2” or “smooth_l1”. Defaults to “smooth_l1”.
clip_value (float or bool, optional) – If a float is provided, it is used as the threshold for clipping the value prediction relative to the value estimate stored in the input tensordict, and the clipped prediction is used to calculate the value loss. The purpose of clipping is to limit the impact of extreme value predictions, helping to stabilize training and prevent large updates. However, it has no effect if the value estimate was produced by the current version of the value estimator. If instead True is provided, the clip_epsilon parameter is used as the clipping threshold. If not provided or False, no clipping is performed. Defaults to False.
normalize_observations (bool) – Whether to normalise the observations in the environment.
num_normalization_steps (int) – The number of steps to use to calculate the mean and standard deviation of the observations for normalisation. The environment is run for this many steps in total with random actions.
gamma (float) – The discount factor.
lmbda (float) – The GAE lambda parameter.
use_shared_body (bool) – Whether the actor and critic share the same body, when using a critic.
num_test_iterations (int) – The number of iterations to run the test for. In each iteration we sample frames_per_batch frames, as in training.
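The interaction between frames_per_batch, rollouts_per_iteration and steps_per_env_per_iteration can be summarised with a small helper. This is a sketch under stated assumptions, not the library's code: resolve_batch_layout and BatchLayout are hypothetical names, and max_message_rounds and dataset_size are passed in explicitly because they come from elsewhere in the run configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BatchLayout:
    frames_per_batch: int
    num_envs: int
    total_training_frames: int
    total_test_frames: int


def resolve_batch_layout(
    frames_per_batch: Optional[int],
    rollouts_per_iteration: Optional[int],
    steps_per_env_per_iteration: Optional[int],
    num_iterations: int,
    num_test_iterations: int,
    max_message_rounds: int,  # assumed to come from the wider run config
    dataset_size: int,  # assumed to come from the dataset
) -> BatchLayout:
    # Each environment runs for max_message_rounds steps unless told otherwise.
    if steps_per_env_per_iteration is None:
        steps_per_env_per_iteration = max_message_rounds

    # Only derive frames_per_batch from rollouts when it is not given directly.
    if frames_per_batch is None:
        if rollouts_per_iteration is None:
            # Every training datapoint appears exactly once per iteration.
            rollouts_per_iteration = dataset_size
        frames_per_batch = rollouts_per_iteration * steps_per_env_per_iteration

    # The batch must split evenly into environments.
    if frames_per_batch % steps_per_env_per_iteration != 0:
        raise ValueError(
            f"steps_per_env_per_iteration={steps_per_env_per_iteration} "
            f"must divide frames_per_batch={frames_per_batch}"
        )

    return BatchLayout(
        frames_per_batch=frames_per_batch,
        num_envs=frames_per_batch // steps_per_env_per_iteration,
        total_training_frames=num_iterations * frames_per_batch,
        total_test_frames=num_test_iterations * frames_per_batch,
    )


# Class defaults, with a hypothetical max_message_rounds of 8 and dataset of 500.
print(resolve_batch_layout(1000, None, None, 1000, 10, 8, 500))
```

With the class defaults and eight steps per environment, the 1000-frame batch is split across 125 environments; leaving frames_per_batch as None instead sizes the batch so that each of the 500 datapoints is rolled out once.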
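A minimal sketch of what linear annealing might look like when anneal_lr is True. It assumes the learning rate decays from lr towards zero over num_iterations, a common PPO convention; the decay target and the helper name annealed_lr are assumptions, not taken from the source.

```python
def annealed_lr(
    lr: float, iteration: int, num_iterations: int, anneal_lr: bool
) -> float:
    """Learning rate for a 0-indexed iteration (assumed linear decay towards 0)."""
    if not anneal_lr:
        return lr
    # Fraction of training remaining: 1.0 at the start, 1/num_iterations at the end.
    remaining = 1.0 - iteration / num_iterations
    return lr * remaining


for it in (0, 250, 500, 999):
    print(it, annealed_lr(0.003, it, 1000, anneal_lr=True))
```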
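max_grad_norm bounds the gradient norm before each optimiser step. The sketch below shows clipping by the global L2 norm over all parameters, on plain Python lists; in PyTorch the same operation is usually done with torch.nn.utils.clip_grad_norm_. Whether this trainer clips by global norm in exactly this way is an assumption, and clip_by_global_norm is a hypothetical name.

```python
import math
from typing import List


def clip_by_global_norm(
    grads: List[List[float]], max_grad_norm: float
) -> List[List[float]]:
    # Global L2 norm over every gradient entry of every parameter.
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if total_norm <= max_grad_norm:
        # Already within bounds: return an unchanged copy.
        return [list(grad) for grad in grads]
    # Rescale all gradients by one factor so the global norm equals max_grad_norm.
    scale = max_grad_norm / total_norm
    return [[g * scale for g in grad] for grad in grads]


# Two "parameters" with gradients [3.0] and [4.0]: global norm 5.0.
print(clip_by_global_norm([[3.0], [4.0]], max_grad_norm=1.0))
```

Scaling every gradient by the same factor preserves the update direction while capping its size, which is the usual reason for clipping by the global norm rather than per entry.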
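The sketch below combines loss_critic_type and clip_value into a single per-sample value loss. It assumes the common PPO-style scheme in which the prediction is clipped to within the threshold of the old value estimate and the larger of the clipped and unclipped losses is kept; smooth_l1 is taken to be the Huber loss with threshold 1, and clip_epsilon = 0.2 is an illustrative default. value_loss and distance are hypothetical names.

```python
from typing import Optional, Union


def distance(pred: float, target: float, loss_critic_type: str) -> float:
    diff = pred - target
    if loss_critic_type == "l1":
        return abs(diff)
    if loss_critic_type == "l2":
        return diff * diff
    if loss_critic_type == "smooth_l1":
        # Huber loss with threshold 1: quadratic near zero, linear in the tails.
        return 0.5 * diff * diff if abs(diff) < 1.0 else abs(diff) - 0.5
    raise ValueError(f"Unknown loss_critic_type: {loss_critic_type!r}")


def value_loss(
    value_pred: float,
    old_value: float,
    target: float,
    loss_critic_type: str = "smooth_l1",
    clip_value: Optional[Union[float, bool]] = False,
    clip_epsilon: float = 0.2,  # illustrative value, not from the source
) -> float:
    loss = distance(value_pred, target, loss_critic_type)
    if clip_value is None or clip_value is False:
        return loss  # no clipping
    threshold = clip_epsilon if clip_value is True else float(clip_value)
    # Keep the prediction within `threshold` of the old value estimate.
    delta = max(-threshold, min(threshold, value_pred - old_value))
    clipped_loss = distance(old_value + delta, target, loss_critic_type)
    # Pessimistic choice: keep whichever loss is larger.
    return max(loss, clipped_loss)


print(value_loss(1.0, 3.0, 0.0, "l2", clip_value=0.5))
```

When value_pred equals old_value, the clipped and unclipped predictions coincide, which is why clipping has no effect if the value estimate came from the current value estimator.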
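Observation normalisation can be sketched as estimating per-feature statistics from num_normalization_steps random-action steps and then standardising each observation. step_random is a hypothetical callable that advances the environment with a random action and returns a flat observation; the helper names and the epsilon guarding against zero variance are also assumptions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def estimate_obs_stats(
    step_random: Callable[[], Sequence[float]], num_normalization_steps: int
) -> Tuple[List[float], List[float]]:
    # Collect observations by stepping with random actions.
    observations = [list(step_random()) for _ in range(num_normalization_steps)]
    n = len(observations)
    dim = len(observations[0])
    mean = [sum(obs[i] for obs in observations) / n for i in range(dim)]
    # Population standard deviation per feature.
    std = [
        math.sqrt(sum((obs[i] - mean[i]) ** 2 for obs in observations) / n)
        for i in range(dim)
    ]
    return mean, std


def normalise(
    obs: Sequence[float],
    mean: Sequence[float],
    std: Sequence[float],
    eps: float = 1e-8,
) -> List[float]:
    # eps avoids division by zero for constant features.
    return [(o - m) / (s + eps) for o, m, s in zip(obs, mean, std)]


# Toy "environment" whose single observation feature is Gaussian noise.
rng = random.Random(0)
mean, std = estimate_obs_stats(lambda: [rng.gauss(5.0, 2.0)], 1000)
print(mean, std, normalise([7.0], mean, std))
```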
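gamma and lmbda enter through generalised advantage estimation (GAE). With delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), the advantage obeys A_t = delta_t + gamma * lmbda * A_{t+1}, with the bootstrap cut at episode ends. The sketch below implements this standard recurrence on plain lists using the class defaults gamma = 0.9 and lmbda = 0.95; compute_gae and its argument layout are hypothetical, not the trainer's own code.

```python
from typing import List, Sequence, Tuple


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    next_values: Sequence[float],
    dones: Sequence[bool],
    gamma: float = 0.9,
    lmbda: float = 0.95,
) -> Tuple[List[float], List[float]]:
    advantages = [0.0] * len(rewards)
    gae = 0.0
    # Sweep backwards so each step can reuse the advantage of the step after it.
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; no bootstrap past the end of an episode.
        delta = rewards[t] + gamma * next_values[t] * not_done - values[t]
        gae = delta + gamma * lmbda * not_done * gae
        advantages[t] = gae
    # Value targets are the advantages plus the baseline.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


adv, ret = compute_gae([0.0, 1.0], [0.0, 0.0], [0.0, 0.0], [False, True])
print(adv, ret)
```

Because a new trajectory starts immediately when one ends, the done flags are what stop advantages from leaking across trajectory boundaries within one environment's steps.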
Methods Summary
__eq__(other) – Return self==value.
__init__([frames_per_batch, ...])
__repr__() – Return repr(self).
_get_param_class_from_dict(param_dict) – Try to get the parameter class from a dictionary of serialised parameters.
construct_test_params() – Construct a set of basic parameters for testing.
from_dict(params_dict[, ignore_extra_keys]) – Create a parameters object from a dictionary.
(address) – Get a value from the parameters object using a dot-separated address.
() – Convert the parameters object to a dictionary.
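A dot-separated address walks nested parameter objects one attribute at a time. The following sketch illustrates that lookup on hypothetical dataclasses (RunParams, TrainerParams) with a hypothetical get_by_address function; descending into plain dicts by key is an added assumption. It is not the library's implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class TrainerParams:
    lr: float = 0.003
    options: Dict[str, int] = field(default_factory=lambda: {"depth": 2})


@dataclass
class RunParams:
    seed: int = 0
    trainer: TrainerParams = field(default_factory=TrainerParams)


def get_by_address(obj: Any, address: str) -> Any:
    value = obj
    for part in address.split("."):
        # Descend into dicts by key and into objects by attribute.
        value = value[part] if isinstance(value, dict) else getattr(value, part)
    return value


params = RunParams()
print(get_by_address(params, "trainer.lr"))
print(get_by_address(params, "trainer.options.depth"))
```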
- __eq__(other)
Return self==value.
- __init__(frames_per_batch: int | None = 1000, rollouts_per_iteration: int | None = None, steps_per_env_per_iteration: int | None = None, num_iterations: int = 1000, num_epochs: int = 4, minibatch_size: int = 64, lr: float = 0.003, anneal_lr: bool = False, max_grad_norm: float = 1.0, loss_critic_type: str = 'smooth_l1', clip_value: float | bool | None = False, normalize_observations: bool = True, num_normalization_steps: int = 1000, gamma: float = 0.9, lmbda: float = 0.95, body_lr_factor: ~nip.parameters.agents.LrFactors | dict | None = None, use_shared_body: bool = True, num_test_iterations: ~typing.Annotated[int, <nip.parameters.base_run.BaseRunPreserve object>] = 10) → None
- __repr__()
Return repr(self).
- classmethod _get_param_class_from_dict(param_dict: dict) → type[ParameterValue] | None
Try to get the parameter class from a dictionary of serialised parameters.
- Parameters:
param_dict (dict) – A dictionary of parameters, which may have come from a serialisation method. This dictionary may contain a _type key, which is used to determine the class of the parameter.
- Returns:
param_class (type[ParameterValue] | None) – The class of the parameter, if it can be determined.
- Raises:
ValueError – If the class specified in the dictionary is not a valid parameter class.
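One way the _type key could be resolved is through a registry that maps class names to parameter classes. Everything here is an assumption for illustration: the PARAMETER_CLASSES registry, the register decorator, the ExampleLrFactors class, and the convention that _type stores the class name as a string.

```python
from typing import Dict, Optional

# Hypothetical registry mapping class names to parameter classes.
PARAMETER_CLASSES: Dict[str, type] = {}


def register(cls: type) -> type:
    PARAMETER_CLASSES[cls.__name__] = cls
    return cls


@register
class ExampleLrFactors:
    pass


def get_param_class_from_dict(param_dict: dict) -> Optional[type]:
    class_name = param_dict.get("_type")
    if class_name is None:
        # No type information: the class cannot be determined.
        return None
    if class_name not in PARAMETER_CLASSES:
        raise ValueError(f"{class_name!r} is not a valid parameter class")
    return PARAMETER_CLASSES[class_name]


print(get_param_class_from_dict({"_type": "ExampleLrFactors"}))
```

Returning None when _type is absent, and raising only when it names an unknown class, mirrors the documented return and Raises behaviour.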
- classmethod construct_test_params() → BaseHyperParameters
Construct a set of basic parameters for testing.
- classmethod from_dict(params_dict: dict, ignore_extra_keys: bool = False) → BaseHyperParameters
Create a parameters object from a dictionary.
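A minimal sketch of dictionary-based construction for a flat dataclass, assuming that ignore_extra_keys controls whether unknown keys are dropped silently or rejected with an error. from_dict_sketch and MiniTrainerParams are hypothetical, and nested parameter objects are not handled here.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Type, TypeVar

T = TypeVar("T")


def from_dict_sketch(
    cls: Type[T], params_dict: Dict[str, Any], ignore_extra_keys: bool = False
) -> T:
    field_names = {f.name for f in fields(cls)}
    extra_keys = set(params_dict) - field_names
    if extra_keys and not ignore_extra_keys:
        raise ValueError(f"Unexpected keys for {cls.__name__}: {sorted(extra_keys)}")
    # Keys missing from params_dict fall back to the dataclass defaults.
    return cls(**{k: v for k, v in params_dict.items() if k in field_names})


@dataclass
class MiniTrainerParams:
    lr: float = 0.003
    num_epochs: int = 4


print(from_dict_sketch(MiniTrainerParams, {"lr": 0.01, "unused": 1}, ignore_extra_keys=True))
```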