nip.parameters.trainers.RlTrainerParameters
- class nip.parameters.trainers.RlTrainerParameters(frames_per_batch: int | None = 1000, rollouts_per_iteration: int | None = None, steps_per_env_per_iteration: int | None = None, num_iterations: int = 1000, num_epochs: int = 4, minibatch_size: int = 64, lr: float = 0.003, anneal_lr: bool = False, max_grad_norm: float = 1.0, loss_critic_type: str = 'smooth_l1', clip_value: float | bool | None = False, normalize_observations: bool = True, num_normalization_steps: int = 1000, gamma: float = 0.9, lmbda: float = 0.95, body_lr_factor: LrFactors | dict | None = None, use_shared_body: bool = True, num_test_iterations: Annotated[int, BaseRunPreserve] = 10)[source]
Additional parameters common to all RL trainers.
- Parameters:
frames_per_batch (int | None) – The number of frames to sample per training iteration. If None, we set the number of frames so that rollouts_per_iteration rollouts are sampled per iteration.
rollouts_per_iteration (int | None) – If frames_per_batch is None, we use this parameter to determine the number of rollouts to sample per iteration. frames_per_batch is then set to rollouts_per_iteration * steps_per_env_per_iteration. If None, this defaults to the dataset size, so that every training datapoint appears exactly once in each iteration.
steps_per_env_per_iteration (int | None) – Each batch is divided into a number of environments which run trajectories for this many steps. Note that when a trajectory ends, a new one is started immediately. This must be a factor of frames_per_batch, since the number of environments is frames_per_batch / steps_per_env_per_iteration. If None, this defaults to max_message_rounds. (See the sketch after this parameter list for how these sampling parameters interact.)
num_iterations (int) – The number of sampling and training iterations. num_iterations * frames_per_batch is the total number of frames sampled during training.
num_epochs (int) – The number of epochs per training iteration.
minibatch_size (int) – The size of the minibatches in each optimization step.
lr (float) – The learning rate.
anneal_lr (bool) – Whether to (linearly) anneal the learning rate over time. Defaults to False.
max_grad_norm (float) – The maximum norm of the gradients during optimization.
loss_critic_type (str) – Can be one of "l1", "l2" or "smooth_l1". Defaults to "smooth_l1".
clip_value (float or bool, optional) – If a float is provided, it is used to compute a clipped version of the value prediction with respect to the input tensordict value estimate, and this clipped version is used to calculate the value loss. The purpose of clipping is to limit the impact of extreme value predictions, helping to stabilise training and prevent large updates. However, clipping has no effect if the value estimate was produced by the current version of the value estimator. If True is provided instead, the clip_epsilon parameter is used as the clipping threshold. If not provided or False, no clipping is performed. Defaults to False.
normalize_observations (bool) – Whether to normalise the observations in the environment.
num_normalization_steps (int) – The number of steps to use to calculate the mean and standard deviation of the observations for normalisation. The environment is run for this many steps in total with random actions.
gamma (float) – The discount factor.
lmbda (float) – The GAE lambda parameter.
use_shared_body (bool) – Whether the actor and critic share the same body, when using a critic.
num_test_iterations (int) – The number of iterations to run the test for. In each iteration we sample frames_per_batch frames, as in training.
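The interaction between frames_per_batch, steps_per_env_per_iteration and num_iterations can be sketched as follows. This is a minimal illustration based only on the parameter descriptions above, assuming RlTrainerParameters is imported from its documented module path.

```python
from nip.parameters.trainers import RlTrainerParameters

# Sample 1000 frames per iteration, with each environment running
# trajectories of 50 steps. 50 must be a factor of 1000, and each batch
# then uses 1000 / 50 = 20 parallel environments.
params = RlTrainerParameters(
    frames_per_batch=1000,
    steps_per_env_per_iteration=50,
    num_iterations=500,
)

# Total number of frames sampled during training:
total_frames = params.num_iterations * params.frames_per_batch  # 500,000
```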
Methods Summary

__eq__(other) – Return self==value.
__init__([frames_per_batch, ...])
__repr__() – Return repr(self).
_get_param_class_from_dict(param_dict) – Try to get the parameter class from a dictionary of serialised parameters.
construct_test_params() – Construct a set of basic parameters for testing.
from_dict(params_dict[, ignore_extra_keys]) – Create a parameters object from a dictionary.
get(address) – Get a value from the parameters object using a dot-separated address.
to_dict() – Convert the parameters object to a dictionary.
Attributes
anneal_lr
body_lr_factor
clip_value
frames_per_batch
gamma
lmbda
loss_critic_type
lr
max_grad_norm
minibatch_size
normalize_observations
num_epochs
num_iterations
num_normalization_steps
num_test_iterations
rollouts_per_iteration
steps_per_env_per_iteration
use_shared_body
Methods
- __eq__(other)
Return self==value.
- __init__(frames_per_batch: int | None = 1000, rollouts_per_iteration: int | None = None, steps_per_env_per_iteration: int | None = None, num_iterations: int = 1000, num_epochs: int = 4, minibatch_size: int = 64, lr: float = 0.003, anneal_lr: bool = False, max_grad_norm: float = 1.0, loss_critic_type: str = 'smooth_l1', clip_value: float | bool | None = False, normalize_observations: bool = True, num_normalization_steps: int = 1000, gamma: float = 0.9, lmbda: float = 0.95, body_lr_factor: LrFactors | dict | None = None, use_shared_body: bool = True, num_test_iterations: Annotated[int, BaseRunPreserve] = 10) → None
- __repr__()
Return repr(self).
- classmethod _get_param_class_from_dict(param_dict: dict) → type[ParameterValue] | None [source]
Try to get the parameter class from a dictionary of serialised parameters.
- Parameters:
param_dict (dict) – A dictionary of parameters, which may have come from a to_dict method. This dictionary may contain a _type key, which is used to determine the class of the parameter.
- Returns:
param_class (type[ParameterValue] | None) – The class of the parameter, if it can be determined.
- Raises:
ValueError – If the class specified in the dictionary is not a valid parameter class.
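An illustrative sketch only: the dictionary below is hypothetical, since the exact contents of the _type key are not specified in this reference.

```python
# Hypothetical serialised parameters; the "_type" value is an assumed
# format, not one confirmed by this reference.
params_dict = {"_type": "RlTrainerParameters", "lr": 0.003}

# Returns the parameter class if it can be determined, else None.
# Raises ValueError if the named class is not a valid parameter class.
param_class = RlTrainerParameters._get_param_class_from_dict(params_dict)
```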
- classmethod construct_test_params() → BaseHyperParameters [source]
Construct a set of basic parameters for testing.
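For example, a test might construct a basic parameters object as follows (the specific values it chooses are not documented here):

```python
from nip.parameters.trainers import RlTrainerParameters

# Build a small, self-consistent set of parameters for testing.
test_params = RlTrainerParameters.construct_test_params()
```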
- classmethod from_dict(params_dict: dict, ignore_extra_keys: bool = False) → BaseHyperParameters [source]
Create a parameters object from a dictionary.
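A minimal sketch of the serialisation round trip, using only the to_dict, from_dict, get and __eq__ behaviour summarised above; treating "lr" as a valid dot-separated address for a top-level field is an assumption.

```python
from nip.parameters.trainers import RlTrainerParameters

params = RlTrainerParameters(lr=0.001, minibatch_size=128)

# Serialise to a plain dictionary and reconstruct an equivalent object;
# __eq__ compares by value, so the round trip should be lossless.
restored = RlTrainerParameters.from_dict(params.to_dict())
assert restored == params

# get reads a value using a dot-separated address; for a top-level field
# of this parameters object the address is just the field name.
assert params.get("lr") == 0.001
```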