nip.parameters.trainers.RlTrainerParameters#
- class nip.parameters.trainers.RlTrainerParameters(frames_per_batch: int | None = 1000, rollouts_per_iteration: int | None = None, steps_per_env_per_iteration: int | None = None, num_iterations: int = 1000, num_epochs: int = 4, minibatch_size: int = 64, lr: float = 0.003, anneal_lr: bool = False, max_grad_norm: float = 1.0, loss_critic_type: str = 'smooth_l1', clip_value: float | bool | None = False, normalize_observations: bool = True, num_normalization_steps: int = 1000, gamma: float = 0.9, lmbda: float = 0.95, body_lr_factor: ~nip.parameters.agents.LrFactors | dict | None = None, use_shared_body: bool = True, num_test_iterations: int = 10)[source]#
Additional parameters common to all RL trainers.
- Parameters:
frames_per_batch (int | None) – The number of frames to sample per training iteration. If None, we set the number of frames so that rollouts_per_iteration rollouts are sampled per iteration.
rollouts_per_iteration (int | None) – If frames_per_batch is None, we use this parameter to determine the number of rollouts to sample per iteration. frames_per_batch is then set to rollouts_per_iteration * steps_per_env_per_iteration. If None, this defaults to the dataset size, so that every training datapoint appears exactly once in each iteration.
steps_per_env_per_iteration (int | None) – Each batch is divided into a number of environments which run trajectories for this many steps. Note that when a trajectory ends, a new one is started immediately. This must be a factor of frames_per_batch, since the number of environments is frames_per_batch / steps_per_env_per_iteration. If None, this defaults to max_message_rounds.
num_iterations (int) – The number of sampling and training iterations. num_iterations * frames_per_batch is the total number of frames sampled during training.
num_epochs (int) – The number of epochs per training iteration.
minibatch_size (int) – The size of the minibatches in each optimization step.
lr (float) – The learning rate.
anneal_lr (bool) – Whether to (linearly) anneal the learning rate over time. Defaults to False.
max_grad_norm (float) – The maximum norm of the gradients during optimization.
loss_critic_type (str) – Can be one of "l1", "l2" or "smooth_l1". Defaults to "smooth_l1".
clip_value (float or bool, optional) – If a float is provided, it is used to compute a clipped version of the value prediction with respect to the value estimate in the input tensordict, and this clipped prediction is used to calculate the value loss. Clipping limits the impact of extreme value predictions, helping to stabilise training and prevent large updates. However, it has no effect if the value estimate was produced by the current version of the value estimator. If True is provided instead, the clip_epsilon parameter is used as the clipping threshold. If not provided or False, no clipping is performed. Defaults to False.
normalize_observations (bool) – Whether to normalise the observations in the environment.
num_normalization_steps (int) – The number of steps used to calculate the mean and standard deviation of the observations for normalisation. The environment is run for this many steps in total with random actions.
gamma (float) – The discount factor.
lmbda (float) – The GAE lambda parameter.
use_shared_body (bool) – Whether the actor and critic share the same body, when using a critic.
num_test_iterations (int) – The number of iterations to run the test for. In each iteration we sample frames_per_batch frames, as in training.
Methods Summary
__eq__(other) – Return self==value.
__init__([frames_per_batch, ...])
__repr__() – Return repr(self).
_get_param_class_from_dict(param_dict) – Try to get the parameter class from a dictionary of serialised parameters.
construct_test_params() – Construct a set of basic parameters for testing.
from_dict(params_dict[, ignore_extra_keys]) – Create a parameters object from a dictionary.
get(address) – Get a value from the parameters object using a dot-separated address.
to_dict() – Convert the parameters object to a dictionary.
Attributes
anneal_lr
body_lr_factor
clip_value
frames_per_batch
gamma
lmbda
loss_critic_type
lr
max_grad_norm
minibatch_size
normalize_observations
num_epochs
num_iterations
num_normalization_steps
num_test_iterations
rollouts_per_iteration
steps_per_env_per_iteration
use_shared_body
Methods
- __eq__(other)#
Return self==value.
- __init__(frames_per_batch: int | None = 1000, rollouts_per_iteration: int | None = None, steps_per_env_per_iteration: int | None = None, num_iterations: int = 1000, num_epochs: int = 4, minibatch_size: int = 64, lr: float = 0.003, anneal_lr: bool = False, max_grad_norm: float = 1.0, loss_critic_type: str = 'smooth_l1', clip_value: float | bool | None = False, normalize_observations: bool = True, num_normalization_steps: int = 1000, gamma: float = 0.9, lmbda: float = 0.95, body_lr_factor: ~nip.parameters.agents.LrFactors | dict | None = None, use_shared_body: bool = True, num_test_iterations: int = 10) None#
- __repr__()#
Return repr(self).
- classmethod _get_param_class_from_dict(param_dict: dict) type[ParameterValue] | None[source]#
Try to get the parameter class from a dictionary of serialised parameters.
- Parameters:
param_dict (dict) – A dictionary of parameters, which may have come from a to_dict method. This dictionary may contain a _type key, which is used to determine the class of the parameter.
- Returns:
param_class (type[ParameterValue] | None) – The class of the parameter, if it can be determined.
- Raises:
ValueError – If the class specified in the dictionary is not a valid parameter class.
- classmethod construct_test_params() BaseHyperParameters[source]#
Construct a set of basic parameters for testing.
- classmethod from_dict(params_dict: dict, ignore_extra_keys: bool = False) BaseHyperParameters[source]#
Create a parameters object from a dictionary.