nip.parameters.trainers.RlTrainerParameters
- class nip.parameters.trainers.RlTrainerParameters(frames_per_batch: int | None = 1000, rollouts_per_iteration: int | None = None, steps_per_env_per_iteration: int | None = None, num_iterations: int = 1000, num_epochs: int = 4, minibatch_size: int = 64, lr: float = 0.003, anneal_lr: bool = False, max_grad_norm: float = 1.0, loss_critic_type: str = 'smooth_l1', clip_value: float | bool | None = False, normalize_observations: bool = True, num_normalization_steps: int = 1000, gamma: float = 0.9, lmbda: float = 0.95, body_lr_factor: LrFactors | dict | None = None, use_shared_body: bool = True, num_test_iterations: Annotated[int, BaseRunPreserve] = 10)[source]
Additional parameters common to all RL trainers.
- Parameters:
frames_per_batch (int | None) – The number of frames to sample per training iteration. If None, we set the number of frames so that rollouts_per_iteration rollouts are sampled per iteration.
rollouts_per_iteration (int | None) – If frames_per_batch is None, we use this parameter to determine the number of rollouts to sample per iteration. frames_per_batch is then set to rollouts_per_iteration * steps_per_env_per_iteration. If None, this defaults to the dataset size, so that every training datapoint appears exactly once in each iteration.
steps_per_env_per_iteration (int | None) – Each batch is divided into a number of environments which run trajectories for this many steps. Note that when a trajectory ends, a new one is started immediately. This must be a factor of frames_per_batch, since the number of environments is frames_per_batch / steps_per_env_per_iteration. If None, this defaults to max_message_rounds. (See the sketch after this parameter list for how these sampling parameters interact.)
num_iterations (int) – The number of sampling and training iterations. num_iterations * frames_per_batch is the total number of frames sampled during training.
num_epochs (int) – The number of epochs per training iteration.
minibatch_size (int) – The size of the minibatches in each optimization step.
lr (float) – The learning rate.
anneal_lr (bool) – Whether to (linearly) anneal the learning rate over time. Defaults to False.
max_grad_norm (float) – The maximum norm of the gradients during optimization.
loss_critic_type (str) – Can be one of "l1", "l2" or "smooth_l1". Defaults to "smooth_l1".
clip_value (float or bool, optional) – If a float is provided, it is used to compute a clipped version of the value prediction with respect to the input tensordict value estimate, and this clipped version is used to calculate the value loss. The purpose of clipping is to limit the impact of extreme value predictions, helping to stabilise training and prevent large updates. However, clipping has no effect if the value estimate was produced by the current version of the value estimator. If True is provided instead, the clip_epsilon parameter is used as the clipping threshold. If not provided or False, no clipping is performed. Defaults to False.
normalize_observations (bool) – Whether to normalise the observations in the environment.
num_normalization_steps (int) – The number of steps to use to calculate the mean and standard deviation of the observations for normalisation. The environment is run for this many steps in total with random actions.
gamma (float) – The discount factor.
lmbda (float) – The GAE lambda parameter.
use_shared_body (bool) – Whether the actor and critic share the same body, when using a critic.
num_test_iterations (int) – The number of iterations to run the test for. In each iteration we sample frames_per_batch frames, as in training.
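The interaction between frames_per_batch, steps_per_env_per_iteration and num_iterations can be sketched as follows. This is a minimal illustration based only on the parameter descriptions above, assuming RlTrainerParameters is imported from its documented module path.

```python
from nip.parameters.trainers import RlTrainerParameters

# Sample 1000 frames per iteration, with each environment running
# trajectories of 50 steps. 50 must be a factor of 1000, and each batch
# then uses 1000 / 50 = 20 parallel environments.
params = RlTrainerParameters(
    frames_per_batch=1000,
    steps_per_env_per_iteration=50,
    num_iterations=500,
)

# Total number of frames sampled during training:
total_frames = params.num_iterations * params.frames_per_batch  # 500,000
```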
Methods Summary

__eq__(other) – Return self==value.
__init__([frames_per_batch, ...])
__repr__() – Return repr(self).
_get_param_class_from_dict(param_dict) – Try to get the parameter class from a dictionary of serialised parameters.
construct_test_params() – Construct a set of basic parameters for testing.
from_dict(params_dict[, ignore_extra_keys]) – Create a parameters object from a dictionary.
get(address) – Get a value from the parameters object using a dot-separated address.
to_dict() – Convert the parameters object to a dictionary.
Attributes
anneal_lr
body_lr_factor
clip_value
frames_per_batch
gamma
lmbda
loss_critic_type
lr
max_grad_norm
minibatch_size
normalize_observations
num_epochs
num_iterations
num_normalization_steps
num_test_iterations
rollouts_per_iteration
steps_per_env_per_iteration
use_shared_body
Methods
- __eq__(other)
Return self==value.
- __init__(frames_per_batch: int | None = 1000, rollouts_per_iteration: int | None = None, steps_per_env_per_iteration: int | None = None, num_iterations: int = 1000, num_epochs: int = 4, minibatch_size: int = 64, lr: float = 0.003, anneal_lr: bool = False, max_grad_norm: float = 1.0, loss_critic_type: str = 'smooth_l1', clip_value: float | bool | None = False, normalize_observations: bool = True, num_normalization_steps: int = 1000, gamma: float = 0.9, lmbda: float = 0.95, body_lr_factor: LrFactors | dict | None = None, use_shared_body: bool = True, num_test_iterations: Annotated[int, BaseRunPreserve] = 10) → None
- __repr__()
Return repr(self).
- classmethod _get_param_class_from_dict(param_dict: dict) → type[ParameterValue] | None [source]
Try to get the parameter class from a dictionary of serialised parameters.
- Parameters:
param_dict (dict) – A dictionary of parameters, which may have come from a to_dict method. This dictionary may contain a _type key, which is used to determine the class of the parameter.
- Returns:
param_class (type[ParameterValue] | None) – The class of the parameter, if it can be determined.
- Raises:
ValueError – If the class specified in the dictionary is not a valid parameter class.
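An illustrative sketch only: the dictionary below is hypothetical, since the exact contents of the _type key are not specified in this reference.

```python
# Hypothetical serialised parameters; the "_type" value is an assumed
# format, not one confirmed by this reference.
params_dict = {"_type": "RlTrainerParameters", "lr": 0.003}

# Returns the parameter class if it can be determined, else None.
# Raises ValueError if the named class is not a valid parameter class.
param_class = RlTrainerParameters._get_param_class_from_dict(params_dict)
```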
- classmethod construct_test_params() → BaseHyperParameters [source]
Construct a set of basic parameters for testing.
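For example, a test might construct a basic parameters object as follows (the specific values it chooses are not documented here):

```python
from nip.parameters.trainers import RlTrainerParameters

# Build a small, self-consistent set of parameters for testing.
test_params = RlTrainerParameters.construct_test_params()
```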
- classmethod from_dict(params_dict: dict, ignore_extra_keys: bool = False) → BaseHyperParameters [source]
Create a parameters object from a dictionary.
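A minimal sketch of the serialisation round trip, using only the to_dict, from_dict, get and __eq__ behaviour summarised above; treating "lr" as a valid dot-separated address for a top-level field is an assumption.

```python
from nip.parameters.trainers import RlTrainerParameters

params = RlTrainerParameters(lr=0.001, minibatch_size=128)

# Serialise to a plain dictionary and reconstruct an equivalent object;
# __eq__ compares by value, so the round trip should be lossless.
restored = RlTrainerParameters.from_dict(params.to_dict())
assert restored == params

# get reads a value using a dot-separated address; for a top-level field
# of this parameters object the address is just the field name.
assert params.get("lr") == 0.001
```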