nip.parameters.trainers.CommonPpoParameters
- class nip.parameters.trainers.CommonPpoParameters(loss_type: Literal['clip', 'kl_penalty'] = 'clip', clip_epsilon: float = 0.2, kl_target: float = 0.01, kl_beta: float = 1.0, kl_decrement: float = 0.5, kl_increment: float = 2.0, critic_coef: float = 1.0, entropy_eps: float = 0.001, normalize_advantage: bool = True)
Common parameters for PPO trainers.
- Parameters:
loss_type (PpoLossType) – The type of PPO loss function to use. See PpoLossType for options.
clip_epsilon (float) – The PPO clip range when using the clipped PPO loss.
kl_target (float) – The target KL divergence when using the KL penalty PPO loss.
kl_beta (float) – The coefficient of the KL penalty term in the PPO loss.
kl_decrement (float) – The decrement factor for the KL penalty term in the PPO loss.
kl_increment (float) – The increment factor for the KL penalty term in the PPO loss.
critic_coef (float) – The coefficient of the critic term in the PPO loss.
entropy_eps (float) – The coefficient of the entropy term in the PPO loss.
normalize_advantage (bool) – Whether to normalise the advantages in the PPO loss.
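The role of clip_epsilon and of the KL penalty parameters can be illustrated with a minimal sketch. It is not the library's implementation: PpoSketchParams is a hypothetical stand-in that copies only the documented defaults, and the tolerance of 1.5 in update_kl_beta is an assumption borrowed from the common adaptive-KL heuristic, not something this page specifies.

```python
from dataclasses import dataclass


@dataclass
class PpoSketchParams:
    """Hypothetical stand-in that mirrors the documented default values."""

    clip_epsilon: float = 0.2
    kl_target: float = 0.01
    kl_beta: float = 1.0
    kl_decrement: float = 0.5
    kl_increment: float = 2.0


def clipped_surrogate(ratio: float, advantage: float, clip_epsilon: float) -> float:
    """Clipped PPO objective for one sample (to be maximised)."""
    # Restrict the probability ratio to [1 - eps, 1 + eps].
    clipped_ratio = max(1.0 - clip_epsilon, min(ratio, 1.0 + clip_epsilon))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


def update_kl_beta(
    beta: float,
    observed_kl: float,
    params: PpoSketchParams,
    tolerance: float = 1.5,  # assumed threshold factor, not from the documentation
) -> float:
    """Adapt the KL penalty coefficient towards the target KL divergence."""
    if observed_kl > params.kl_target * tolerance:
        # The policy moved too far: strengthen the penalty.
        return beta * params.kl_increment
    if observed_kl < params.kl_target / tolerance:
        # The policy barely moved: relax the penalty.
        return beta * params.kl_decrement
    return beta
```

Under these assumptions, a ratio of 1.5 with a positive advantage is capped at 1 + clip_epsilon, and an observed KL divergence well above kl_target doubles the penalty coefficient.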
Methods Summary
__eq__(other) – Return self==value.
__init__([loss_type, clip_epsilon, ...])
__repr__() – Return repr(self).
_get_param_class_from_dict(param_dict) – Try to get the parameter class from a dictionary of serialised parameters.
construct_test_params() – Construct a set of basic parameters for testing.
from_dict(params_dict[, ignore_extra_keys]) – Create a parameters object from a dictionary.
get(address) – Get a value from the parameters object using a dot-separated address.
to_dict() – Convert the parameters object to a dictionary.
Attributes
clip_epsilon
critic_coef
entropy_eps
kl_beta
kl_decrement
kl_increment
kl_target
loss_type
normalize_advantage
Methods
- __eq__(other)
Return self==value.
- __init__(loss_type: Literal['clip', 'kl_penalty'] = 'clip', clip_epsilon: float = 0.2, kl_target: float = 0.01, kl_beta: float = 1.0, kl_decrement: float = 0.5, kl_increment: float = 2.0, critic_coef: float = 1.0, entropy_eps: float = 0.001, normalize_advantage: bool = True) -> None
- __repr__()
Return repr(self).
- classmethod _get_param_class_from_dict(param_dict: dict) -> type[ParameterValue] | None
Try to get the parameter class from a dictionary of serialised parameters.
- Parameters:
param_dict (dict) – A dictionary of parameters, which may have come from a to_dict method. This dictionary may contain a _type key, which is used to determine the class of the parameter.
- Returns:
param_class (type[ParameterValue] | None) – The class of the parameter, if it can be determined.
- Raises:
ValueError – If the class specified in the dictionary is not a valid parameter class.
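The lookup described above can be sketched as follows. This is an illustration only: the registry, the register decorator, and the choice to store the class name under the `_type` key are all assumptions, and the ParameterValue class here is a local stand-in, not the library's.

```python
from typing import Optional


class ParameterValue:
    """Local stand-in for the library's parameter base class."""


# Hypothetical registry mapping a serialised type name to its class.
_PARAM_REGISTRY: dict[str, type[ParameterValue]] = {}


def register(cls: type[ParameterValue]) -> type[ParameterValue]:
    """Record a parameter class under its class name (assumed convention)."""
    _PARAM_REGISTRY[cls.__name__] = cls
    return cls


@register
class SketchPpoParameters(ParameterValue):
    """Hypothetical parameter class used only for this example."""


def get_param_class_from_dict(param_dict: dict) -> Optional[type[ParameterValue]]:
    """Return the class named by the "_type" key, or None if the key is absent."""
    type_name = param_dict.get("_type")
    if type_name is None:
        # No type information was serialised, so the class cannot be determined.
        return None
    if type_name not in _PARAM_REGISTRY:
        raise ValueError(f"{type_name!r} is not a valid parameter class")
    return _PARAM_REGISTRY[type_name]
```

A dictionary without a `_type` key yields None, whereas an unrecognised name raises ValueError, matching the Returns and Raises entries above.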
- classmethod construct_test_params() -> BaseHyperParameters
Construct a set of basic parameters for testing.
- classmethod from_dict(params_dict: dict, ignore_extra_keys: bool = False) -> BaseHyperParameters
Create a parameters object from a dictionary.
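A round trip through a dictionary might look like the sketch below. SketchParams is a hypothetical dataclass, not the library class, and raising ValueError on unexpected keys when ignore_extra_keys is False is an assumption about the behaviour, which this page does not state.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class SketchParams:
    """Hypothetical stand-in with two of the documented fields."""

    clip_epsilon: float = 0.2
    kl_target: float = 0.01

    @classmethod
    def from_dict(cls, params_dict: dict, ignore_extra_keys: bool = False) -> "SketchParams":
        """Build an instance from a dictionary of field values."""
        known = {f.name for f in fields(cls)}
        extra = set(params_dict) - known
        if extra and not ignore_extra_keys:
            # Assumed behaviour: unknown keys are an error unless explicitly ignored.
            raise ValueError(f"Unexpected keys: {sorted(extra)}")
        return cls(**{k: v for k, v in params_dict.items() if k in known})

    def to_dict(self) -> dict:
        """Convert the instance back to a plain dictionary."""
        return asdict(self)


params = SketchParams.from_dict(
    {"clip_epsilon": 0.1, "unused": 1}, ignore_extra_keys=True
)
round_tripped = SketchParams.from_dict(params.to_dict())
```

Fields missing from the dictionary fall back to their defaults, so a partial dictionary is enough to override a single value.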