nip.parameters.trainers.CommonPpoParameters
- class nip.parameters.trainers.CommonPpoParameters(loss_type: Literal['clip', 'kl_penalty'] = 'clip', clip_epsilon: float = 0.2, kl_target: float = 0.01, kl_beta: float = 1.0, kl_decrement: float = 0.5, kl_increment: float = 2.0, critic_coef: float = 1.0, entropy_eps: float = 0.001, normalize_advantage: bool = True)
Common parameters for PPO trainers.
- Parameters:
loss_type (PpoLossType) – The type of PPO loss function to use. See PpoLossType for options.
clip_epsilon (float) – The PPO clip range when using the clipped PPO loss.
kl_target (float) – The target KL divergence when using the KL penalty PPO loss.
kl_beta (float) – The coefficient of the KL penalty term in the PPO loss.
kl_decrement (float) – The decrement factor for the KL penalty term in the PPO loss.
kl_increment (float) – The increment factor for the KL penalty term in the PPO loss.
critic_coef (float) – The coefficient of the critic term in the PPO loss.
entropy_eps (float) – The coefficient of the entropy term in the PPO loss.
normalize_advantage (bool) – Whether to normalise the advantages in the PPO loss.
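As a rough illustration of how parameters like these commonly enter a PPO objective, the sketch below works through a single sample in plain Python. It is not the library's implementation. The function names `clipped_surrogate`, `total_loss` and `update_kl_beta` are hypothetical, the way the terms are combined is the textbook form, and the coefficient update (multiply `kl_beta` by `kl_increment` when the measured KL exceeds `kl_target`, otherwise by `kl_decrement`) is an assumption based on the usual adaptive-KL scheme.

```python
def clipped_surrogate(ratio: float, advantage: float, clip_epsilon: float = 0.2) -> float:
    """Clipped PPO surrogate for one sample (a quantity to maximise)."""
    # Restrict the probability ratio to [1 - clip_epsilon, 1 + clip_epsilon].
    clipped_ratio = max(1.0 - clip_epsilon, min(ratio, 1.0 + clip_epsilon))
    # Take the more pessimistic of the unclipped and clipped objectives.
    return min(ratio * advantage, clipped_ratio * advantage)


def total_loss(
    surrogate: float,
    value_error: float,
    entropy: float,
    critic_coef: float = 1.0,
    entropy_eps: float = 0.001,
) -> float:
    """Assumed combination: policy term + weighted critic term - entropy bonus."""
    return -surrogate + critic_coef * value_error**2 - entropy_eps * entropy


def update_kl_beta(
    kl_beta: float,
    measured_kl: float,
    kl_target: float = 0.01,
    kl_increment: float = 2.0,
    kl_decrement: float = 0.5,
) -> float:
    """Assumed adaptive rule for the KL penalty coefficient."""
    if measured_kl > kl_target:
        return kl_beta * kl_increment  # policy moved too far: penalise more
    return kl_beta * kl_decrement  # policy stayed close: penalise less
```

With the default values, a ratio of 1.5 and a positive advantage is capped at a ratio of 1.2, and a measured KL of 0.02 would double `kl_beta` under the assumed rule.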
Methods Summary
__eq__(other) – Return self==value.
__init__([loss_type, clip_epsilon, ...])
__repr__() – Return repr(self).
_get_param_class_from_dict(param_dict) – Try to get the parameter class from a dictionary of serialised parameters.
construct_test_params() – Construct a set of basic parameters for testing.
from_dict(params_dict[, ignore_extra_keys]) – Create a parameters object from a dictionary.
get(address) – Get a value from the parameters object using a dot-separated address.
to_dict() – Convert the parameters object to a dictionary.
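The dot-separated addressing used by get and the dictionary conversion done by to_dict can be pictured with the minimal sketch below. It uses hypothetical stand-in dataclasses (`PpoStandIn`, `TrainerStandIn`) and a hypothetical helper `get_by_address` rather than the nip classes, so the real methods may differ in details such as extra keys in the output or error handling.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class PpoStandIn:
    """Hypothetical stand-in holding two of the PPO fields."""

    loss_type: str = "clip"
    clip_epsilon: float = 0.2


@dataclass
class TrainerStandIn:
    """Hypothetical parent object that nests the PPO parameters."""

    ppo: PpoStandIn = field(default_factory=PpoStandIn)


def get_by_address(params, address: str):
    """Follow a dot-separated address one attribute at a time."""
    value = params
    for name in address.split("."):
        value = getattr(value, name)
    return value


params = TrainerStandIn()
clip = get_by_address(params, "ppo.clip_epsilon")  # walks params.ppo.clip_epsilon
as_dict = asdict(params)  # nested dataclasses become nested dictionaries
```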
Attributes
clip_epsilon
critic_coef
entropy_eps
kl_beta
kl_decrement
kl_increment
kl_target
loss_type
normalize_advantage
Methods
- __eq__(other)
Return self==value.
- __init__(loss_type: Literal['clip', 'kl_penalty'] = 'clip', clip_epsilon: float = 0.2, kl_target: float = 0.01, kl_beta: float = 1.0, kl_decrement: float = 0.5, kl_increment: float = 2.0, critic_coef: float = 1.0, entropy_eps: float = 0.001, normalize_advantage: bool = True) → None
- __repr__()
Return repr(self).
- classmethod _get_param_class_from_dict(param_dict: dict) → type[ParameterValue] | None
Try to get the parameter class from a dictionary of serialised parameters.
- Parameters:
param_dict (dict) – A dictionary of parameters, which may have come from a to_dict method. This dictionary may contain a _type key, which is used to determine the class of the parameter.
- Returns:
param_class (type[ParameterValue] | None) – The class of the parameter, if it can be determined.
- Raises:
ValueError – If the class specified in the dictionary is not a valid parameter class.
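The lookup described above can be sketched as follows. This is an illustration under assumptions, not the library's code: the `REGISTRY` mapping, the `register` decorator, the stand-in class `PpoStandIn` and the choice to store the class name as a string under `_type` are all hypothetical.

```python
from typing import Optional

# Hypothetical registry mapping a serialised type name to a parameter class.
REGISTRY: dict = {}


def register(cls: type) -> type:
    """Record a class under its own name so it can be looked up later."""
    REGISTRY[cls.__name__] = cls
    return cls


@register
class PpoStandIn:
    """Hypothetical stand-in for a registered parameter class."""


def get_param_class_from_dict(param_dict: dict) -> Optional[type]:
    """Return the class named by the '_type' key, or None if the key is absent."""
    type_name = param_dict.get("_type")
    if type_name is None:
        return None  # the class cannot be determined from this dictionary
    if type_name not in REGISTRY:
        raise ValueError(f"{type_name!r} is not a valid parameter class")
    return REGISTRY[type_name]
```

A dictionary without a `_type` key gives None, while an unrecognised name raises ValueError, matching the Returns and Raises entries above.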
- classmethod construct_test_params() → BaseHyperParameters
Construct a set of basic parameters for testing.
- classmethod from_dict(params_dict: dict, ignore_extra_keys: bool = False) → BaseHyperParameters
Create a parameters object from a dictionary.
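A minimal sketch of what a from_dict-style constructor with an ignore_extra_keys switch might look like is given below. It uses a hypothetical stand-in dataclass `PpoStandIn` and a hypothetical function `from_dict_sketch`. The behaviour when ignore_extra_keys is False (raising ValueError on unknown keys) is an assumption, since the documentation here does not state it.

```python
from dataclasses import dataclass, fields


@dataclass
class PpoStandIn:
    """Hypothetical stand-in with a few of the documented fields."""

    loss_type: str = "clip"
    clip_epsilon: float = 0.2
    kl_target: float = 0.01


def from_dict_sketch(cls, params_dict: dict, ignore_extra_keys: bool = False):
    """Build an instance of cls from a dictionary of field values."""
    known = {f.name for f in fields(cls)}
    extra = set(params_dict) - known
    if extra and not ignore_extra_keys:
        # Assumed behaviour: unknown keys are an error unless explicitly ignored.
        raise ValueError(f"Unexpected keys: {sorted(extra)}")
    # Keys that are missing from the dictionary fall back to the field defaults.
    return cls(**{k: v for k, v in params_dict.items() if k in known})


strict = from_dict_sketch(PpoStandIn, {"loss_type": "kl_penalty"})
lenient = from_dict_sketch(
    PpoStandIn, {"clip_epsilon": 0.1, "unused": 1}, ignore_extra_keys=True
)
```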