nip.parameters.trainers.CommonPpoParameters

class nip.parameters.trainers.CommonPpoParameters(loss_type: Literal['clip', 'kl_penalty'] = 'clip', clip_epsilon: float = 0.2, kl_target: float = 0.01, kl_beta: float = 1.0, kl_decrement: float = 0.5, kl_increment: float = 2.0, critic_coef: float = 1.0, entropy_eps: float = 0.001, normalize_advantage: bool = True)

Common parameters for PPO trainers.

Parameters:

loss_type (PpoLossType) – The type of PPO loss function to use. See PpoLossType for options.
clip_epsilon (float) – The PPO clip range when using the clipped PPO loss.
kl_target (float) – The target KL divergence when using the KL penalty PPO loss.
kl_beta (float) – The coefficient of the KL penalty term in the PPO loss.
kl_decrement (float) – The decrement factor for the KL penalty term in the PPO loss.
kl_increment (float) – The increment factor for the KL penalty term in the PPO loss.
critic_coef (float) – The coefficient of the critic term in the PPO loss.
entropy_eps (float) – The coefficient of the entropy term in the PPO loss.
normalize_advantage (bool) – Whether to normalise the advantages in the PPO loss.
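
As a rough illustration of how fields like these usually enter a PPO objective, the sketch below computes a per-sample clipped surrogate and an adaptive update of the KL coefficient in plain Python. It is not taken from the nip source: the function names `clipped_surrogate` and `update_kl_beta`, and the `tolerance` band of 1.5 around `kl_target`, are assumptions borrowed from the common adaptive-KL PPO recipe.

```python
def clipped_surrogate(ratio: float, advantage: float, clip_epsilon: float = 0.2) -> float:
    """Per-sample clipped PPO objective (a quantity to maximise).

    `ratio` is the new-policy probability divided by the old-policy probability.
    """
    # Restrict the ratio to [1 - clip_epsilon, 1 + clip_epsilon].
    clipped_ratio = max(1.0 - clip_epsilon, min(ratio, 1.0 + clip_epsilon))
    # Take the pessimistic (smaller) of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


def update_kl_beta(
    kl_beta: float,
    observed_kl: float,
    kl_target: float = 0.01,
    kl_decrement: float = 0.5,
    kl_increment: float = 2.0,
    tolerance: float = 1.5,  # assumed band; not a CommonPpoParameters field
) -> float:
    """Adapt the KL penalty coefficient after an update step."""
    if observed_kl < kl_target / tolerance:
        # Policy moved less than intended: weaken the penalty.
        return kl_beta * kl_decrement
    if observed_kl > kl_target * tolerance:
        # Policy moved more than intended: strengthen the penalty.
        return kl_beta * kl_increment
    return kl_beta


# A ratio of 1.5 with a positive advantage is capped at 1 + clip_epsilon.
capped = clipped_surrogate(ratio=1.5, advantage=1.0)
# An observed KL well above the target multiplies kl_beta by kl_increment.
new_beta = update_kl_beta(kl_beta=1.0, observed_kl=0.05)
```

Under this reading, the defaults `kl_decrement=0.5` and `kl_increment=2.0` halve or double `kl_beta`; whether nip applies them exactly this way should be checked against the trainer source.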

Methods Summary

`__eq__`(other)	Return self==value.
`__init__`([loss_type, clip_epsilon, ...])
`__post_init__`()
`__repr__`()	Return repr(self).
`_get_param_class_from_dict`(param_dict)	Try to get the parameter class from a dictionary of serialised parameters.
`construct_test_params`()	Construct a set of basic parameters for testing.
`from_dict`(params_dict[, ignore_extra_keys])	Create a parameters object from a dictionary.
`get`(address)	Get a value from the parameters object using a dot-separated address.
`to_dict`()	Convert the parameters object to a dictionary.

Attributes

`clip_epsilon`
`critic_coef`
`entropy_eps`
`kl_beta`
`kl_decrement`
`kl_increment`
`kl_target`
`loss_type`
`normalize_advantage`

Methods

__eq__(other)

Return self==value.

__init__(loss_type: Literal['clip', 'kl_penalty'] = 'clip', clip_epsilon: float = 0.2, kl_target: float = 0.01, kl_beta: float = 1.0, kl_decrement: float = 0.5, kl_increment: float = 2.0, critic_coef: float = 1.0, entropy_eps: float = 0.001, normalize_advantage: bool = True) → None

__post_init__()

__repr__()

Return repr(self).

classmethod _get_param_class_from_dict(param_dict: dict) → type[ParameterValue] | None

Try to get the parameter class from a dictionary of serialised parameters.

Parameters:

param_dict (dict) – A dictionary of parameters, which may have come from a to_dict method. This dictionary may contain a _type key, which is used to determine the class of the parameter.

Returns:

param_class (type[ParameterValue] | None) – The class of the parameter, if it can be determined.

Raises:

ValueError – If the class specified in the dictionary is not a valid parameter class.
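
One way a `_type`-based lookup like this could work is through a name-to-class registry. The sketch below is a guess at the general pattern, not the nip implementation: `_REGISTRY`, `register`, `DemoPpoParameters` and the free function `get_param_class_from_dict` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical registry mapping serialised type names to parameter classes.
_REGISTRY: Dict[str, type] = {}


def register(cls: type) -> type:
    """Class decorator that records a class under its own name."""
    _REGISTRY[cls.__name__] = cls
    return cls


@register
@dataclass
class DemoPpoParameters:
    clip_epsilon: float = 0.2


def get_param_class_from_dict(param_dict: dict) -> Optional[type]:
    """Return the class named by the `_type` key, or None if the key is absent."""
    type_name = param_dict.get("_type")
    if type_name is None:
        # Nothing to go on: the class cannot be determined.
        return None
    if type_name not in _REGISTRY:
        raise ValueError(f"{type_name!r} is not a registered parameter class")
    return _REGISTRY[type_name]


found = get_param_class_from_dict({"_type": "DemoPpoParameters", "clip_epsilon": 0.1})
missing = get_param_class_from_dict({"clip_epsilon": 0.1})
```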

classmethod construct_test_params() → BaseHyperParameters

Construct a set of basic parameters for testing.

classmethod from_dict(params_dict: dict, ignore_extra_keys: bool = False) → BaseHyperParameters

Create a parameters object from a dictionary.

Parameters:

params_dict (dict) – A dictionary of the parameters.
ignore_extra_keys (bool, default=False) – If True, ignore keys in the dictionary that do not correspond to fields in the parameters object.

Returns:

hyper_params (BaseHyperParameters) – The parameters object.
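
The effect of `ignore_extra_keys` can be pictured with a stand-in dataclass. This is a minimal sketch under stated assumptions: `DemoPpoParameters` is hypothetical, and raising `ValueError` on unexpected keys is an assumption, since the text above does not say what happens when `ignore_extra_keys` is False.

```python
from dataclasses import dataclass, fields


@dataclass
class DemoPpoParameters:
    loss_type: str = "clip"
    clip_epsilon: float = 0.2

    @classmethod
    def from_dict(cls, params_dict: dict, ignore_extra_keys: bool = False):
        """Build an instance from a dictionary of field values."""
        known = {f.name for f in fields(cls)}
        extra = set(params_dict) - known
        if extra and not ignore_extra_keys:
            # Assumed behaviour: reject keys that match no field.
            raise ValueError(f"Unexpected keys: {sorted(extra)}")
        # Keep only the keys that correspond to dataclass fields.
        return cls(**{k: v for k, v in params_dict.items() if k in known})


serialised = {"clip_epsilon": 0.1, "legacy_option": True}
params = DemoPpoParameters.from_dict(serialised, ignore_extra_keys=True)
```

Fields absent from the dictionary fall back to their dataclass defaults, so `params.loss_type` stays at `"clip"` here.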

get(address: str) → Any

Get a value from the parameters object using a dot-separated address.

Parameters:

address (str) – The path to the value in the parameters object, separated by dots.

Returns:

value (Any) – The value at the address.

Raises:

KeyError – If the address does not exist.
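
Dot-separated addressing can be sketched as a walk over nested attributes. The classes `DemoPpo` and `DemoHyperParameters` below are hypothetical stand-ins, and resolving each segment with `getattr` is an assumption about the mechanism rather than a description of the nip code.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DemoPpo:
    clip_epsilon: float = 0.2


@dataclass
class DemoHyperParameters:
    seed: int = 0
    ppo: DemoPpo = field(default_factory=DemoPpo)

    def get(self, address: str) -> Any:
        """Resolve a dot-separated address such as 'ppo.clip_epsilon'."""
        value: Any = self
        for part in address.split("."):
            if not hasattr(value, part):
                # Mirror the documented behaviour: unknown addresses raise KeyError.
                raise KeyError(address)
            value = getattr(value, part)
        return value


hyper_params = DemoHyperParameters()
clip = hyper_params.get("ppo.clip_epsilon")
```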

to_dict() → dict

Convert the parameters object to a dictionary.

Turns enums into strings, and sub-parameters into dictionaries. Includes the is_random parameter if it exists.

Returns:

params_dict (dict) – A dictionary of the parameters.
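
The conversion described above (enums to strings, sub-parameters to dictionaries) might look roughly like the following. This is a sketch with hypothetical names (`DemoLossType`, `DemoPpo`, `DemoTrainer`, and a free function `to_dict`); it assumes enums serialise to their `value`, and it does not model the `is_random` parameter.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from enum import Enum


class DemoLossType(Enum):
    CLIP = "clip"
    KL_PENALTY = "kl_penalty"


@dataclass
class DemoPpo:
    loss_type: DemoLossType = DemoLossType.CLIP
    clip_epsilon: float = 0.2


@dataclass
class DemoTrainer:
    name: str = "demo"
    ppo: DemoPpo = field(default_factory=DemoPpo)


def to_dict(params) -> dict:
    """Recursively convert a dataclass instance into plain Python types."""
    out = {}
    for f in fields(params):
        value = getattr(params, f.name)
        if isinstance(value, Enum):
            # Assumption: the enum's value is the serialised string.
            out[f.name] = value.value
        elif is_dataclass(value):
            # Nested parameter objects become nested dictionaries.
            out[f.name] = to_dict(value)
        else:
            out[f.name] = value
    return out


as_dict = to_dict(DemoTrainer())
```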
