nip.parameters.trainers.CommonPpoParameters

class nip.parameters.trainers.CommonPpoParameters(loss_type: Literal['clip', 'kl_penalty'] = 'clip', clip_epsilon: float = 0.2, kl_target: float = 0.01, kl_beta: float = 1.0, kl_decrement: float = 0.5, kl_increment: float = 2.0, critic_coef: float = 1.0, entropy_eps: float = 0.001, normalize_advantage: bool = True)

Common parameters for PPO trainers.

Parameters:
  • loss_type (PpoLossType) – The type of PPO loss function to use. See PpoLossType for options.

  • clip_epsilon (float) – The PPO clip range when using the clipped PPO loss.

  • kl_target (float) – The target KL divergence when using the KL penalty PPO loss.

  • kl_beta (float) – The coefficient of the KL penalty term in the PPO loss.

  • kl_decrement (float) – The multiplicative factor applied to the KL penalty coefficient when it is decreased.

  • kl_increment (float) – The multiplicative factor applied to the KL penalty coefficient when it is increased.

  • critic_coef (float) – The coefficient of the critic term in the PPO loss.

  • entropy_eps (float) – The coefficient of the entropy term in the PPO loss.

  • normalize_advantage (bool) – Whether to normalise the advantages in the PPO loss.
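
To make the roles of clip_epsilon, kl_target, kl_beta, kl_decrement and kl_increment concrete, here is a minimal, self-contained sketch. It does not use the nip package: PpoParamsSketch is a hypothetical stand-in that copies only the documented defaults, clipped_surrogate is the standard PPO clipped objective for one sample, and adapt_kl_beta is an assumed adaptive rule (scale the coefficient up when the observed KL exceeds the target, down when it falls below). The library's actual update rule may differ.

```python
from dataclasses import dataclass


@dataclass
class PpoParamsSketch:
    """Hypothetical stand-in holding a subset of the documented defaults."""

    loss_type: str = "clip"
    clip_epsilon: float = 0.2
    kl_target: float = 0.01
    kl_beta: float = 1.0
    kl_decrement: float = 0.5
    kl_increment: float = 2.0


def clipped_surrogate(ratio: float, advantage: float, clip_epsilon: float) -> float:
    """Standard PPO clipped objective for a single sample."""
    # Restrict the probability ratio to [1 - eps, 1 + eps].
    clipped_ratio = max(1.0 - clip_epsilon, min(ratio, 1.0 + clip_epsilon))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


def adapt_kl_beta(params: PpoParamsSketch, observed_kl: float) -> float:
    """Assumed adaptive rule for the KL penalty coefficient (illustrative only)."""
    if observed_kl > params.kl_target:
        return params.kl_beta * params.kl_increment  # penalise divergence harder
    if observed_kl < params.kl_target:
        return params.kl_beta * params.kl_decrement  # relax the penalty
    return params.kl_beta


params = PpoParamsSketch()
objective = clipped_surrogate(ratio=1.5, advantage=1.0, clip_epsilon=params.clip_epsilon)
new_beta = adapt_kl_beta(params, observed_kl=0.05)
```

With a positive advantage, a ratio above 1 + clip_epsilon gains nothing further from the clipped objective, which is what keeps each policy update close to the previous policy.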

Methods Summary

  • __eq__(other) – Return self==value.

  • __init__([loss_type, clip_epsilon, ...])

  • __post_init__()

  • __repr__() – Return repr(self).

  • _get_param_class_from_dict(param_dict) – Try to get the parameter class from a dictionary of serialised parameters.

  • construct_test_params() – Construct a set of basic parameters for testing.

  • from_dict(params_dict[, ignore_extra_keys]) – Create a parameters object from a dictionary.

  • get(address) – Get a value from the parameters object using a dot-separated address.

  • to_dict() – Convert the parameters object to a dictionary.

Attributes

clip_epsilon

critic_coef

entropy_eps

kl_beta

kl_decrement

kl_increment

kl_target

loss_type

normalize_advantage

Methods

__eq__(other)

Return self==value.

__init__(loss_type: Literal['clip', 'kl_penalty'] = 'clip', clip_epsilon: float = 0.2, kl_target: float = 0.01, kl_beta: float = 1.0, kl_decrement: float = 0.5, kl_increment: float = 2.0, critic_coef: float = 1.0, entropy_eps: float = 0.001, normalize_advantage: bool = True) → None

__post_init__()

__repr__()

Return repr(self).

classmethod _get_param_class_from_dict(param_dict: dict) → type[ParameterValue] | None

Try to get the parameter class from a dictionary of serialised parameters.

Parameters:

param_dict (dict) – A dictionary of parameters, which may have come from a to_dict method. This dictionary may contain a _type key, which is used to determine the class of the parameter.

Returns:

param_class (type[ParameterValue] | None) – The class of the parameter, if it can be determined.

Raises:

ValueError – If the class specified in the dictionary is not a valid parameter class.
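
The lookup described above can be sketched as a registry keyed by the _type value. This is an illustration only: the registry, the register decorator, and the choice to key by class name are assumptions, not the library's implementation. Only the documented behaviour is mirrored (return None when the class cannot be determined, raise ValueError for an invalid class).

```python
from typing import Dict, Optional

# Hypothetical registry mapping a serialised "_type" name to a class.
_PARAM_REGISTRY: Dict[str, type] = {}


def register(cls: type) -> type:
    """Record a class under its own name (assumed convention)."""
    _PARAM_REGISTRY[cls.__name__] = cls
    return cls


@register
class ToyPpoParameters:
    pass


def get_param_class_from_dict(param_dict: dict) -> Optional[type]:
    """Return the class named by "_type", or None if the key is absent."""
    type_name = param_dict.get("_type")
    if type_name is None:
        return None  # class cannot be determined from this dictionary
    if type_name not in _PARAM_REGISTRY:
        raise ValueError(f"{type_name!r} is not a valid parameter class")
    return _PARAM_REGISTRY[type_name]


found = get_param_class_from_dict({"_type": "ToyPpoParameters", "clip_epsilon": 0.2})
missing = get_param_class_from_dict({"clip_epsilon": 0.2})
```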

classmethod construct_test_params() → BaseHyperParameters

Construct a set of basic parameters for testing.

classmethod from_dict(params_dict: dict, ignore_extra_keys: bool = False) → BaseHyperParameters

Create a parameters object from a dictionary.

Parameters:
  • params_dict (dict) – A dictionary of the parameters.

  • ignore_extra_keys (bool, default=False) – If True, ignore keys in the dictionary that do not correspond to fields in the parameters object.

Returns:

hyper_params (BaseHyperParameters) – The parameters object.
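
As a rough illustration of the ignore_extra_keys flag, the sketch below builds a toy dataclass from a dictionary. ToyParams is hypothetical and is not part of nip. What happens to unknown keys when ignore_extra_keys is False is not stated here, so raising ValueError is an assumption made for the example.

```python
from dataclasses import dataclass, fields


@dataclass
class ToyParams:
    """Hypothetical parameters class used only for this example."""

    clip_epsilon: float = 0.2
    critic_coef: float = 1.0

    @classmethod
    def from_dict(cls, params_dict: dict, ignore_extra_keys: bool = False) -> "ToyParams":
        field_names = {f.name for f in fields(cls)}
        extra_keys = set(params_dict) - field_names
        if extra_keys and not ignore_extra_keys:
            # Assumed behaviour: reject keys that match no field.
            raise ValueError(f"Unexpected keys: {sorted(extra_keys)}")
        # Keep only the entries that correspond to dataclass fields.
        known = {k: v for k, v in params_dict.items() if k in field_names}
        return cls(**known)


serialised = {"clip_epsilon": 0.1, "unknown_key": 123}
toy = ToyParams.from_dict(serialised, ignore_extra_keys=True)
```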

get(address: str) → Any

Get a value from the parameters object using a dot-separated address.

Parameters:

address (str) – The path to the value in the parameters object, separated by dots.

Returns:

value (Any) – The value at the address.

Raises:

KeyError – If the address does not exist.
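
A dot-separated address can be resolved by walking attributes one segment at a time. The sketch below shows the idea on two hypothetical nested dataclasses (ToyPpo and ToyHyperParams, neither of which is part of nip). The attribute name ppo is invented for the example, and the real method may resolve addresses differently.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyPpo:
    clip_epsilon: float = 0.2


@dataclass
class ToyHyperParams:
    ppo: ToyPpo = field(default_factory=ToyPpo)
    seed: int = 0

    def get(self, address: str) -> Any:
        """Follow each dot-separated segment as an attribute lookup."""
        value: Any = self
        for part in address.split("."):
            if not hasattr(value, part):
                # Mirror the documented behaviour for a missing address.
                raise KeyError(address)
            value = getattr(value, part)
        return value


toy_hyper_params = ToyHyperParams()
clip = toy_hyper_params.get("ppo.clip_epsilon")
```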

to_dict() → dict

Convert the parameters object to a dictionary.

Turns enums into strings, and sub-parameters into dictionaries. Includes the is_random parameter if it exists.

Returns:

params_dict (dict) – A dictionary of the parameters.
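
The conversion described above (enums to strings, sub-parameters to nested dictionaries) might look like the following sketch. All class names are hypothetical, the enum is converted via its value (an assumption, since the name would also be a string), and the is_random handling is omitted.

```python
from dataclasses import dataclass, field, fields
from enum import Enum


class ToyLossType(Enum):
    CLIP = "clip"
    KL_PENALTY = "kl_penalty"


@dataclass
class ToyBase:
    def to_dict(self) -> dict:
        """Recursively convert dataclass fields to plain Python values."""
        result = {}
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, Enum):
                value = value.value  # assumed: serialise the enum by its value
            elif isinstance(value, ToyBase):
                value = value.to_dict()  # sub-parameters become nested dicts
            result[f.name] = value
        return result


@dataclass
class ToyPpoParams(ToyBase):
    loss_type: ToyLossType = ToyLossType.CLIP
    clip_epsilon: float = 0.2


@dataclass
class ToyTopLevel(ToyBase):
    ppo: ToyPpoParams = field(default_factory=ToyPpoParams)
    seed: int = 0


as_dict = ToyTopLevel().to_dict()
```

The resulting dictionary contains only built-in types, so it can be passed straight to a serialiser such as json.dumps.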