nip.rl_objectives
Implementations of RL objectives, extending those of TorchRL.
Classes

- Clipped PPO loss which allows multiple action keys and normalises advantages.
- KL penalty PPO loss which allows multiple action keys and normalises advantages.
- Base class for all RL objectives.
- Base PPO loss class which allows multiple action keys and normalises advantages.
- Reinforce loss which allows multiple action keys and normalises advantages.
- Loss for Stackelberg Policy Gradient and several variants.
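The sketch below illustrates one way such an extension of a TorchRL objective might look: a subclass of TorchRL's ClipPPOLoss that normalises advantages before delegating to the parent loss. It is a minimal, hedged example, not the package's actual implementation; the class name NormalisingClipPPOLoss is an illustrative assumption.

```python
# Minimal sketch (assumed names, not the nip package's real code) of extending
# TorchRL's clipped PPO loss to normalise advantages before the loss is computed.
import torch
from tensordict import TensorDict
from torchrl.objectives import ClipPPOLoss


class NormalisingClipPPOLoss(ClipPPOLoss):
    """Clipped PPO loss that normalises advantages to zero mean and unit variance."""

    def forward(self, tensordict: TensorDict) -> TensorDict:
        advantage = tensordict.get(self.tensor_keys.advantage, None)
        if advantage is not None:
            # Normalise across the batch; the small epsilon guards against
            # division by zero when all advantages are equal.
            advantage = (advantage - advantage.mean()) / (advantage.std() + 1e-8)
            # Work on a shallow copy so the caller's tensordict is left untouched.
            tensordict = tensordict.clone(False)
            tensordict.set(self.tensor_keys.advantage, advantage)
        return super().forward(tensordict)
```

Supporting multiple action keys would additionally require overriding how the log-probability ratio is gathered from the policy output, which the classes listed above handle internally.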