nip.parameters.types.TrainerType

Type for the RL trainer to use.

vanilla_ppo: The Proximal Policy Optimization trainer, with each agent training independently.
solo_agent: A trainer that trains a single agent to solve the task using supervised learning.
spg: Stackelberg Policy Gradient [FCR20] and its variants.
reinforce: The REINFORCE algorithm.
pure_text_ei: Expert Iteration [ATB17] for text-based tasks, where agents are accessed through text-based APIs rather than run locally, so everything can be represented as text.
pure_text_malt: Multi-Agent LLM Training (MALT) [MSD+24] for text-based tasks, where agents are accessed through text-based APIs rather than run locally, so everything can be represented as text.

alias of Literal['vanilla_ppo', 'solo_agent', 'spg', 'reinforce', 'pure_text_ei', 'pure_text_malt']
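
Because a Literal alias is only enforced by static type checkers, a value read from a config file or the command line may need a runtime check as well. The following is a minimal sketch, not part of the nip API: it redeclares the alias locally so the snippet runs without the package installed, and validate_trainer_type is a hypothetical helper name.

```python
from typing import Literal, get_args

# Local copy of the alias documented above (an assumption made so the
# snippet is self-contained; in practice it would be imported from
# nip.parameters.types).
TrainerType = Literal[
    "vanilla_ppo",
    "solo_agent",
    "spg",
    "reinforce",
    "pure_text_ei",
    "pure_text_malt",
]


def validate_trainer_type(name: str) -> str:
    """Return name unchanged if it is an allowed trainer type.

    Hypothetical helper: raises ValueError for any other string.
    """
    allowed = get_args(TrainerType)  # tuple of the permitted strings
    if name not in allowed:
        raise ValueError(
            f"Unknown trainer type {name!r}; expected one of {allowed}"
        )
    return name


if __name__ == "__main__":
    print(validate_trainer_type("spg"))
```

typing.get_args reads the permitted strings from the alias itself, so the check stays in sync if the list of trainers changes.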