nip.parameters.types.TrainerType
Type for the RL trainer to use.
- vanilla_ppo
The Proximal Policy Optimization (PPO) trainer, with each agent trained independently.
- solo_agent
A trainer that trains a single agent to solve the task using supervised learning.
- spg
Stackelberg Policy Gradient [FCR20] and its variants.
- reinforce
The REINFORCE algorithm.
- pure_text_ei
Expert Iteration [ATB17] for text-based tasks, where agents are accessed through text-based APIs (i.e. they are not run locally, so everything can be represented as text).
- pure_text_malt
Multi-Agent LLM Training (MALT) [MSD+24] for text-based tasks, where agents are accessed through text-based APIs (i.e. they are not run locally, so everything can be represented as text).
alias of Literal['vanilla_ppo', 'solo_agent', 'spg', 'reinforce', 'pure_text_ei', 'pure_text_malt']
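Because TrainerType is a plain typing.Literal alias, the set of allowed trainer names can be recovered at runtime with typing.get_args. The sketch below uses this to validate a user-supplied string. It is a minimal illustration under stated assumptions: it redefines the alias locally instead of importing nip, so it runs without the package installed, and the names VALID_TRAINERS and check_trainer are hypothetical helpers, not part of nip.

```python
from typing import Literal, cast, get_args

# Local copy of the alias documented above. In nip it lives at
# nip.parameters.types.TrainerType; it is redefined here so the sketch is standalone.
TrainerType = Literal[
    "vanilla_ppo",
    "solo_agent",
    "spg",
    "reinforce",
    "pure_text_ei",
    "pure_text_malt",
]

# Hypothetical: the allowed trainer names, read back from the Literal at runtime.
VALID_TRAINERS: tuple = get_args(TrainerType)


def check_trainer(name: str) -> TrainerType:
    """Hypothetical helper (not part of nip): return ``name`` if it is a valid
    TrainerType value, otherwise raise ValueError listing the valid options."""
    if name not in VALID_TRAINERS:
        raise ValueError(
            f"Unknown trainer {name!r}; expected one of: {', '.join(VALID_TRAINERS)}"
        )
    # cast() only informs static type checkers; at runtime it returns name unchanged.
    return cast(TrainerType, name)
```

For example, check_trainer("spg") returns "spg", while check_trainer("ppo") raises ValueError. Deriving the valid names from the Literal itself keeps runtime validation in sync with the static type, so adding a trainer to the alias needs no second list to update.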