nip.parameters.types.TrainerType


Type for the RL trainer to use.

vanilla_ppo

The Proximal Policy Optimization trainer, with each agent training independently.

solo_agent

A trainer that trains a single agent to solve the task using supervised learning.

spg

Stackelberg Policy Gradient [FCR20] and its variants.

reinforce

The REINFORCE algorithm.

pure_text_ei

Expert Iteration [ATB17] for text-based tasks, where agents are run through text-based APIs rather than locally, so everything can be represented as text.

pure_text_malt

Multi-Agent LLM Training (MALT) [MSD+24] for text-based tasks, where agents are run through text-based APIs rather than locally, so everything can be represented as text.

alias of Literal['vanilla_ppo', 'solo_agent', 'spg', 'reinforce', 'pure_text_ei', 'pure_text_malt']
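
Because TrainerType is a Literal alias, static type checkers can reject unknown trainer names, and the allowed values can also be recovered at runtime with typing.get_args. The following is a minimal sketch, not part of the library: it redeclares the alias locally so that it runs without the nip package installed (in real code one would presumably import it from nip.parameters.types), and validate_trainer is a hypothetical helper introduced only for illustration.

```python
from typing import Literal, get_args

# Local mirror of the documented alias (assumption: the real alias lives in
# nip.parameters.types and has exactly these six values).
TrainerType = Literal[
    "vanilla_ppo",
    "solo_agent",
    "spg",
    "reinforce",
    "pure_text_ei",
    "pure_text_malt",
]


def validate_trainer(name: str) -> str:
    """Hypothetical helper: return `name` if it is an allowed trainer type."""
    allowed = get_args(TrainerType)  # tuple of the literal string values
    if name not in allowed:
        raise ValueError(f"Unknown trainer {name!r}; expected one of {allowed}")
    return name


if __name__ == "__main__":
    print(validate_trainer("spg"))
```

A runtime check like this can be useful when the trainer name comes from a configuration file or command-line argument, where a static type checker cannot see the value.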