Trainers (nip.trainers)#

Overview#

Trainers are responsible for optimising the agents in an experiment. A trainer (Trainer) takes as input the following:

  1. The hyper-parameters of the experiment (a HyperParameters object).

  2. A ScenarioInstance object, which contains all the components of the experiment. The most important components are:

    • The datasets.

    • The interaction protocol handler (ProtocolHandler).

    • The agents.

    • The environment.

  3. An ExperimentSettings object, which contains various settings for the experiment that are not relevant to reproducibility (e.g. the GPU device number and whether to use Weights & Biases).

When called, the trainer performs some number of optimisation steps on the agents, using the environment to generate the training data.
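The call pattern above can be sketched as follows. This is a minimal, hypothetical illustration of the interface, not nip's actual implementation: the stand-in classes and the `train` method body are invented for this sketch, and the real HyperParameters, ScenarioInstance and ExperimentSettings classes carry many more fields.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for the three inputs described above; the
# real classes live in the nip package and are far richer.
@dataclass
class HyperParameters:
    num_iterations: int = 3


@dataclass
class ScenarioInstance:
    datasets: dict = field(default_factory=dict)
    protocol_handler: object = None
    agents: list = field(default_factory=list)
    environment: object = None


@dataclass
class ExperimentSettings:
    device: int = 0
    use_wandb: bool = False


class Trainer:
    """Minimal sketch of a trainer: holds the three inputs and runs steps."""

    def __init__(self, hyper_params, scenario_instance, settings):
        self.hyper_params = hyper_params
        self.scenario_instance = scenario_instance
        self.settings = settings

    def train(self):
        # A real trainer would roll out the environment to generate
        # training data and run optimisation steps on the agents here.
        return [f"step {i}" for i in range(self.hyper_params.num_iterations)]


trainer = Trainer(HyperParameters(), ScenarioInstance(), ExperimentSettings())
steps = trainer.train()
```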

TensorDict or Pure Text Trainer?#

There are two types of trainers: those that deal directly with neural networks and those that interact with text-based models through an API. The former use data structures based on PyTorch’s TensorDict objects, while the latter use a similar, custom data structure containing nested dictionaries of NumPy string arrays (NestedArrayDict). TensorDict-based trainers use the TorchRL library.

Which of these two types is appropriate depends on the type of agents in the experiment. The environment, datasets and agents must all be of the appropriate type for the trainer. See TensorDict or Pure Text Scenario? for more information.
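To make the contrast concrete, here is a sketch of the shape of the pure-text data structure: nested plain dictionaries whose leaves are NumPy string arrays sharing a leading batch dimension, mirroring how TensorDict batches tensors. The keys and values below are invented for illustration; the real NestedArrayDict class has a richer API than a bare dict.

```python
import numpy as np

# Illustrative rollout in the NestedArrayDict style: nested dicts with
# NumPy string arrays at the leaves. Keys and contents are made up.
rollout = {
    "agents": {
        "prover": np.array(["I claim x is valid.", "Here is my argument."]),
        "verifier": np.array(["Why is that?", "Accepted."]),
    },
    "done": np.array(["False", "True"]),
}

# As with a TensorDict's batch size, every leaf shares the same
# leading (timestep) dimension.
leaves = [rollout["agents"]["prover"], rollout["agents"]["verifier"], rollout["done"]]
assert all(leaf.shape[0] == 2 for leaf in leaves)
```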

Base classes#

Base classes for trainers are found in the following modules:

trainer_base

Base classes for all trainers.

rl_tensordict_base

A generic reinforcement learning trainer using tensordicts.

rl_pure_text_base

Base classes for RL trainers for text-based environments that only use APIs.

Available Trainers#

solo_agent

Train agents in isolation, without any interaction with other agents.

vanilla_ppo

Vanilla PPO RL trainer.

spg

Stackelberg Policy Gradient [FCR20] RL trainer.

reinforce

REINFORCE policy gradient RL trainer.

ei_pure_text

Expert Iteration (EI) trainer for text-based environments that only use APIs.

malt_pure_text

Multi-Agent LLM Training (MALT) for text-based environments that only use APIs.

Trainer Registry#

Trainers are registered by using the following function as a decorator:

nip.trainers.registry.register_trainer(trainer: Literal['vanilla_ppo', 'solo_agent', 'spg', 'reinforce', 'pure_text_ei', 'pure_text_malt'])[source]#

Register a trainer class. Used as a decorator.
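A decorator-based registry of this kind typically maps a name to a class in a module-level dictionary. The sketch below shows the general pattern; the registry dict and the `build_trainer` helper are illustrative names, not nip's actual internals.

```python
# Illustrative registry pattern; TRAINER_REGISTRY and build_trainer
# are hypothetical names, not part of nip's public API.
TRAINER_REGISTRY: dict = {}


def register_trainer(name: str):
    """Register a trainer class under `name`. Used as a decorator."""

    def decorator(cls):
        TRAINER_REGISTRY[name] = cls
        return cls

    return decorator


@register_trainer("solo_agent")
class SoloAgentTrainer:
    """Placeholder trainer class for the sketch."""


def build_trainer(name: str, *args, **kwargs):
    # Look up the registered class and instantiate it.
    return TRAINER_REGISTRY[name](*args, **kwargs)


trainer = build_trainer("solo_agent")
```

Registering classes this way lets experiment configs select a trainer by its string name rather than importing the class directly.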