Trainers (nip.trainers)#
Overview#
Trainers are responsible for optimising the agents in an experiment. A trainer (Trainer) takes as input the following:
- The hyper-parameters of the experiment (a HyperParameters object).
- A ScenarioInstance object, which contains all the components of the experiment. The most important components are:
  - The datasets.
  - The interaction protocol handler (ProtocolHandler).
  - The agents.
  - The environment.
- An ExperimentSettings object, which contains various settings for the experiment that are not relevant to reproducibility (e.g. the GPU device number and whether to use Weights & Biases).
When called, the trainer performs some number of optimisation steps on the agents, using the environment to generate the training data.
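The relationship between these three inputs and the optimisation loop can be sketched with stand-in classes. This is only an illustration under stated assumptions: every name below (`DemoHyperParameters`, `DemoScenarioInstance`, `DemoExperimentSettings`, `DemoTrainer`, and the `train` method) is hypothetical and is not the `nip.trainers` API. The agents are reduced to a single number each, the protocol handler is omitted, and the "optimisation step" simply moves each agent towards the mean of the data produced by the environment.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DemoHyperParameters:
    """Hypothetical stand-in: values that define the experiment itself."""
    num_steps: int = 3
    learning_rate: float = 0.5


@dataclass
class DemoScenarioInstance:
    """Hypothetical stand-in: bundles the components of the experiment."""
    dataset: List[float]
    # The environment turns the dataset into a batch of training data.
    environment: Callable[[List[float]], List[float]]
    # Each agent is reduced to one scalar parameter for illustration.
    agents: Dict[str, float] = field(default_factory=dict)


@dataclass
class DemoExperimentSettings:
    """Hypothetical stand-in: settings that do not affect reproducibility."""
    device: str = "cpu"
    use_wandb: bool = False


class DemoTrainer:
    """Hypothetical trainer that takes the three inputs described above."""

    def __init__(self, hyper_params, scenario_instance, settings):
        self.hyper_params = hyper_params
        self.scenario_instance = scenario_instance
        self.settings = settings

    def train(self) -> None:
        scenario = self.scenario_instance
        for _ in range(self.hyper_params.num_steps):
            # Use the environment to generate the training data.
            batch = scenario.environment(scenario.dataset)
            target = sum(batch) / len(batch)
            # One toy optimisation step per agent.
            for name in list(scenario.agents):
                value = scenario.agents[name]
                step = self.hyper_params.learning_rate * (target - value)
                scenario.agents[name] = value + step


scenario = DemoScenarioInstance(
    dataset=[2.0, 4.0],
    environment=lambda data: list(data),
    agents={"prover": 0.0, "verifier": 3.0},
)
trainer = DemoTrainer(DemoHyperParameters(), scenario, DemoExperimentSettings())
trainer.train()
print(scenario.agents)
```

The design point the sketch tries to capture is the separation of concerns: hyper-parameters and experiment settings are plain configuration, while the scenario instance owns the objects that the trainer mutates.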
TensorDict or Pure Text Trainer?#
There are two types of trainers: those that deal directly with neural networks and those that interact with text-based models through an API. The former use data structures based on PyTorch’s TensorDict objects, while the latter use a similar, custom data structure containing nested dictionaries of NumPy string arrays (NestedArrayDict).
TensorDict-based trainers use the TorchRL library.
Which of these two types is appropriate depends on the type of agents in the experiment. The environment, datasets and agents must be the appropriate type for the trainer. See TensorDict or Pure Text Scenario? for more information.
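To make the idea of "nested dictionaries of NumPy string arrays" concrete, the following sketch builds such a structure from plain Python dictionaries. It does not use NestedArrayDict itself, whose interface is not described here. The keys, the example strings, and the helper `index_batch` are all assumptions made for illustration.

```python
import numpy as np

# A nested dictionary whose leaves are NumPy string arrays.
# All leaves share the same leading (batch) dimension of size 2.
batch = {
    "question": np.array(["What is 2 + 2?", "Is 7 prime?"]),
    "agents": {
        "prover": {"message": np.array(["It is 4.", "Yes."])},
        "verifier": {"message": np.array(["Accept", "Accept"])},
    },
}


def index_batch(nested, i):
    """Hypothetical helper: select item i from every leaf array."""
    return {
        key: index_batch(value, i) if isinstance(value, dict) else str(value[i])
        for key, value in nested.items()
    }


second = index_batch(batch, 1)
print(second["question"])
print(second["agents"]["prover"]["message"])
```

A TensorDict plays the analogous role for tensor-valued leaves, which is why the two kinds of trainers can share a similar overall shape while operating on different leaf types.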
Base classes#
Base classes for trainers are found in the following modules:
- Base classes for all trainers.
- A generic reinforcement learning trainer using tensordicts.
- Base classes for RL trainers for text-based environments that only use APIs.
Available Trainers#
- Train agents in isolation, without any interaction with other agents.
- Vanilla PPO RL trainer.
- Stackelberg Policy Gradient [FCR20] RL trainer.
- REINFORCE policy gradient RL trainer.
- Expert Iteration (EI) trainer for text-based environments that only use APIs.
- Multi-Agent LLM Training (MALT) for text-based environments that only use APIs.
Trainer Registry#
Trainers are registered by using the following function as a decorator:
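As a rough illustration of the decorator-based registration pattern, the following self-contained sketch keeps a dictionary from names to trainer classes. All names here (`DEMO_REGISTRY`, `register_demo_trainer`, `DemoPpoTrainer`) are hypothetical and are not part of `nip.trainers`; the library's own function may differ in name, arguments, and behaviour.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a trainer name to its class.
DEMO_REGISTRY: Dict[str, type] = {}


def register_demo_trainer(name: str) -> Callable[[type], type]:
    """Return a class decorator that records the class under `name`."""

    def decorator(cls: type) -> type:
        if name in DEMO_REGISTRY:
            raise ValueError(f"Trainer {name!r} is already registered")
        DEMO_REGISTRY[name] = cls
        # Return the class unchanged so it can still be used directly.
        return cls

    return decorator


@register_demo_trainer("demo_ppo")
class DemoPpoTrainer:
    def train(self) -> str:
        return "training with demo PPO"


# Look the class up by name, e.g. from a configuration value.
trainer_cls = DEMO_REGISTRY["demo_ppo"]
print(trainer_cls().train())
```

The appeal of this pattern is that adding a new trainer only requires decorating its class; code that selects a trainer by name does not need to change.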