# How an Experiment is Built
This is an overview of the main steps taken when the `run_experiment` function is called, which is the main entry point for running an experiment.
We assume that code of the following form has been executed:

```python
from nip import HyperParameters, run_experiment

hyper_params = HyperParameters(...)
run_experiment(hyper_params)
```
1. We check whether the hyper-parameters need to be modified. There are two reasons we might want to do this, both controlled by the `BaseRunParameters` sub-parameters located at `hyper_params.base_run` (a hedged sketch of this logic is given at the end of this page).

   - If `hyper_params.base_run` is `"parameters"`, we copy the parameters from a previous run stored in Weights & Biases. This is useful if we want to resume a run.
   - If `hyper_params.base_run` is `"rerun_tests"`, we copy the parameters from a previous run stored in Weights & Biases, except for the parameters which control how tests are run. This is useful if we have a previous run without tests, and we want to rerun it, doing just the testing loop.
2. We set up Weights & Biases, if the `use_wandb` argument of `run_experiment` is set to `True`.

3. An `ExperimentSettings` object is created, which contains various settings for the experiment that are not relevant to reproducibility (e.g. the GPU device number and the Weights & Biases run).
4. The `ScenarioInstance` object is created, which contains all the components of the experiment. This is done by calling the `build_scenario_instance` function, which executes the following steps.

   1. We build the `ProtocolHandler`, which will handle the interaction protocol. This is done by calling the `build_protocol_handler` function, which looks up the appropriate protocol handler for the parameters in the registry.

   2. The train and test datasets are loaded by initialising the appropriate `Dataset` class.
   3. We build the agents. Agents typically consist of multiple parts, and which parts get built depends on the hyper-parameters. Each agent specified in the `AgentsParameters` object located at `hyper_params.agents` is built in the following steps. Here `agent_params = hyper_params.agents[agent_name]` is an instance of `AgentParameters`. (The agent-building sketch at the end of this page illustrates this loop.)

      1. We set the seed based on `hyper_params.seed` and the agent name.

      2. Agents are either composed of parts, like bodies and heads, or are a single entity (a `WholeAgent`). Which of these options applies, and which parts are built, is determined by the hyper-parameters. For example, TensorDict-based RL trainers require agents consisting of parts, with a policy head and a value head (see Creating a New Trainer for more information). These parts are built by initialising the appropriate `AgentPart` classes.

      3. An instance of the `Agent` dataclass is created, which holds all the parts of the agent.

      4. If we're loading a checkpoint (i.e. `agent_params.load_checkpoint_and_parameters` is `True`), we load the checkpoint and parameters from the Weights & Biases run specified by `agent_params.checkpoint_run_id` (see the checkpoint sketch at the end of this page). Otherwise, we let the agent's weights be initialised randomly.
   4. If set in the hyper-parameters, pretrained embeddings for each agent are loaded into the datasets. This is done by initialising the appropriate `PretrainedModel` class and generating embeddings.

   5. If the trainer and scenario are pure-text based (see TensorDict or Pure Text Trainer? and TensorDict or Pure Text Scenario?), we also build shared model groups (instances of `PureTextSharedModelGroup`). These provide an interface for dealing with agents which share an underlying model, allowing fine-tuning jobs to be run at the group level rather than at the agent level.

   6. For RL trainers, the following additional components are built.
      - The train and test environments are built by initialising the appropriate `Environment` class.
      - The agent parts are combined into combined agent parts (instances of `CombinedAgentPart`). Each combined agent part contains the corresponding parts of all agents, so it can be treated as a single actor in reinforcement learning, with observations and actions indexed by a new agent dimension (illustrated in the final sketch at the end of this page). This allows working easily with the TorchRL library.
5. The trainer is built by initialising the appropriate `Trainer` class.

6. Finally, the trainer is run by calling the `train` method of the trainer.
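
The sketches below illustrate some of the steps above in code. They are hedged, illustrative sketches rather than the actual nip implementation: any helper names, field names and signatures that do not appear in the steps above are assumptions.

First, the base-run logic from step 1. Only the two `base_run` modes (`"parameters"` and `"rerun_tests"`) are taken from the description above; `fetch_run_config`, `TEST_PARAMETER_NAMES` and the Weights & Biases entity/project names are hypothetical.

```python
import wandb

# Hypothetical set of hyper-parameter keys that control how tests are run.
TEST_PARAMETER_NAMES = {"test_scheme", "num_test_iterations"}


def fetch_run_config(run_id: str) -> dict:
    """Fetch the hyper-parameter config of a previous Weights & Biases run."""
    api = wandb.Api()
    return dict(api.run(f"my-entity/my-project/{run_id}").config)


def apply_base_run(params: dict, base_run_mode: str, base_run_id: str) -> dict:
    """Sketch of how a base run could modify the current hyper-parameters."""
    base_config = fetch_run_config(base_run_id)
    if base_run_mode == "parameters":
        # Resume a run: take all hyper-parameters from the previous run.
        return base_config
    if base_run_mode == "rerun_tests":
        # Keep the previous run's parameters, except those controlling how tests
        # are run, which are taken from the current parameters instead.
        return {
            key: params.get(key, value) if key in TEST_PARAMETER_NAMES else value
            for key, value in base_config.items()
        }
    return params
```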
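
Next, the agent-building loop (the third sub-step of building the `ScenarioInstance`). The overall sequence — seed, build the parts or a `WholeAgent`, wrap everything in the `Agent` dataclass — follows the steps above, but the `Agent` fields, the seed derivation and `build_agent_parts` are stand-ins invented for this sketch.

```python
import zlib
from dataclasses import dataclass, field

import torch


@dataclass
class Agent:
    """Simplified stand-in for nip's Agent dataclass, holding an agent's parts."""

    name: str
    parts: dict[str, torch.nn.Module] = field(default_factory=dict)


def build_agent_parts(agent_params) -> dict[str, torch.nn.Module]:
    """Hypothetical helper: build the parts the trainer needs (e.g. policy/value heads)."""
    return {"policy_head": torch.nn.Linear(8, 2), "value_head": torch.nn.Linear(8, 1)}


def build_agents(hyper_params) -> dict[str, Agent]:
    agents: dict[str, Agent] = {}
    for agent_name, agent_params in hyper_params.agents.items():
        # 1. Seed based on the global seed and the agent name (exact derivation assumed).
        torch.manual_seed(hyper_params.seed + zlib.crc32(agent_name.encode()))

        # 2. Build the agent's parts (or a single WholeAgent), as dictated by the
        #    hyper-parameters.
        parts = build_agent_parts(agent_params)

        # 3. Wrap the parts in the Agent dataclass.
        agents[agent_name] = Agent(name=agent_name, parts=parts)

        # 4. If agent_params.load_checkpoint_and_parameters is True, the weights are
        #    restored from the W&B run agent_params.checkpoint_run_id instead of being
        #    left randomly initialised (sketched separately below).
    return agents
```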
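
The checkpoint loading referred to in that loop could, for example, use the public Weights & Biases API as below. The run path, checkpoint file name and state-dict format are assumptions; the actual nip code stores and restores checkpoints in its own way.

```python
from pathlib import Path

import torch
import wandb


def restore_from_wandb(module: torch.nn.Module, run_id: str) -> None:
    """Hedged sketch: download a checkpoint from a previous W&B run and load it."""
    api = wandb.Api()
    run = api.run(f"my-entity/my-project/{run_id}")  # placeholder entity/project
    # Download the (assumed) checkpoint file into a local directory.
    checkpoint_file = run.file("checkpoint.pt").download(root="checkpoints", replace=True)
    state_dict = torch.load(Path(checkpoint_file.name), map_location="cpu")
    module.load_state_dict(state_dict)
```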
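
Finally, the "new agent dimension" used when combining agent parts for RL trainers can be illustrated with the tensordict library that TorchRL builds on. This is a generic example of stacking per-agent tensors along an agent dimension, not the actual `CombinedAgentPart` implementation; the agent names and tensor shapes are arbitrary.

```python
import torch
from tensordict import TensorDict

batch_size = 4
agent_names = ["agent_0", "agent_1"]  # arbitrary example names

# Per-agent observations and actions, as might be produced by each agent's own parts.
per_agent = {
    name: TensorDict(
        {
            "observation": torch.randn(batch_size, 8),
            "action": torch.zeros(batch_size, dtype=torch.long),
        },
        batch_size=[batch_size],
    )
    for name in agent_names
}

# Stacking along a new dimension yields tensors indexed by (batch, agent, ...), so the
# collection of agents can be treated as a single actor by TorchRL.
combined = torch.stack([per_agent[name] for name in agent_names], dim=1)
print(combined["observation"].shape)  # torch.Size([4, 2, 8])
```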