Running Experiments#
Overview#
Running an experiment involves the following two steps:
1. Create a HyperParameters object to specify all the parameters for the experiment.
2. Call the run_experiment function with the HyperParameters object.
For example, here’s how to run a basic code validation experiment with the NIP protocol, expert iteration (EI) trainer, and default hyper-parameters:
from nip import HyperParameters, run_experiment

hyper_params = HyperParameters(
    scenario="code_validation",
    trainer="pure_text_ei",
    dataset="lrhammond/buggy-apps",
    interaction_protocol="nip",
)
run_experiment(hyper_params)
See Running Experiments (nip.run) for the API reference.
Specifying Hyper-Parameters#
Available hyper-parameters and their possible values are listed in the HyperParameters class. Some hyper-parameters are nested. For example, the nip_protocol hyper-parameter holds a NipProtocolParameters object, which in turn holds the hyper-parameters specific to the NIP protocol.
To specify the NIP parameters, you can either pass a NipProtocolParameters object to the nip_protocol parameter, or a dictionary, which will be converted to a NipProtocolParameters object. The following two examples are equivalent:
from nip.parameters import NipProtocolParameters  # documented under Experiment Hyper-Parameters (nip.parameters)

hyper_params = HyperParameters(
    scenario="code_validation",
    trainer="pure_text_ei",
    dataset="lrhammond/buggy-apps",
    interaction_protocol="nip",
    nip_protocol=NipProtocolParameters(
        max_message_rounds=11,
        min_message_rounds=5,
    ),
)

# Equivalently, pass a dictionary, which is converted to a NipProtocolParameters object:
hyper_params = HyperParameters(
    scenario="code_validation",
    trainer="pure_text_ei",
    dataset="lrhammond/buggy-apps",
    interaction_protocol="nip",
    nip_protocol={
        "max_message_rounds": 11,
        "min_message_rounds": 5,
    },
)
See Experiment Hyper-Parameters (nip.parameters) for more information about hyper-parameters.
Additional Experiment Settings#
The run_experiment function has several optional arguments that allow you to customize the experiment. These are settings that should not (in theory) affect the results of the experiment. The most important ones control:
- The device to run the experiment on, if run locally.
- Whether to log the experiment to Weights & Biases.
- The Weights & Biases project to log the experiment to, if different from the default.
- The ID of the run to log the experiment to.
- The number of workers to use for collecting rollout samples in text-based tasks.
See the documentation for run_experiment for the full list of arguments and their names.
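For illustration, passing such settings looks like the sketch below. The keyword names device, use_wandb and wandb_project are assumptions made for this example, not confirmed argument names; check the run_experiment API reference for the real ones.
from nip import HyperParameters, run_experiment

hyper_params = HyperParameters(
    scenario="code_validation",
    trainer="pure_text_ei",
    dataset="lrhammond/buggy-apps",
    interaction_protocol="nip",
)

# NOTE: the keyword argument names below are illustrative assumptions;
# consult the run_experiment API reference for the actual names.
run_experiment(
    hyper_params,
    device="cuda",                    # device to run on, if run locally (assumed name)
    use_wandb=True,                   # whether to log to Weights & Biases (assumed name)
    wandb_project="my-nip-project",   # W&B project, if different from the default (assumed name)
)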
Experiment Scripts#
The library comes with a suite of scripts to facilitate running experiments. In these scripts, the hyper-parameters are specified in a grid, which allows you to run multiple experiments with different hyper-parameters either in parallel or sequentially. The scripts also allow configuring logging to Weights & Biases.
The following are the available scripts for running hyper-parameter sweeps. See also the API reference Experiment Scripts for a complete list of scripts.
- A script that runs a PPO experiment with the graph isomorphism task.
- A script that does supervised training of a single agent on the graph isomorphism task.
- A script that runs a PPO experiment with the image classification task.
- A script that does supervised training of a single agent on the image classification task.
- ei_cv.py, which runs an expert iteration (EI) experiment with the code validation task.
Let’s consider the ei_cv.py script. This script contains the variable param_grid, a dictionary whose keys are hyper-parameters and whose values are lists of values for those hyper-parameters. The script runs an experiment for each combination of hyper-parameters in the grid.
For example, the following grid will run 4 experiments, covering the NIP and Debate protocols with the “introductory” and “interview” difficulty levels of the code validation dataset:
param_grid = dict(
    interaction_protocol=["nip", "debate"],
    dataset_name=["lrhammond/buggy-apps"],
    apps_difficulty=["introductory", "interview"],
    num_iterations=[8],
    rollouts_per_iteration=[200],
    # ...
)
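To make “an experiment for each combination” concrete, the grid expands as the Cartesian product of the value lists; only hyper-parameters with more than one value add combinations. The snippet below is a generic illustration of that expansion, not the script’s actual implementation:
from itertools import product

grid = dict(
    interaction_protocol=["nip", "debate"],
    apps_difficulty=["introductory", "interview"],
)

# One dictionary of hyper-parameter values per experiment.
keys = list(grid)
combinations = [dict(zip(keys, values)) for values in product(*grid.values())]
for index, combination in enumerate(combinations):
    print(index, combination)
# 0 {'interaction_protocol': 'nip', 'apps_difficulty': 'introductory'}
# 1 {'interaction_protocol': 'nip', 'apps_difficulty': 'interview'}
# 2 {'interaction_protocol': 'debate', 'apps_difficulty': 'introductory'}
# 3 {'interaction_protocol': 'debate', 'apps_difficulty': 'interview'}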
The experiment (which we’ll call test_difficulty_levels) can now be run by calling the script with the following command:
python scripts/ei_cv.py --use_wandb test_difficulty_levels
This will run the experiments sequentially, logging data to Weights & Biases with run IDs test_difficulty_levels_0, test_difficulty_levels_1, etc.
See the documentation for the script for more information on how to run it, or run:
python scripts/ei_cv.py --help