Running Experiments#

Overview#

Running an experiment involves the following two steps:

  1. Create a HyperParameters object to specify all the parameters for the experiment.

  2. Call the run_experiment function with the HyperParameters object.

For example, here’s how to run a basic code validation experiment with the NIP protocol, the expert iteration (EI) trainer, and default hyper-parameters:

from nip import HyperParameters, run_experiment

hyper_params = HyperParameters(
    scenario="code_validation",
    trainer="pure_text_ei",
    dataset="lrhammond/buggy-apps",
    interaction_protocol="nip",
)

run_experiment(hyper_params)

See Running Experiments (nip.run) for the API reference.

Specifying Hyper-Parameters#

Available hyper-parameters and possible values are listed in the HyperParameters class. Some hyper-parameters are nested: for example, the nip_protocol hyper-parameter holds a NipProtocolParameters object, which in turn holds the hyper-parameters specific to the NIP protocol.

To specify the NIP parameters, pass either a NipProtocolParameters object or a dictionary (which will be converted to one) as the nip_protocol argument, as in the two examples below.

hyper_params = HyperParameters(
    scenario="code_validation",
    trainer="pure_text_ei",
    dataset="lrhammond/buggy-apps",
    interaction_protocol="nip",
    nip_protocol=NipProtocolParameters(
        max_message_rounds=11,
        min_message_rounds=5,
    ),
)

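Equivalently, the nested parameters can be given as a plain dictionary. The keys mirror the fields of NipProtocolParameters used above, and the dictionary is converted to a NipProtocolParameters object for you:

hyper_params = HyperParameters(
    scenario="code_validation",
    trainer="pure_text_ei",
    dataset="lrhammond/buggy-apps",
    interaction_protocol="nip",
    nip_protocol={"max_message_rounds": 11, "min_message_rounds": 5},
)
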
See Experiment Hyper-Parameters (nip.parameters) for more information about hyper-parameters.

Additional Experiment Settings#

The run_experiment function has several optional arguments that allow you to customize the experiment. These are settings that should not (in theory) affect the results of the experiment. The most important ones are:

device: The device to run the experiment on, if run locally.

use_wandb: Whether to log the experiment to Weights & Biases.

wandb_project: The Weights & Biases project to log the experiment to, if different from the default.

run_id: The ID of the run to log the experiment to.

num_rollout_workers: The number of workers to use for collecting rollout samples in text-based tasks.

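For example, to run on a particular GPU and log the results to Weights & Biases, you might call run_experiment as follows (the argument names are those listed above; the values are purely illustrative):

run_experiment(
    hyper_params,
    device="cuda:0",
    use_wandb=True,
    wandb_project="my-project",
    run_id="my-first-run",
)
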
See the documentation for run_experiment for the full list of arguments.

Experiment Scripts#

The library comes with a suite of scripts to facilitate running experiments. In these scripts, the hyper-parameters are specified in a grid, which allows you to run multiple experiments with different hyper-parameters either in parallel or sequentially. The scripts also allow configuring logging to Weights & Biases.

The following are the available scripts for running hyper-parameter sweeps. See also the API reference Experiment Scripts for a complete list of scripts.

ppo_gi.py: Run a PPO experiment with the graph isomorphism task.

solo_agents_gi.py: Do supervised training of a single agent on the graph isomorphism task.

ppo_ic.py: Run a PPO experiment with the image classification task.

solo_agents_ic.py: Do supervised training of a single agent on the image classification task.

ei_cv.py: Run an expert iteration (EI) experiment with the code validation task.

Let’s consider the ei_cv.py script. This script contains a variable param_grid: a dictionary whose keys are hyper-parameter names and whose values are lists of values for those hyper-parameters. The script runs an experiment for each combination of values in the grid.

For example, the following grid will run 2 × 2 = 4 experiments, running the NIP and Debate protocols with the “introductory” and “interview” difficulty levels of the code validation dataset:

param_grid = dict(
    interaction_protocol=["nip", "debate"],
    dataset_name=["lrhammond/buggy-apps"],
    apps_difficulty=["introductory", "interview"],
    num_iterations=[8],
    rollouts_per_iteration=[200],
    ...
)

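Conceptually, the grid expands to the Cartesian product of the value lists. The following sketch (an illustration of how the grid expands, not the script’s actual implementation) enumerates the four combinations produced by the multi-valued entries above:

from itertools import product

# The multi-valued entries from the grid above; single-valued and elided
# entries are omitted for brevity.
param_grid = dict(
    interaction_protocol=["nip", "debate"],
    apps_difficulty=["introductory", "interview"],
)

# One experiment configuration per element of the Cartesian product.
keys = list(param_grid)
combinations = [dict(zip(keys, values)) for values in product(*param_grid.values())]

print(len(combinations))  # 4
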
The experiment (which we’ll call test_difficulty_levels) can now be run by calling the script with the following command:

python scripts/ei_cv.py --use_wandb test_difficulty_levels

This will run the experiments sequentially, logging data to Weights & Biases with run IDs test_difficulty_levels_0, test_difficulty_levels_1, etc.

See the documentation for the script for more information on how to run it, or run:

python scripts/ei_cv.py --help