Running Experiments#

Overview#

Running an experiment involves the following two steps:

  1. Create a HyperParameters object to specify all the parameters for the experiment.

  2. Call the run_experiment function with the HyperParameters object.

For example, here’s how to run a basic code validation experiment with the NIP protocol, expert iteration (EI) trainer, and default hyper-parameters:

from nip import HyperParameters, run_experiment

hyper_params = HyperParameters(
    scenario="code_validation",
    trainer="pure_text_ei",
    dataset="lrhammond/buggy-apps",
    interaction_protocol="nip",
)

run_experiment(hyper_params)

See Running Experiments (nip.run) for the API reference and How an Experiment is Built for an overview of how the experiment components are built.

Specifying Hyper-Parameters#

Available hyper-parameters and their possible values are listed in the HyperParameters class. Some hyper-parameters are nested: for example, the nip_protocol hyper-parameter holds a NipProtocolParameters object, which in turn holds the hyper-parameters specific to the NIP protocol.

To specify the NIP parameters, pass either a NipProtocolParameters object or a dictionary (which is converted to a NipProtocolParameters object automatically) as the nip_protocol argument. For example, using the object form:

from nip import HyperParameters

# NipProtocolParameters is documented with the other parameter classes
# (see Experiment Hyper-Parameters, nip.parameters)
from nip.parameters import NipProtocolParameters

hyper_params = HyperParameters(
    scenario="code_validation",
    trainer="pure_text_ei",
    dataset="lrhammond/buggy-apps",
    interaction_protocol="nip",
    nip_protocol=NipProtocolParameters(
        max_message_rounds=11,
        min_message_rounds=5,
    ),
)
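Since a dictionary is converted automatically, the same configuration can also be written in the dictionary form:

hyper_params = HyperParameters(
    scenario="code_validation",
    trainer="pure_text_ei",
    dataset="lrhammond/buggy-apps",
    interaction_protocol="nip",
    nip_protocol={
        "max_message_rounds": 11,
        "min_message_rounds": 5,
    },
)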

See Experiment Hyper-Parameters (nip.parameters) for more information about hyper-parameters.

Additional Experiment Settings#

The run_experiment function has several optional arguments that allow you to customize the experiment. These are settings that should not (in theory) affect the results of the experiment. The most important ones are:

device: The device to run the experiment on, if run locally.

use_wandb: Whether to log the experiment to Weights & Biases.

wandb_project: The Weights & Biases project to log the experiment to, if different from the default.

run_id: The ID of the run to log the experiment to.

See the documentation for run_experiment for the full list of arguments.
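For example, reusing the hyper_params object from the snippets above, a local GPU run logged to Weights & Biases might look like the following sketch. The keyword names are those listed above; the device string, project name, and run ID are placeholders, and whether device accepts a string like "cuda:0" (rather than a torch.device) is an assumption here:

run_experiment(
    hyper_params,
    device="cuda:0",             # placeholder: first local GPU
    use_wandb=True,              # log the experiment to Weights & Biases
    wandb_project="my-project",  # placeholder project name
    run_id="nip_cv_example",     # placeholder run ID
)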

Experiment Scripts#

The library comes with a suite of scripts to facilitate running experiments. In these scripts, hyper-parameters are specified as a grid, which lets you run multiple experiments with different hyper-parameter combinations either sequentially or in parallel. The scripts also allow configuring logging to Weights & Biases.

The following scripts are available for running hyper-parameter sweeps; see the Experiment Scripts API reference for the complete list.

ppo_gi.py: Run a PPO experiment on the graph isomorphism task.

solo_agents_gi.py: Do supervised training of a single agent on the graph isomorphism task.

ppo_ic.py: Run a PPO experiment on the image classification task.

solo_agents_ic.py: Do supervised training of a single agent on the image classification task.

cv_experiment.py: Run an experiment with the code validation task using a configuration file.

Let’s consider the cv_experiment.py script. This script takes a --config-file argument, which is a path to a JSON, JSON5, or YAML file. The file should contain a dictionary with the keys “kind” and “parameters”. If “kind” is “single_experiment”, then “parameters” should be a dictionary of the config values to use. If “kind” is “grid”, then “parameters” should be a dictionary whose keys are hyper-parameter names and whose values are lists of values to try; the script then runs an experiment for each combination of hyper-parameters in the grid.

The possible config values and their defaults are listed in the CodeValidationExperimentConfig class in the cv_experiment.py script. Any config value not specified in the config file uses its default value.
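For instance, a minimal “single_experiment” config might look like the sketch below; the parameter names are the same ones used in the grid example that follows, and anything left out falls back to the defaults in CodeValidationExperimentConfig:

{
  "kind": "single_experiment",
  "parameters": {
    "trainer": "pure_text_ei",
    "interaction_protocol": "nip",
    "dataset_name": "lrhammond/buggy-apps",
    "num_iterations": 8,
    "rollouts_per_iteration": 200
  }
}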

By contrast, the following JSON file defines a grid that runs 4 expert iteration (EI) experiments, crossing the NIP and Debate protocols with the “introductory” and “interview” difficulty levels of the code validation dataset:

scripts/config/cv_experiment/test_difficulty_levels.json#
{
  "kind": "grid",
  "parameters": {
    "trainer": ["pure_text_ei"],
    "interaction_protocol": ["nip", "debate"],
    "dataset_name": ["lrhammond/buggy-apps"],
    "apps_difficulty": ["introductory", "interview"],
    "num_iterations": [8],
    "rollouts_per_iteration": [200]
  }
}

The experiments can now be run by calling the script with the following command:

python scripts/cv_experiment.py --use-wandb --config-file test_difficulty_levels.json trial_1

This will run the experiments sequentially, logging data to Weights & Biases with run IDs cv_test_difficulty_levels_trial_1_0, cv_test_difficulty_levels_trial_1_1, etc.

See the documentation for the script for more information on how to run it, or run:

python scripts/cv_experiment.py --help