Running Experiments#
Overview#
Running an experiment involves the following two steps:

1. Create a HyperParameters object to specify all the parameters for the experiment.
2. Call the run_experiment function with the HyperParameters object.
For example, here’s how to run a basic code validation experiment with the NIP protocol, expert iteration (EI) trainer, and default hyper-parameters:
from nip import HyperParameters, run_experiment

hyper_params = HyperParameters(
    scenario="code_validation",
    trainer="pure_text_ei",
    dataset="lrhammond/buggy-apps",
    interaction_protocol="nip",
)

run_experiment(hyper_params)
See Running Experiments (nip.run) for the API reference and How an Experiment is Built for an overview of how the experiment components are built.
Specifying Hyper-Parameters#
The available hyper-parameters and their possible values are listed in the HyperParameters class. Some hyper-parameters are nested. For example, the nip_protocol hyper-parameter holds a NipProtocolParameters object, which in turn holds the hyper-parameters specific to the NIP protocol.

To specify the NIP parameters, you can either pass a NipProtocolParameters object to the nip_protocol parameter, or pass a dictionary, which will be converted to a NipProtocolParameters object:
hyper_params = HyperParameters(
    scenario="code_validation",
    trainer="pure_text_ei",
    dataset="lrhammond/buggy-apps",
    interaction_protocol="nip",
    nip_protocol=NipProtocolParameters(
        max_message_rounds=11,
        min_message_rounds=5,
    ),
)

Equivalently, using a dictionary:

hyper_params = HyperParameters(
    scenario="code_validation",
    trainer="pure_text_ei",
    dataset="lrhammond/buggy-apps",
    interaction_protocol="nip",
    nip_protocol={
        "max_message_rounds": 11,
        "min_message_rounds": 5,
    },
)
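Both forms should produce an equivalent HyperParameters object. As a quick sanity check after constructing hyper_params as above (a sketch assuming the nested parameters are exposed as attributes; see the HyperParameters reference for the exact access pattern):

# Assumes nested parameters are readable as attributes on the constructed
# object; consult the HyperParameters class documentation to confirm.
assert hyper_params.nip_protocol.max_message_rounds == 11
assert hyper_params.nip_protocol.min_message_rounds == 5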
See Experiment Hyper-Parameters (nip.parameters) for more information about hyper-parameters.
Additional Experiment Settings#
The run_experiment function has several optional arguments that allow you to customize the experiment. These are settings that should not (in theory) affect the results of the experiment. The most important ones control:

- the device to run the experiment on, if run locally;
- whether to log the experiment to Weights & Biases;
- the Weights & Biases project to log the experiment to, if different from the default;
- the ID of the run to log the experiment to.

See the documentation for run_experiment for the full list of arguments.
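As a rough sketch of how these settings might be passed (the keyword names below, such as device and use_wandb, are assumptions for illustration; check the run_experiment reference for the actual signature):

from nip import HyperParameters, run_experiment

hyper_params = HyperParameters(
    scenario="code_validation",
    trainer="pure_text_ei",
    dataset="lrhammond/buggy-apps",
    interaction_protocol="nip",
)

# Hypothetical keyword arguments mirroring the settings listed above; the
# real argument names may differ, so consult the run_experiment API reference.
run_experiment(
    hyper_params,
    device="cuda",                     # device to run on, if running locally
    use_wandb=True,                    # whether to log to Weights & Biases
    wandb_project="my-wandb-project",  # W&B project, if not the default
    run_id="my_run",                   # ID of the run to log to
)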
Experiment Scripts#
The library comes with a suite of scripts to facilitate running experiments. In these scripts, the hyper-parameters are specified in a grid, which allows you to run multiple experiments with different hyper-parameters either in parallel or sequentially. The scripts also allow configuring logging to Weights & Biases.
The following are the available scripts for running hyper-parameter sweeps. See also the API reference Experiment Scripts for a complete list of scripts.
- Run a PPO experiment on the graph isomorphism task.
- Do supervised training of a single agent on the graph isomorphism task.
- Run a PPO experiment on the image classification task.
- Do supervised training of a single agent on the image classification task.
- cv_experiment.py: run an experiment with the code validation task using a configuration file.
Let’s consider the cv_experiment.py script. This script takes the --config-file argument, which is a path to a JSON, JSON5, or YAML file. The file should contain a dictionary with the keys “kind” and “parameters”. If “kind” is “single_experiment”, then “parameters” should be a dictionary with the config values to use. If “kind” is “grid”, then “parameters” should be a dictionary mapping hyper-parameter names to lists of values to try. The script will run an experiment for each combination of hyper-parameters in the grid.
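For example, a “single_experiment” config might look like the following (a minimal sketch; the parameter names are borrowed from the grid example further down):

{
    "kind": "single_experiment",
    "parameters": {
        "trainer": "pure_text_ei",
        "interaction_protocol": "nip",
        "dataset_name": "lrhammond/buggy-apps",
        "apps_difficulty": "introductory"
    }
}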
The possible config values and their defaults are listed in the CodeValidationExperimentConfig class, in the cv_experiment.py script. Any config value that is not specified in the config file will use its default value.
For example, the following JSON file defines a grid that will run 4 expert iteration (EI) experiments, running the NIP and Debate protocols on the “introductory” and “interview” difficulty levels of the code validation dataset:

scripts/config/cv_experiment/test_difficulty_levels.json

{
    "kind": "grid",
    "parameters": {
        "trainer": ["pure_text_ei"],
        "interaction_protocol": ["nip", "debate"],
        "dataset_name": ["lrhammond/buggy-apps"],
        "apps_difficulty": ["introductory", "interview"],
        "num_iterations": [8],
        "rollouts_per_iteration": [200]
    }
}
The experiment can now be run by calling the script with the following command:
python scripts/cv_experiment.py --use-wandb --config-file test_difficulty_levels.json trial_1
This will run the experiments sequentially, logging data to Weights & Biases with run IDs cv_test_difficulty_levels_trial_1_0, cv_test_difficulty_levels_trial_1_1, etc.
See the documentation for the script for more information on how to run it, or run:
python scripts/cv_experiment.py --help