nip.parameters.trainers.TextRlParameters
- class nip.parameters.trainers.TextRlParameters(fine_tune_on_all_previous_rollouts: bool = False, verifier_guess_replacement_proportion: float = 0.0, verifier_guess_replacement_annealing: Literal['none', 'linear', 'exponential'] = 'none', verifier_guess_replacement_annealing_rate: float = 0.1, save_transcripts: bool = True, transcript_format: Literal['json', 'yaml'] = 'yaml', test_scheme: Annotated[Literal['none', 'all', 'last', 'first_and_last'], <nip.parameters.base_run.BaseRunPreserve object>] = 'none', test_on_whole_dataset: Annotated[bool, <nip.parameters.base_run.BaseRunPreserve object>] = True)
Additional parameters for the text-based RL trainers.
- Parameters:
fine_tune_on_all_previous_rollouts (bool) – Whether to fine-tune the agents on the rollouts from all iterations so far. If False, only the rollouts from the current iteration are used.
verifier_guess_replacement_proportion (float) – When fine-tuning on the rollouts, replace the verifier’s guess with the true label for this proportion of the rollouts. This only changes the last message of the verifier, and leaves the rest of the transcript unchanged.
verifier_guess_replacement_annealing (Literal["none", "linear", "exponential"]) – The annealing schedule for the proportion of rollouts where the verifier’s guess is replaced. Possible values are:
  - "none": No annealing.
  - "linear": Linear annealing with rate verifier_guess_replacement_annealing_rate.
  - "exponential": Exponential annealing with base 1 - verifier_guess_replacement_annealing_rate.
verifier_guess_replacement_annealing_rate (float) – The rate of annealing for the proportion of rollouts where the verifier’s guess is replaced.
save_transcripts (bool) – Whether to save the transcripts of the rollouts. The raw rollouts are always saved and the transcripts can be extracted from them, so this option is mostly a convenience (at the cost of a small processing overhead).
transcript_format (Literal["json", "yaml"]) – The format to save the transcripts in.
test_scheme (TestSchemeType) – When to run the test loop during training. See TestSchemeType for the options ("none", "all", "last" or "first_and_last").
test_on_whole_dataset (bool) – Whether to run the test loop on the whole dataset or only on a single iteration’s worth of rollouts.
test_every_iteration (bool) – Whether to run the test loop after every iteration. If False, the test loop is only run after training is complete.
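The guess-replacement and annealing options above can be illustrated with a minimal, self-contained sketch. It assumes that the linear schedule subtracts the rate once per iteration (clamped at zero) and that the exponential schedule multiplies by 1 - rate once per iteration; the source does not spell out either formula. The message format (dicts with hypothetical "agent" and "content" keys) and both helper names are illustrative assumptions, not part of the nip API.

```python
import random
from typing import Literal

AnnealingType = Literal["none", "linear", "exponential"]


def annealed_replacement_proportion(
    initial_proportion: float,
    annealing: AnnealingType,
    rate: float,
    iteration: int,
) -> float:
    """Hypothetical helper: replacement proportion at a given iteration.

    The schedules below are assumptions, not nip's confirmed formulas.
    """
    if annealing == "none":
        return initial_proportion
    if annealing == "linear":
        # Assumed: decrease by `rate` per iteration, never going below zero.
        return max(0.0, initial_proportion - rate * iteration)
    if annealing == "exponential":
        # Assumed: multiply by the base (1 - rate) once per iteration.
        return initial_proportion * (1.0 - rate) ** iteration
    raise ValueError(f"Unknown annealing schedule: {annealing!r}")


def replace_verifier_guesses(
    rollouts: list[list[dict]],
    true_labels: list[str],
    proportion: float,
    rng: random.Random,
) -> list[list[dict]]:
    """Hypothetical helper: copy the rollouts and, for a random `proportion`
    of them, overwrite the verifier's last message with the true label."""
    num_to_replace = min(len(rollouts), round(proportion * len(rollouts)))
    chosen = set(rng.sample(range(len(rollouts)), num_to_replace))
    result = []
    for index, (messages, label) in enumerate(zip(rollouts, true_labels)):
        # Copy each message so the input transcripts are left untouched.
        messages = [dict(message) for message in messages]
        if index in chosen:
            # Only the verifier's final message changes.
            for message in reversed(messages):
                if message["agent"] == "verifier":
                    message["content"] = label
                    break
        result.append(messages)
    return result
```

In this sketch, the proportion passed to replace_verifier_guesses at iteration t would come from annealed_replacement_proportion(verifier_guess_replacement_proportion, verifier_guess_replacement_annealing, verifier_guess_replacement_annealing_rate, t).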
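One way to read the test_scheme values is as a rule deciding, after each training iteration, whether to run the test loop. The sketch below assumes iterations are 0-indexed, that "last" and "first_and_last" refer to the first and final training iterations, and that "none" skips testing during training entirely. The function name is hypothetical and the real trainer may interpret the values differently; the Literal alias simply mirrors the values in the class signature.

```python
from typing import Literal

TestSchemeType = Literal["none", "all", "last", "first_and_last"]


def should_run_test(scheme: TestSchemeType, iteration: int, num_iterations: int) -> bool:
    """Hypothetical helper: whether to run the test loop after `iteration`."""
    is_first = iteration == 0
    is_last = iteration == num_iterations - 1
    if scheme == "none":
        return False
    if scheme == "all":
        return True
    if scheme == "last":
        return is_last
    if scheme == "first_and_last":
        return is_first or is_last
    raise ValueError(f"Unknown test scheme: {scheme!r}")
```

With a single training iteration, "last" and "first_and_last" coincide under this reading.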
Methods Summary
__eq__(other) – Return self==value.
__init__([...])
__repr__() – Return repr(self).
_get_param_class_from_dict(param_dict) – Try to get the parameter class from a dictionary of serialised parameters.
construct_test_params() – Construct a set of basic parameters for testing.
from_dict(params_dict[, ignore_extra_keys]) – Create a parameters object from a dictionary.
get(address) – Get a value from the parameters object using a dot-separated address.
to_dict() – Convert the parameters object to a dictionary.
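The dot-separated addressing used by get can be pictured as repeated attribute lookup through nested parameter objects. The sketch below uses two hypothetical dataclasses (RunParams and TrainerParams) as stand-ins for nested nip parameter objects; it is not nip's implementation, and since the behaviour of the real get for missing keys is not documented here, this sketch simply lets getattr raise AttributeError.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TrainerParams:
    """Hypothetical stand-in for a nested trainer parameters object."""

    transcript_format: str = "yaml"
    verifier_guess_replacement_proportion: float = 0.0


@dataclass
class RunParams:
    """Hypothetical stand-in for a top-level parameters object."""

    trainer: TrainerParams = field(default_factory=TrainerParams)
    seed: int = 0


def get_by_address(params: Any, address: str) -> Any:
    """Follow a dot-separated address such as "trainer.transcript_format"."""
    value = params
    for part in address.split("."):
        # Each component names an attribute of the current object.
        value = getattr(value, part)
    return value


params = RunParams(trainer=TrainerParams(transcript_format="json"))
print(get_by_address(params, "trainer.transcript_format"))  # json
```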
Attributes
fine_tune_on_all_previous_rollouts
save_transcripts
test_on_whole_dataset
test_scheme
transcript_format
verifier_guess_replacement_annealing
verifier_guess_replacement_annealing_rate
verifier_guess_replacement_proportion
Methods
- __eq__(other)
Return self==value.
- __init__(fine_tune_on_all_previous_rollouts: bool = False, verifier_guess_replacement_proportion: float = 0.0, verifier_guess_replacement_annealing: Literal['none', 'linear', 'exponential'] = 'none', verifier_guess_replacement_annealing_rate: float = 0.1, save_transcripts: bool = True, transcript_format: Literal['json', 'yaml'] = 'yaml', test_scheme: Annotated[Literal['none', 'all', 'last', 'first_and_last'], <nip.parameters.base_run.BaseRunPreserve object>] = 'none', test_on_whole_dataset: Annotated[bool, <nip.parameters.base_run.BaseRunPreserve object>] = True) → None
- __repr__()
Return repr(self).
- classmethod _get_param_class_from_dict(param_dict: dict) → type[ParameterValue] | None
Try to get the parameter class from a dictionary of serialised parameters.
- Parameters:
param_dict (dict) – A dictionary of parameters, which may have come from a to_dict method. This dictionary may contain a _type key, which is used to determine the class of the parameter.
- Returns:
param_class (type[ParameterValue] | None) – The class of the parameter, if it can be determined.
- Raises:
ValueError – If the class specified in the dictionary is not a valid parameter class.
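Resolving a class from a _type key is commonly done with a registry that maps class names to classes. The sketch below assumes such a name-based registry, returns None when no _type key is present, and raises ValueError for an unknown name, in line with the documented return value and exception. The registry, decorator and class names are hypothetical, not nip internals.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical registry mapping class names to parameter classes.
PARAM_REGISTRY: dict[str, type] = {}


def register_param_class(cls: type) -> type:
    """Record a parameter class under its name so it can be found later."""
    PARAM_REGISTRY[cls.__name__] = cls
    return cls


@register_param_class
@dataclass
class ExampleParams:
    """Hypothetical parameter class used to demonstrate the lookup."""

    value: int = 0


def get_param_class_from_dict(param_dict: dict) -> Optional[type]:
    """Return the class named by the "_type" key, or None if it is absent."""
    type_name = param_dict.get("_type")
    if type_name is None:
        return None
    if type_name not in PARAM_REGISTRY:
        raise ValueError(f"{type_name!r} is not a valid parameter class")
    return PARAM_REGISTRY[type_name]
```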
- classmethod construct_test_params() → BaseHyperParameters
Construct a set of basic parameters for testing.
- classmethod from_dict(params_dict: dict, ignore_extra_keys: bool = False) → BaseHyperParameters
Create a parameters object from a dictionary.
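A to_dict / from_dict round trip for a flat parameters object can be sketched with the standard dataclasses module. MiniTextRlParams below is a hypothetical stand-in holding only three of the fields of TextRlParameters, and the ignore_extra_keys behaviour (raise ValueError on unknown keys unless it is True) is an assumption based on the parameter's name. The real methods may also handle nested parameters and _type keys.

```python
from dataclasses import asdict, dataclass, fields
from typing import Literal


@dataclass
class MiniTextRlParams:
    """Hypothetical stand-in with a subset of the TextRlParameters fields."""

    fine_tune_on_all_previous_rollouts: bool = False
    verifier_guess_replacement_proportion: float = 0.0
    transcript_format: Literal["json", "yaml"] = "yaml"

    def to_dict(self) -> dict:
        # asdict recursively converts the dataclass into a plain dict.
        return asdict(self)

    @classmethod
    def from_dict(cls, params_dict: dict, ignore_extra_keys: bool = False) -> "MiniTextRlParams":
        known = {f.name for f in fields(cls)}
        extra = set(params_dict) - known
        # Assumed behaviour: unknown keys are an error unless explicitly ignored.
        if extra and not ignore_extra_keys:
            raise ValueError(f"Unexpected keys: {sorted(extra)}")
        return cls(**{k: v for k, v in params_dict.items() if k in known})
```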