nip.language_model_server.types.LmDpoTrainingConfig
- class nip.language_model_server.types.LmDpoTrainingConfig(*, beta: float, learning_rate: float, num_train_epochs: int, max_prompt_length: int | None = None, max_completion_length: int | None = None, max_length: int | None = None)
Configuration for Direct Preference Optimization (DPO) training.
Attributes
__fields_set__
model_computed_fields
model_config: Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
model_extra: Get extra fields set during validation.
model_fields
model_fields_set: Returns the set of fields that have been explicitly set on this model instance.
beta: The DPO beta parameter, which controls how strongly the trained policy is kept close to the reference model. Higher values allow less deviation from the reference.
learning_rate: The learning rate for the DPO training.
num_train_epochs: The number of epochs to train the model for.
max_prompt_length: The maximum length of the prompt sequence.
max_completion_length: The maximum length of the completion sequence.
max_length: The maximum length of the full sequence (prompt + completion).
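To make the role of beta concrete, here is a minimal sketch of the standard per-pair DPO loss, in which beta scales the difference between the policy's and the reference model's log-probability ratios. This is the textbook objective, not code from nip. The function name `dpo_pair_loss` and its arguments are hypothetical, and how the server actually computes the loss is not stated in this page.

```python
import math


def dpo_pair_loss(
    beta: float,
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
) -> float:
    """Standard DPO loss for one (chosen, rejected) completion pair."""
    # How much more the policy prefers the chosen completion than the
    # reference model does, measured in log-probability.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # -log(sigmoid(x)) equals log(1 + exp(-x)), with x = beta * margin.
    return math.log1p(math.exp(-beta * margin))


# Same log-probabilities, two different beta values (illustrative numbers).
low_beta_loss = dpo_pair_loss(0.1, -10.0, -12.0, -11.0, -11.0)
high_beta_loss = dpo_pair_loss(0.5, -10.0, -12.0, -11.0, -11.0)
print(low_beta_loss, high_beta_loss)
```

When the policy matches the reference exactly, the margin is zero and the loss is log 2 for any beta. For a fixed positive margin, a larger beta gives a smaller loss.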
Methods