nip.language_model_server.types.LmDpoTrainingConfig
class nip.language_model_server.types.LmDpoTrainingConfig(*, beta: float, learning_rate: float, max_prompt_length: int | None = None, max_completion_length: int | None = None, max_length: int | None = None)
Configuration for Direct Preference Optimization (DPO) training.
Attributes
__fields_set__
model_computed_fields
model_config
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
model_extra
Get extra fields set during validation.
model_fields
model_fields_set
Returns the set of fields that have been explicitly set on this model instance.
beta
The beta parameter, controlling how far the trained model may deviate from the reference model (higher values keep it closer to the reference).
learning_rate
The learning rate for DPO training.
max_prompt_length
The maximum length of the prompt sequence.
max_completion_length
The maximum length of the completion sequence.
max_length
The maximum length of the full sequence (prompt + completion).
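In DPO, `beta` scales the implicit reward margin between the chosen and rejected completions. The following is a minimal sketch of the per-pair loss from the original DPO paper (Rafailov et al., 2023). It assumes nip uses that standard formulation, which this page does not state. `dpo_loss` and its arguments are hypothetical names, and each log-probability is assumed to be the summed token log-probability of a completion given its prompt.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float,
) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * margin))."""
    # Margin: how much more the trained policy favours the chosen completion
    # over the rejected one, measured relative to the frozen reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)); branch to avoid overflow in exp.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# With no preference shift relative to the reference, the margin is 0 and
# the loss is log(2), approximately 0.693.
print(dpo_loss(-10.0, -12.0, -10.0, -12.0, beta=0.1))
```

Because `beta` multiplies the margin, a larger value drives the loss down with a smaller departure from the reference model, which is why it acts as the strength of the implicit KL constraint.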
Methods