nip.language_model_server.types.LmDpoTrainingConfig

class nip.language_model_server.types.LmDpoTrainingConfig(*, beta: float, learning_rate: float, num_train_epochs: int, max_prompt_length: int | None = None, max_completion_length: int | None = None, max_length: int | None = None)

Configuration for Direct Preference Optimization (DPO) training.
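For orientation, the following is a minimal sketch of the standard DPO loss for a single preference pair, showing where a `beta` value enters the objective. The function name `dpo_pair_loss` and its arguments are hypothetical and are not part of `nip`; the training code that consumes this configuration may compute the loss differently (for example through a training library).

```python
import math


def dpo_pair_loss(
    beta: float,
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
) -> float:
    """Standard DPO loss for one (chosen, rejected) completion pair.

    Hypothetical helper for illustration only; not part of nip.
    Each argument ending in `_logp` is the summed log-probability of a
    completion under the trained policy or the frozen reference model.
    """
    # How much more the policy prefers the chosen completion over the
    # rejected one, relative to the reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # loss = -log(sigmoid(beta * margin)) = softplus(-beta * margin),
    # written in a numerically stable form.
    x = -beta * margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

When the policy and the reference model agree exactly, the margin is zero and the loss equals `log(2)` regardless of `beta`. For a fixed non-zero margin, `beta` scales how sharply the loss responds to it.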

Attributes

`__fields_set__`
`model_computed_fields`
`model_config`	Configuration for the model; should be a dictionary conforming to `pydantic.config.ConfigDict`.
`model_extra`	Get extra fields set during validation.
`model_fields`
`model_fields_set`	Returns the set of fields that have been explicitly set on this model instance.
`beta`	The beta parameter of the DPO loss, which controls how strongly the trained policy is kept close to the reference model.
`learning_rate`	The learning rate for the DPO training.
`num_train_epochs`	The number of epochs to train the model for.
`max_prompt_length`	The maximum length of the prompt sequence.
`max_completion_length`	The maximum length of the completion sequence.
`max_length`	The maximum length of the full sequence (prompt + completion).

Methods
