nip.language_model_server.types.LmDpoTrainingConfig

class nip.language_model_server.types.LmDpoTrainingConfig(*, beta: float, learning_rate: float, max_prompt_length: int | None = None, max_completion_length: int | None = None, max_length: int | None = None)

Configuration for Direct Preference Optimization (DPO) training.

Attributes

__fields_set__

model_computed_fields

model_config

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_extra

Get extra fields set during validation.

model_fields

model_fields_set

Returns the set of fields that have been explicitly set on this model instance.

beta

The DPO beta parameter, which controls how strongly the trained policy is kept close to the reference model; higher values penalize deviation from the reference more.

learning_rate

The learning rate for DPO training.

max_prompt_length

The maximum length of the prompt sequence.

max_completion_length

The maximum length of the completion sequence.

max_length

The maximum length of the full sequence (prompt + completion).
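To make the role of beta concrete, the sketch below computes the standard per-example DPO loss from summed log-probabilities of the chosen and rejected completions under the policy and the reference model. It is an illustration of the published DPO objective, not this library's training code; whether the server uses exactly this loss variant is an assumption, and the function and argument names are hypothetical.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float,
) -> float:
    """Standard DPO loss for one preference pair (illustrative sketch)."""
    # How much more (or less) likely each completion is under the policy
    # than under the reference model, in log space.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # beta scales the margin between the two log-ratios.
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) equals softplus(-margin).
    return softplus(-margin)


# A policy identical to the reference gives a zero margin, so the loss is log(2).
loss_at_reference = dpo_loss(-1.0, -1.0, -1.0, -1.0, beta=0.1)

# The same positive margin yields a smaller loss under a larger beta.
loss_small_beta = dpo_loss(-1.0, -2.0, -1.5, -1.5, beta=0.1)
loss_large_beta = dpo_loss(-1.0, -2.0, -1.5, -1.5, beta=0.5)
```

In this formulation, beta acts as a scale on the implicit reward, which is why it is usually tuned together with learning_rate.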

Methods