nip.language_model_server.types.LmDpoTrainingConfig
class nip.language_model_server.types.LmDpoTrainingConfig(*, beta: float, learning_rate: float, num_train_epochs: int, max_prompt_length: int | None = None, max_completion_length: int | None = None, max_length: int | None = None)
Configuration for Direct Preference Optimization (DPO) training.

Attributes
- __fields_set__
- model_computed_fields
- model_config: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_extra: Get extra fields set during validation.
- model_fields
- model_fields_set: Returns the set of fields that have been explicitly set on this model instance.
- beta: The DPO beta parameter, which controls how strongly the trained policy is kept close to the reference model.
- learning_rate: The learning rate for DPO training.
- num_train_epochs: The number of epochs to train the model for.
- max_prompt_length: The maximum length of the prompt sequence.
- max_completion_length: The maximum length of the completion sequence.
- max_length: The maximum length of the full sequence (prompt + completion).

Methods