nip.language_model_server.client.LanguageModelClient#

class nip.language_model_server.client.LanguageModelClient(server_url: str = 'http://localhost:5000')[source]#

A client for interacting with the language model server.

This client provides methods for interacting with the language model server, including controlling the vLLM server and managing language model training jobs.

Parameters:

server_url (str, default="http://localhost:5000") – The URL of the language model server. This should include the protocol (http or https) and the port number if applicable.
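
Examples

A minimal construction sketch using the default URL. All request methods below are coroutines and must be awaited inside an async context, which these examples elide:

>>> from nip.language_model_server.client import LanguageModelClient
>>> client = LanguageModelClient(server_url="http://localhost:5000")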

Methods Summary

__init__([server_url])

cancel_training_job(job_id)

Cancel a training job by its ID.

check_server_version()

Check the server version against the client version.

create_training_job(training_config, dataset)

Create a new training job with the specified configuration.

get_server_version()

Get the version of the language model server.

get_training_job(job_id)

Get the details of a specific training job by its ID.

get_training_jobs()

Get the list of training jobs currently managed by the server.

get_vllm_server_status()

Get the current status of the vLLM language model server.

lm_server_accepting_connections()

Check if the language model server is accepting connections.

start_vllm_server(model_name[, quantization])

Start the vLLM language model server with the specified model.

stop_vllm_server([ignore_not_running, timeout])

Stop the vLLM language model server.

validate_server_version()

Validate the server version against the client version.

wait_for_lm_server_to_accept_connections([timeout])

Wait for the language model server to start accepting connections.

wait_for_vllm_server([timeout])

Wait for the vLLM server to be online.

Methods

__init__(server_url: str = 'http://localhost:5000')[source]#
async cancel_training_job(job_id: str)[source]#

Cancel a training job by its ID.

Parameters:

job_id (str) – The ID of the training job to cancel.

Raises:

HTTPStatusError – If the server returns an error status code while cancelling the training job.
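
For example, a sketch that tolerates a failed cancellation. It assumes the HTTPStatusError documented above is httpx's exception class, which is not confirmed here, and the job ID is hypothetical:

>>> import httpx
>>> try:
...     await client.cancel_training_job("job-1234")  # hypothetical job ID
... except httpx.HTTPStatusError as exc:
...     print("Cancellation failed:", exc.response.status_code)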

async check_server_version() Literal['ok', 'major', 'minor', 'patch'][source]#

Check the server version against the client version.

Returns:

status (str) – A string indicating the status of the server version:

  • "ok" if the server version matches the client version.

  • "major" if the server version differs by a major version.

  • "minor" if the server version differs by a minor version.

  • "patch" if the server version differs by a patch version.

Raises:

BadResponseError – If the server returns an invalid response or if the response does not contain the expected ‘version’ field.
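
For example, a sketch that acts on the comparison result:

>>> status = await client.check_server_version()
>>> if status != "ok":
...     print(f"Server version differs from the client at the {status} level")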

async create_training_job(training_config: LmTrainingConfig, dataset: list[DpoDatasetItem], job_name: str | None = None) TrainingJobInfo[source]#

Create a new training job with the specified configuration.

Parameters:
  • training_config (LmTrainingConfig) – The configuration for the training job, including model name and training parameters.

  • dataset (list[DpoDatasetItem]) – The dataset to be used for training. This should be a list of dictionaries where each dictionary represents a single data point in the dataset.

  • job_name (Optional[str], default=None) – An optional name for the job, to make it more recognizable.

Returns:

training_job (TrainingJobInfo) – An object containing the details of the created training job, including its ID, status, and configuration.

Raises:
  • HTTPStatusError – If the server returns an error status code while creating the training job.

  • BadResponseError – If the server returns an invalid response or if the response does not contain the expected data.
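
A sketch of submitting a job, with the configuration and dataset left as placeholders (their schemas are defined by LmTrainingConfig and DpoDatasetItem) and a job name that is purely illustrative:

>>> training_config = LmTrainingConfig(...)  # placeholder: model name and training parameters
>>> dataset = [...]  # placeholder: a list of DpoDatasetItem entries
>>> job = await client.create_training_job(
...     training_config, dataset, job_name="my-dpo-run"
... )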

async get_server_version() str[source]#

Get the version of the language model server.

Returns:

version (str) – The version of the language model server, as a string.

Raises:

BadResponseError – If the server returns an invalid response or if the response does not contain the expected ‘version’ field.
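
For example (the returned version string is illustrative):

>>> await client.get_server_version()
'1.2.3'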

async get_training_job(job_id: str) TrainingJobInfo[source]#

Get the details of a specific training job by its ID.

Parameters:

job_id (str) – The ID of the training job to retrieve.

Returns:

training_job (TrainingJobInfo) – An object containing the details of the training job, including its ID, status, and configuration.

Raises:
  • TrainingJobNotFoundClientError – If the training job with the specified ID does not exist on the server.

  • HTTPStatusError – If the server returns an error status code while retrieving the training job.

  • BadResponseError – If the server returns an invalid response or if the response does not contain the expected data.
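
For example (the job ID is hypothetical):

>>> job_info = await client.get_training_job("job-1234")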

async get_training_jobs() list[TrainingJobInfo][source]#

Get the list of training jobs currently managed by the server.

Returns:

training_jobs (list[TrainingJobInfo]) – A list of TrainingJobInfo objects, each containing information about a training job, including its ID, status, and configuration.

Raises:
  • HTTPStatusError – If the server returns an error status code while retrieving the training jobs.

  • BadResponseError – If the server returns an invalid response or if the response does not contain the expected data.
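
For example, a sketch that prints every job the server knows about:

>>> for job in await client.get_training_jobs():
...     print(job)  # each entry is a TrainingJobInfo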

async get_vllm_server_status() Literal['online', 'not_started', 'crashed', 'not_accepting_connections', 'timeout', 'server_error', 'other_error'][source]#

Get the current status of the vLLM language model server.

Returns:

vllm_server_status (ServerStatus) – The current status of the vLLM server. See the documentation for ServerStatus for possible values.

Raises:

BadResponseError – If the server returns an invalid response or if the response does not contain the expected ‘status’ field, or if the status is not a valid ServerStatus.
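
For example, a sketch that reacts to one particular status value:

>>> status = await client.get_vllm_server_status()
>>> if status == "crashed":
...     print("vLLM server crashed; it may need to be restarted")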

async lm_server_accepting_connections() bool[source]#

Check if the language model server is accepting connections.

This method will attempt to make a request to the server’s version endpoint. If the server is online and responds successfully, it returns True. If the server is not online or if there is a connection error, it returns False.

Returns:

accepting_connections (bool) – True if the language model server is accepting connections, False otherwise.
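
For example:

>>> if not await client.lm_server_accepting_connections():
...     print("Language model server is not reachable")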

async start_vllm_server(model_name: str, quantization: Literal['bitsandbytes', 'none'] = 'none') str[source]#

Start the vLLM language model server with the specified model.

Parameters:
  • model_name (str) – The name of the model to be served by vLLM. This should match a model that is available in the vLLM installation.

  • quantization (VllmQuantization, default="none") – The quantization method to use for the model.

Returns:

success_message (str) – A message indicating that the vLLM server has been started successfully, or was already running.

Raises:

BadResponseError – If the server returns an invalid response or if the response does not contain the expected data.
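
A sketch of starting a quantized model; the model name is illustrative and stands in for any model available to the vLLM installation:

>>> await client.start_vllm_server(
...     "meta-llama/Llama-3.1-8B-Instruct", quantization="bitsandbytes"
... )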

async stop_vllm_server(ignore_not_running: bool = False, timeout: float = 15.0)[source]#

Stop the vLLM language model server.

Parameters:
  • ignore_not_running (bool, default=False) – If True, no error is raised when the vLLM server is not running. Instead, a warning is logged and a success message is returned indicating that the server was not running and was ignored.

  • timeout (float, default=15.0) – The maximum time to wait for the vLLM server to stop, in seconds. If the server does not stop within this time, a timeout error will be raised. The server will attempt to terminate gracefully for max(timeout - 5.0, 1.0) seconds, after which it will be forcefully killed if it is still running.

Raises:

HTTPStatusError – If the server returns an error status code while stopping the vLLM server.
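
For example, a tolerant shutdown with a longer grace period than the default:

>>> await client.stop_vllm_server(ignore_not_running=True, timeout=30.0)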

async validate_server_version()[source]#

Validate the server version against the client version.

This method checks if the server version matches the client version. If they differ by a major version, it raises a RuntimeError. If they differ by a minor or patch version, it issues a warning.

Raises:
  • RuntimeError – If the server version differs from the client version by a major version.

  • UserWarning – If the server version differs from the client version by a minor or patch version.
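
For example:

>>> await client.validate_server_version()  # raises RuntimeError on a major version mismatch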

async wait_for_lm_server_to_accept_connections(timeout: float = 300)[source]#

Wait for the language model server to start accepting connections.

This method will repeatedly check if the server is online by making a request to the server’s version endpoint. If the server is not online, it will raise a ClientTimeoutError after the specified timeout period.

Parameters:

timeout (float, default=300) – The maximum time to wait for the language model server to start accepting connections, in seconds.

Raises:

ClientTimeoutError – If the language model server does not become online within the specified timeout.
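
For example, with a shorter deadline than the default:

>>> await client.wait_for_lm_server_to_accept_connections(timeout=60)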

async wait_for_vllm_server(timeout: float = 900)[source]#

Wait for the vLLM server to be online.

Parameters:

timeout (float, default=900) – The maximum time to wait for the vLLM server to be online, in seconds.

Raises:

ClientTimeoutError – If the vLLM server does not become online within the specified timeout.
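
A typical pairing with start_vllm_server (the model name is illustrative):

>>> await client.start_vllm_server("meta-llama/Llama-3.1-8B-Instruct")
>>> await client.wait_for_vllm_server(timeout=600)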