Self-hosting Language Models (nip.language_model_server)#

Overview#

The language model server hosts open-weight language models behind an interface similar to the OpenAI API. The main caveat is that inference and training are handled by separate services.

The language model server consists of the following components:

  • A vLLM server that provides inference access to a model, with an OpenAI-compatible API.

  • A training service which is accessed using a subset of the OpenAI API.

  • A manager service which starts and stops the vLLM server, allowing for easy switching between models.

The manager service and inference service share a port and expose the endpoints documented below. The vLLM server runs on its own port, so it must be accessed with a separate client.
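As an illustrative sketch of this two-port layout (the host, port numbers, and the `vllm_base_url` helper below are assumptions for illustration, not part of the package):

```python
def vllm_base_url(host: str, port: int) -> str:
    """Build the base URL for vLLM's OpenAI-compatible API.

    The port comes from the /vllm/start response; it is not the port
    that the manager and training endpoints are served on.
    """
    return f"http://{host}:{port}/v1"

# Inference could then go through any OpenAI-compatible client, e.g.
# with the third-party openai package (illustrative only; requires a
# running vLLM server):
# from openai import OpenAI
# client = OpenAI(base_url=vllm_base_url("localhost", 8080), api_key="unused")
```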

Server API#

POST /vllm/start#

Start the vLLM server with the specified model.

Request JSON Object:
  • model_name (string) – The name of the model to serve with vLLM. Must be a valid model name that vLLM can load.

Response JSON Object:
  • message (string) – A message indicating that the vLLM server has started.

  • model_name (string) – The name of the model that was started.

  • port (int) – The port that the vLLM server is running on.
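A minimal way to call this endpoint, sketched with the Python standard library. The base URL and model name are placeholders, and `start_request`/`start_vllm` are hypothetical helpers, not part of the package:

```python
import json
from urllib.request import Request, urlopen

def start_request(base_url: str, model_name: str) -> Request:
    """Build the POST /vllm/start request."""
    return Request(
        base_url + "/vllm/start",
        data=json.dumps({"model_name": model_name}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def start_vllm(base_url: str, model_name: str) -> dict:
    """Start the vLLM server and return the decoded JSON response,
    containing "message", "model_name", and "port"."""
    with urlopen(start_request(base_url, model_name)) as resp:
        return json.load(resp)

# Requires a running manager service (base URL is illustrative):
# info = start_vllm("http://localhost:8000", "meta-llama/Llama-3.1-8B-Instruct")
# print(info["port"])  # the port the vLLM server is now listening on
```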

POST /vllm/stop#

Stop the vLLM server.

Request JSON Object:
  • ignore_not_running (bool) – If True, no error is raised when the server is not running.

Response JSON Object:
  • message (string) – A message indicating that the vLLM server has stopped.
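Passing ignore_not_running makes the call safe to use in teardown code regardless of whether a model is loaded. A sketch with the standard library (the base URL is illustrative and `stop_request` is a hypothetical helper, not part of the package):

```python
import json
from urllib.request import Request, urlopen

def stop_request(base_url: str, ignore_not_running: bool = True) -> Request:
    """Build the POST /vllm/stop request.

    With ignore_not_running=True the call succeeds even when no
    vLLM server is currently running.
    """
    return Request(
        base_url + "/vllm/stop",
        data=json.dumps({"ignore_not_running": ignore_not_running}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Requires a running manager service:
# with urlopen(stop_request("http://localhost:8000")) as resp:
#     print(json.load(resp)["message"])
```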

GET /vllm/status#

Get the status of the vLLM server.

Response JSON Object:
  • status (string) –

    The status of the vLLM server, which can be one of:

    • "online": The server is running and accepting connections.

    • "not_started": The server has not been started.

    • "exited": The server has exited unexpectedly.

    • "not_accepting_connections": The server is running but not accepting connections. This can happen if the server is still starting up or if it has crashed.

    • "server_error": A 5xx error occurred when trying to connect to the server.

    • "other_error": Any other error occurred when trying to connect to the server.

  • error (string) – An error message if the server is not online.
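One common use of this endpoint is polling after /vllm/start until the server is ready. A sketch, assuming a get_status callable that returns the decoded JSON from GET /vllm/status; the helper is hypothetical, and treating the error statuses as terminal is a choice that may not suit every deployment:

```python
import time

# Statuses that can occur while the server is still starting up.
TRANSIENT = {"not_started", "not_accepting_connections"}

def wait_until_online(get_status, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until the vLLM server reports "online".

    Returns True once the server is online; returns False if the
    timeout expires or a non-transient status ("exited",
    "server_error", "other_error") is reported.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()["status"]
        if status == "online":
            return True
        if status not in TRANSIENT:
            return False  # crashed or erroring; no point in waiting
        time.sleep(interval)
    return False
```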

Modules#

server

A server which allows for controlling vLLM and doing language model training.

client

A client for interacting with the self-hosting language model server.

types

Types for the language model server, including request and response structures.

exceptions

Exceptions for the language model server and client.