run_lm_server.py

Run the self-hosting language model server.

This server controls a vLLM server for language model inference and provides an OpenAI-compatible API for training.

scripts/run_lm_server.py

usage: scripts/run_lm_server.py [-h] [--lm-server-port LM_SERVER_PORT]
                                [--vllm-port VLLM_PORT]
                                [--max-training-jobs MAX_TRAINING_JOBS]
                                [--vllm-num-gpus VLLM_NUM_GPUS]
                                [--accelerate-config ACCELERATE_CONFIG]
                                [--log-to-file] [--external] [--dev]
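
For example, a local launch that gives vLLM every available GPU might look like the following; the port numbers are illustrative, and the defaults for any omitted options are not documented here:

    python scripts/run_lm_server.py \
        --lm-server-port 8000 \
        --vllm-port 8001 \
        --vllm-num-gpus auto \
        --log-to-file
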
-h, --help

show this help message and exit

--lm-server-port <lm_server_port>

The port on which the main language model server will run.

--vllm-port <vllm_port>

The port on which the vLLM server will run.

--max-training-jobs <max_training_jobs>

The maximum number of concurrent training jobs allowed.

--vllm-num-gpus <vllm_num_gpus>

The number of GPUs to use for the vLLM server. If set to ‘auto’, it will use all available GPUs.

--accelerate-config <accelerate_config>

Path to the configuration file for the accelerate library. Can be either a regular file or a Jinja2 template. If empty, no configuration file will be passed to the accelerate command.
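
As a rough sketch, such a file might contain ordinary accelerate YAML settings; the num_gpus variable below is a hypothetical placeholder, since the variables actually exposed to the Jinja2 template are not documented here:

    compute_environment: LOCAL_MACHINE
    distributed_type: MULTI_GPU
    mixed_precision: bf16
    num_machines: 1
    num_processes: {{ num_gpus }}  # hypothetical template variable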

--log-to-file

Whether to write vLLM and trainer output to log files instead of stdout and stderr.

--external

Whether to run the server in external mode, making it reachable from other machines. Otherwise, it is only accessible from localhost.

--dev

Whether to run the FastAPI server in development mode, which enables auto-reload.

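Once the server is running, its OpenAI-compatible API can be exercised with any OpenAI client or with plain HTTP. A minimal sketch using curl, assuming the server listens on port 8000 and follows the usual /v1 route layout of OpenAI-compatible servers; the port, route, and model name are assumptions rather than documented behavior:

    curl http://localhost:8000/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d '{"model": "my-model", "messages": [{"role": "user", "content": "Hello!"}]}'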