run_lm_server.py
Run the self-hosting language model server.
This server controls a vLLM server for language model inference and provides an OpenAI-compatible API for training.
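The sketch below shows what the OpenAI-compatible surface means in practice, assuming the server exposes the standard chat-completions endpoint. The port, model name, and API key are illustrative placeholders, not values defined by this script.

    # Querying the server with the official OpenAI Python client.
    # Assumptions: the server listens on localhost:8000 (set via
    # --lm-server-port) and serves a model named "my-model"; both
    # values are placeholders for illustration.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8000/v1",  # point the client at the local server
        api_key="unused",  # placeholder; self-hosted servers often ignore the key
    )

    response = client.chat.completions.create(
        model="my-model",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)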
usage: scripts/run_lm_server.py [-h] [--lm-server-port LM_SERVER_PORT]
                                [--vllm-port VLLM_PORT]
                                [--max-training-jobs MAX_TRAINING_JOBS]
                                [--vllm-num-gpus VLLM_NUM_GPUS]
                                [--accelerate-config ACCELERATE_CONFIG]
                                [--log-to-file] [--external] [--dev]
- -h, --help
show this help message and exit
- --lm-server-port <lm_server_port>
The port on which the main language model server will run.
- --vllm-port <vllm_port>
The port on which the vLLM server will run.
- --max-training-jobs <max_training_jobs>
The maximum number of concurrent training jobs allowed.
- --vllm-num-gpus <vllm_num_gpus>
The number of GPUs to use for the vLLM server. If set to 'auto', it will use all available GPUs.
- --accelerate-config <accelerate_config>
Path to the configuration file for the accelerate library. It can be either a regular file or a Jinja2 template (a template sketch appears at the end of this page). If empty, no configuration file is passed to the accelerate command.
- --log-to-file
Whether to write vLLM and trainer output to log files instead of stdout and stderr.
- --external
Whether to run the server in external mode, so it can be accessed from other machines. Otherwise, it is only accessible from localhost.
- --dev
Whether to run the FastAPI server in development mode, which enables auto-reload.
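Putting the options together, a typical invocation might look like the following. The port numbers are illustrative values, not documented defaults.

    python scripts/run_lm_server.py \
        --lm-server-port 8000 \
        --vllm-port 8001 \
        --vllm-num-gpus auto \
        --log-to-file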
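As noted above, --accelerate-config may point at a Jinja2 template instead of a plain file. The sketch below uses standard accelerate configuration keys; the num_processes template variable is a hypothetical name, since the variables the server actually supplies to the template are not documented here.

    # Sketch of an accelerate config written as a Jinja2 template.
    # The keys are standard accelerate settings; the {{ num_processes }}
    # variable is hypothetical -- which variables the server passes to
    # the template is not documented here.
    compute_environment: LOCAL_MACHINE
    distributed_type: MULTI_GPU
    mixed_precision: bf16
    num_processes: {{ num_processes }}
    num_machines: 1
    machine_rank: 0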