
feat: vLLM backend #2010

Draft · wants to merge 83 commits into dev
Conversation

@gau-nernst (Contributor) commented on Feb 21, 2025

Describe Your Changes

cortex engines install vllm

  • Download uv to cortexcpp/python_engines/bin/uv if uv is not installed
  • (via uv) Setup venv at cortexcpp/python_engines/envs/vllm/<version>/.venv
  • (via uv) Download vllm and its dependencies (see the shell sketch after this list)
  • Known issues:
    • Progress streaming is not supported, since the download is done via uv instead of DownloadService.
    • It's not async, since we must wait for the subprocess to finish (we may need a new SubprocessService in the future that supports an async WaitProcess()).
    • Hence, pausing and resuming the download does not work either.
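
For reference, a rough shell equivalent of what the install step does (paths mirror the layout above; the exact flags Cortex passes are assumptions):

```sh
# Sketch only: approximate uv invocations behind `cortex engines install vllm`.
# Paths follow the layout described above; exact flags are assumptions.
UV=cortexcpp/python_engines/bin/uv

# Keep uv's package cache inside python_engines (see the note below).
export UV_CACHE_DIR=cortexcpp/python_engines/cache/uv

# Create the per-version virtual environment.
"$UV" venv cortexcpp/python_engines/envs/vllm/<version>/.venv

# Install vLLM and its dependencies into that venv.
VIRTUAL_ENV=cortexcpp/python_engines/envs/vllm/<version>/.venv \
  "$UV" pip install vllm==<version>
```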

Note:

  • All cached Python packages are stored in cortexcpp/python_engines/cache/uv, so that removing the python_engines folder is guaranteed to leave nothing behind (see the sketch below).
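
uv reads its cache location from the UV_CACHE_DIR environment variable, so the isolation amounts to a single setting; a minimal sketch:

```sh
# With UV_CACHE_DIR pointed inside python_engines, deleting that one folder
# removes the uv binary, the venvs, and every cached wheel in one go.
export UV_CACHE_DIR=cortexcpp/python_engines/cache/uv
rm -rf cortexcpp/python_engines   # full cleanup; nothing is left behind
```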

cortex models start <model>

  • Spawns vllm serve as a subprocess (see the sketch below)
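
Roughly what gets spawned (vllm serve exposes an OpenAI-compatible server; the port and any extra flags Cortex passes are assumptions here):

```sh
# Sketch: launch vLLM's OpenAI-compatible server from the engine's venv.
# The actual port and flags chosen by Cortex are assumptions.
cortexcpp/python_engines/envs/vllm/<version>/.venv/bin/vllm serve <model> \
  --port 8000
```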

TODO:

  • cortex engines install vllm
  • Set default engine variant
  • cortex engines load vllm
  • cortex engines list
  • cortex engines uninstall vllm: delete cortexcpp/python_engines/envs/vllm/<version>
  • cortex pull <model>
  • cortex models list
  • cortex models start <model>: spawn vllm serve
  • cortex models stop <model>
  • cortex ps
  • Chat completion (see the example request after this list)
    • Non-streaming
    • Streaming
  • cortex run
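
Once chat completion is wired up, requests should follow the OpenAI schema that both Cortex and vLLM expose; a hypothetical request against Cortex's default API port:

```sh
# Hypothetical request once chat completion works; 39281 is Cortex's default
# API port, and the payload follows the OpenAI chat completions schema.
curl http://127.0.0.1:39281/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model>",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false
  }'
```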

Fixes Issues

Self Checklist

  • Added relevant comments, esp in complex areas
  • Updated docs (for bug fixes / features)
  • Created issues for follow-up changes or refactoring needed

@ramonpzg added this to the Caffeinated Sloth milestone on Mar 13, 2025
@gau-nernst changed the title from "feat: Python engine improvements" to "feat: vLLM backend" on Mar 17, 2025
@gau-nernst mentioned this pull request on Mar 22, 2025
Successfully merging this pull request may close: vLLM backend for Cortex