
feat: vLLM backend #2010

Draft · wants to merge 83 commits into dev
Conversation

@gau-nernst (Contributor) commented on Feb 21, 2025

Describe Your Changes

cortex engines install vllm

  • Download uv to cortexcpp/python_engines/bin/uv if uv is not installed
  • (via uv) Setup venv at cortexcpp/python_engines/envs/vllm/<version>/.venv
  • (via uv) Download vllm and its dependencies (see the shell sketch after this list)
  • Known issues:
    • Progress streaming is not supported, since the download is done via uv instead of DownloadService.
    • It's not async, since we must wait for the subprocess to finish (we may need a new SubprocessService in the future that supports an async WaitProcess()).
    • Hence, pausing and resuming the download does not work either.
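
For reference, a rough shell equivalent of what the install step does (paths mirror the layout above; the exact flags Cortex passes are assumptions):

```sh
# Sketch only: approximate uv invocations behind `cortex engines install vllm`.
# Paths follow the layout described above; exact flags are assumptions.
UV=cortexcpp/python_engines/bin/uv

# Keep uv's package cache inside python_engines (see the note below).
export UV_CACHE_DIR=cortexcpp/python_engines/cache/uv

# Create the per-version virtual environment.
"$UV" venv cortexcpp/python_engines/envs/vllm/<version>/.venv

# Install vLLM and its dependencies into that venv.
VIRTUAL_ENV=cortexcpp/python_engines/envs/vllm/<version>/.venv \
  "$UV" pip install vllm==<version>
```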

Note:

  • All cached Python packages are stored in cortexcpp/python_engines/cache/uv, so that removing the python_engines folder is guaranteed to leave nothing behind (see the sketch below).
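
uv reads its cache location from the UV_CACHE_DIR environment variable, so the isolation amounts to a single setting; a minimal sketch:

```sh
# With UV_CACHE_DIR pointed inside python_engines, deleting that one folder
# removes the uv binary, the venvs, and every cached wheel in one go.
export UV_CACHE_DIR=cortexcpp/python_engines/cache/uv
rm -rf cortexcpp/python_engines   # full cleanup; nothing is left behind
```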

cortex models start <model>

  • Spawns vllm serve as a subprocess (see the sketch below)
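
Roughly what gets spawned (vllm serve exposes an OpenAI-compatible server; the port and any extra flags Cortex passes are assumptions here):

```sh
# Sketch: launch vLLM's OpenAI-compatible server from the engine's venv.
# The actual port and flags chosen by Cortex are assumptions.
cortexcpp/python_engines/envs/vllm/<version>/.venv/bin/vllm serve <model> \
  --port 8000
```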

TODO:

  • cortex engines install vllm
  • Set default engine variant
  • cortex engines load vllm
  • cortex engines list
  • cortex engines uninstall vllm: delete cortexcpp/python_engines/envs/vllm/<version>
  • cortex pull <model>
  • cortex models list
  • cortex models start <model>: spawn vllm serve
  • cortex models stop <model>
  • cortex ps
  • Chat completion (see the example request after this list)
    • Non-streaming
    • Streaming
  • cortex run
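
Once chat completion is wired up, requests should follow the OpenAI schema that both Cortex and vLLM expose; a hypothetical request against Cortex's default API port:

```sh
# Hypothetical request once chat completion works; 39281 is Cortex's default
# API port, and the payload follows the OpenAI chat completions schema.
curl http://127.0.0.1:39281/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model>",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false
  }'
```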

Fixes Issues

Self Checklist

  • Added relevant comments, esp in complex areas
  • Updated docs (for bug fixes / features)
  • Created issues for follow-up changes or refactoring needed

@ramonpzg added this to the Caffeinated Sloth milestone on Mar 13, 2025
@gau-nernst changed the title from "feat: Python engine improvements" to "feat: vLLM backend" on Mar 17, 2025
@gau-nernst mentioned this pull request on Mar 22, 2025
Successfully merging this pull request may close: vLLM backend for Cortex