[Feature] Inferencing using multiple backends #622

snehargho · 2025-03-06T18:45:25Z

Is there a plan to support inference across multiple backends, as llama.cpp does? For example, offloading a chosen number of layers to the GPU to control VRAM usage, GPU power draw, etc.
