Inference Engines
High-performance model inference and serving runtimes
| Repository | Score | Stars | Forks | Language |
|---|---|---|---|---|
| llama.cpp | 8.0 | 103.7k | 16.9k | C++ |
| nomic-ai/gpt4all | 7.2 | 77.3k | 8.3k | C++ |
| vLLM | 8.6 | 76.6k | 15.6k | Python |
| ray-project/ray | 8.6 | 42.1k | 7.4k | Python |
| gitleaks/gitleaks | 8.2 | 26.0k | 2.0k | Go |
| liguodongiot/llm-action | 6.7 | 24.0k | 2.8k | HTML |
| Lightning-AI/litgpt | 7.9 | 13.3k | 1.4k | Python |
| bentoml/OpenLLM | 7.4 | 12.3k | 805 | Python |
| mistralai/mistral-inference | 6.9 | 10.8k | 1.0k | Jupyter Notebook |
| openvinotoolkit/openvino | 8.2 | 10.1k | 3.2k | C++ |
| Tiiny-AI/PowerInfer | 6.8 | 9.3k | 561 | C++ |
| bentoml/BentoML | 8.0 | 8.6k | 950 | Python |
| InternLM/lmdeploy | 7.5 | 7.8k | 684 | Python |
| katanemo/plano | 7.4 | 6.3k | 399 | Rust |
| algorithmicsuperintelligence/openevolve | 6.8 | 6.0k | 950 | Python |
| flashinfer-ai/flashinfer | 7.5 | 5.4k | 896 | Python |
| kserve/kserve | 7.7 | 5.3k | 1.4k | Go |
| xlite-dev/Awesome-LLM-Inference | 6.6 | 5.1k | 360 | Python |
| FellouAI/eko | 7.3 | 4.9k | 436 | TypeScript |
| gpustack/gpustack | 7.0 | 4.8k | 497 | Python |
| Michael-A-Kuykendall/shimmy | 6.2 | 4.0k | 343 | Rust |
| ruvnet/RuVector | 6.7 | 3.8k | 464 | Rust |
| predibase/lorax | 6.1 | 3.7k | 312 | Python |