Inference Engines
High-performance model inference and serving runtimes
llama.cpp
8.0
★ 114.4k ◇ 19.1k C++
vLLM
8.6
★ 81.8k ◇ 17.6k Python
nomic-ai/gpt4all
7.1
★ 77.4k ◇ 8.3k C++
ray-project/ray
8.6
★ 42.8k ◇ 7.6k Python
gitleaks/gitleaks
8.2
★ 27.5k ◇ 2.1k Go
liguodongiot/llm-action
6.8
★ 24.4k ◇ 2.8k HTML
Lightning-AI/litgpt
7.8
★ 13.4k ◇ 1.4k Python
bentoml/OpenLLM
7.4
★ 12.3k ◇ 811 Python
mistralai/mistral-inference
6.9
★ 10.8k ◇ 1.1k Jupyter Notebook
openvinotoolkit/openvino
8.2
★ 10.3k ◇ 3.2k C++
Tiiny-AI/PowerInfer
7.0
★ 9.5k ◇ 579 C++
bentoml/BentoML
8.0
★ 8.7k ◇ 968 Python
InternLM/lmdeploy
7.5
★ 7.9k ◇ 701 Python
katanemo/plano
7.4
★ 6.6k ◇ 427 Rust
algorithmicsuperintelligence/openevolve
6.7
★ 6.5k ◇ 1.0k Python
flashinfer-ai/flashinfer
7.4
★ 5.7k ◇ 1.0k Python
kserve/kserve
7.7
★ 5.5k ◇ 1.5k Go
Michael-A-Kuykendall/shimmy
6.3
★ 5.3k ◇ 503 Rust
xlite-dev/Awesome-LLM-Inference
6.5
★ 5.3k ◇ 381 Python
gpustack/gpustack
7.0
★ 5.1k ◇ 540 Python
FellouAI/eko
7.0
★ 4.9k ◇ 439 TypeScript
lemonade-sdk/lemonade
7.1
★ 4.2k ◇ 330 C++
ruvnet/ruvector
7.0
★ 4.2k ◇ 544 Rust