STACKQUADRANT

Inference Engines

High-performance model inference and serving runtimes

37 repos

llama.cpp

8.0

LLM inference in C/C++.

114.4k · 19.1k · C++

vLLM

8.6

A high-throughput and memory-efficient inference and serving engine for LLMs.

81.8k · 17.6k · Python

nomic-ai/gpt4all

7.1

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

77.4k · 8.3k · C++

ray-project/ray

8.6

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

42.8k · 7.6k · Python

gitleaks/gitleaks

8.2

Find secrets with Gitleaks 🔑

27.5k · 2.1k · Go

liguodongiot/llm-action

6.8

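This project shares the technical principles behind large models and hands-on experience with them (LLM engineering and deploying LLM applications in production).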

24.4k · 2.8k · HTML

Lightning-AI/litgpt

7.8

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

13.4k · 1.4k · Python

bentoml/OpenLLM

7.4

Run any open-source LLM, such as DeepSeek or Llama, as an OpenAI-compatible API endpoint in the cloud.

12.3k · 811 · Python

mistralai/mistral-inference

6.9

Official inference library for Mistral models

10.8k · 1.1k · Jupyter Notebook

openvinotoolkit/openvino

8.2

OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

10.3k · 3.2k · C++

Tiiny-AI/PowerInfer

7.0

High-speed Large Language Model Serving for Local Deployment

9.5k · 579 · C++

bentoml/BentoML

8.0

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

8.7k · 968 · Python

InternLM/lmdeploy

7.5

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

7.9k · 701 · Python

katanemo/plano

7.4

Plano is an AI-native proxy server and data plane for agentic apps - centralizing orchestration, safety, observability, and smart LLM routing so you can deliver agents faster.

6.6k · 427 · Rust

algorithmicsuperintelligence/openevolve

6.7

Open-source implementation of AlphaEvolve

6.5k · 1.0k · Python

flashinfer-ai/flashinfer

7.4

FlashInfer: Kernel Library for LLM Serving

5.7k · 1.0k · Python

kserve/kserve

7.7

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

5.5k · 1.5k · Go

Michael-A-Kuykendall/shimmy

6.3

⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.

5.3k · 503 · Rust

xlite-dev/Awesome-LLM-Inference

6.5

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

5.3k · 381 · Python

gpustack/gpustack

7.0

Performance-optimized AI inference on your GPUs. Unlock superior throughput by selecting and tuning engines like vLLM or SGLang.

5.1k · 540 · Python

FellouAI/eko

7.0

Eko (Eko Keeps Operating) - Build Production-ready Agentic Workflow with Natural Language - eko.fellou.ai

4.9k · 439 · TypeScript

lemonade-sdk/lemonade

7.1

Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our discord: https://discord.gg/5xXzkMu8Zk

4.2k · 330 · C++

ruvnet/ruvector

7.0

RuVector is a High Performance, Real-Time, Self-Learning, Vector Graph Neural Network, and Database built in Rust.

4.2k · 544 · Rust
