STACKQUADRANT

llama.cpp

ggml-org/llama.cpp
8.0

llama.cpp: LLM inference in C/C++.

Inference Engines
114.4k stars · 19.1k forks · C++ · MIT · updated today

vLLM

vllm-project/vllm
8.6

vLLM: a high-throughput and memory-efficient inference and serving engine for LLMs.

Inference Engines
81.8k stars · 17.6k forks · Python · Apache-2.0 · updated today

gpt4all

nomic-ai/gpt4all
7.1

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

Inference Engines
77.4k stars · 8.3k forks · C++ · MIT · updated 1y ago

ray

ray-project/ray
8.6

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Inference Engines
42.8k stars · 7.6k forks · Python · Apache-2.0 · updated today

gitleaks

gitleaks/gitleaks
8.2

Find secrets with Gitleaks 🔑

Inference Engines
27.5k stars · 2.1k forks · Go · MIT · updated 1d ago

llm-action

liguodongiot/llm-action
6.8

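This project shares the technical principles behind large models along with hands-on experience (large-model engineering and putting large-model applications into production).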

Inference Engines
24.4k stars · 2.8k forks · HTML · Apache-2.0 · updated 9d ago

litgpt

Lightning-AI/litgpt
7.8

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Inference Engines
13.4k stars · 1.4k forks · Python · Apache-2.0 · updated 1d ago

OpenLLM

bentoml/OpenLLM
7.4

Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI-compatible API endpoints in the cloud.

Inference Engines
12.3k stars · 811 forks · Python · Apache-2.0 · updated 1d ago

mistral-inference

mistralai/mistral-inference
6.9

Official inference library for Mistral models

Inference Engines
10.8k stars · 1.1k forks · Jupyter Notebook · Apache-2.0 · updated 1mo ago

openvino

openvinotoolkit/openvino
8.2

OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

Inference Engines
10.3k stars · 3.2k forks · C++ · Apache-2.0 · updated today

PowerInfer

Tiiny-AI/PowerInfer
7.0

High-speed Large Language Model Serving for Local Deployment

Inference Engines
9.5k stars · 579 forks · C++ · MIT · updated 23d ago

BentoML

bentoml/BentoML
8.0

The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more!

Inference Engines
8.7k stars · 968 forks · Python · Apache-2.0 · updated today

lmdeploy

InternLM/lmdeploy
7.5

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Inference Engines
7.9k stars · 701 forks · Python · Apache-2.0 · updated today

plano

katanemo/plano
7.4

Plano is an AI-native proxy server and data plane for agentic apps - centralizing orchestration, safety, observability, and smart LLM routing so you can deliver agents faster.

Inference Engines
6.6k stars · 427 forks · Rust · Apache-2.0 · updated 1d ago

openevolve

algorithmicsuperintelligence/openevolve
6.7

Open-source implementation of AlphaEvolve

Inference Engines
6.5k stars · 1.0k forks · Python · Apache-2.0 · updated 2mo ago

flashinfer

flashinfer-ai/flashinfer
7.4

FlashInfer: Kernel Library for LLM Serving

Inference Engines
5.7k stars · 1.0k forks · Python · Apache-2.0 · updated today

kserve

kserve/kserve
7.7

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

Inference Engines
5.5k stars · 1.5k forks · Go · Apache-2.0 · updated today

shimmy

Michael-A-Kuykendall/shimmy
6.3

⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.

Inference Engines
5.3k stars · 503 forks · Rust · Apache-2.0 · updated 1d ago

Awesome-LLM-Inference

xlite-dev/Awesome-LLM-Inference
6.5

📚 A curated list of Awesome LLM/VLM Inference Papers with Code: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. 🎉

Inference Engines
5.3k stars · 381 forks · Python · GPL-3.0 · updated 1mo ago

gpustack

gpustack/gpustack
7.0

Performance-optimized AI inference on your GPUs. Unlock superior throughput by selecting and tuning engines like vLLM or SGLang.

Inference Engines
5.1k stars · 540 forks · Python · Apache-2.0 · updated today

eko

FellouAI/eko
7.0

Eko (Eko Keeps Operating): build production-ready agentic workflows with natural language. eko.fellou.ai

Inference Engines
4.9k stars · 439 forks · TypeScript · MIT · updated 3mo ago

lemonade

lemonade-sdk/lemonade
7.1

Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our Discord: https://discord.gg/5xXzkMu8Zk

Inference Engines
4.2k stars · 330 forks · C++ · Apache-2.0 · updated today

ruvector

ruvnet/ruvector
7.0

RuVector is a high-performance, real-time, self-learning vector graph neural network and database built in Rust.

Inference Engines
4.2k stars · 544 forks · Rust · MIT · updated today

optillm

algorithmicsuperintelligence/optillm
6.6

Optimizing inference proxy for LLMs

Inference Engines
4.1k stars · 355 forks · Python · Apache-2.0 · updated 27d ago

lorax

predibase/lorax
6.9

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Inference Engines
3.8k stars · 316 forks · Python · Apache-2.0 · updated 5d ago

deepsparse

neuralmagic/deepsparse
5.9

Sparsity-aware deep learning inference runtime for CPUs

Inference Engines
3.2k stars · 192 forks · Python · NOASSERTION · updated 1y ago

spiceai

spiceai/spiceai
7.0

A portable accelerated SQL query, search, and LLM-inference engine, written in Rust, for data-grounded AI apps and agents.

Inference Engines
2.9k stars · 197 forks · Rust · Apache-2.0 · updated today

distributed-llama

b4rtaz/distributed-llama
6.2

Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices mean faster inference.

Inference Engines
2.9k stars · 232 forks · C++ · MIT · updated 1mo ago

Medusa

FasterDecoding/Medusa
5.4

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Inference Engines
2.7k stars · 201 forks · Jupyter Notebook · Apache-2.0 · updated 1y ago

kvcached

ovg-project/kvcached
5.8

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Inference Engines
1.1k stars · 118 forks · Python · Apache-2.0 · updated today

nobodywho

nobodywho-ooo/nobodywho
6.2

NobodyWho is an inference engine that lets you run LLMs locally and efficiently on any device.

Inference Engines
944 stars · 66 forks · Rust · EUPL-1.2 · updated today

ZhiLight

zhihu/ZhiLight
5.3

A highly optimized LLM inference acceleration engine for Llama and its variants.

Inference Engines
905 stars · 102 forks · C++ · Apache-2.0 · updated 2mo ago

mlxstudio

jjang-ai/mlxstudio
5.3

MLX Studio: home of JANG_Q. Image generation/editing plus chat/code, all in one, plus OpenClaw (Anthropic API).

Inference Engines
763 stars · 49 forks · updated today

yalm

andrewkchan/yalm
3.7

Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O

Inference Engines
584 stars · 62 forks · C++ · updated 8mo ago

KuiperLLama

zjhellofss/KuiperLLama
4.0

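A strong project for campus recruiting (fall and spring hiring seasons) and internships: it walks you through building, from scratch, an LLM inference framework that supports LLama2/3 and Qwen2.5.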

Inference Engines
548 stars · 142 forks · C++ · updated 7mo ago

swiftLLM

interestingLSY/swiftLLM
3.8

A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).

Inference Engines
329 stars · 38 forks · Python · Apache-2.0 · updated 11mo ago