STACKQUADRANT

llama.cpp

ggml-org/llama.cpp
8.0

llama.cpp: LLM inference in C/C++.

Inference Engines
103.7k stars · 16.8k forks · C++ · MIT · updated today

gpt4all

nomic-ai/gpt4all
7.2

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

Inference Engines
77.3k stars · 8.3k forks · C++ · MIT · updated 10mo ago

vLLM

vllm-project/vllm
8.6

vLLM: a high-throughput and memory-efficient inference and serving engine for LLMs.

Inference Engines
76.6k stars · 15.6k forks · Python · Apache-2.0 · updated today

ray

ray-project/ray
8.6

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Inference Engines
42.1k stars · 7.4k forks · Python · Apache-2.0 · updated today

gitleaks

gitleaks/gitleaks
8.2

Find secrets with Gitleaks 🔑

Inference Engines
25.9k stars · 2.0k forks · Go · MIT · updated 20d ago

llm-action

liguodongiot/llm-action
6.7

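This project shares the technical principles behind large models along with hands-on experience (LLM engineering and putting LLM applications into production).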

Inference Engines
24.0k stars · 2.8k forks · HTML · Apache-2.0 · updated 1mo ago

litgpt

Lightning-AI/litgpt
7.9

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Inference Engines
13.3k stars · 1.4k forks · Python · Apache-2.0 · updated 4d ago

OpenLLM

bentoml/OpenLLM
7.4

Run any open-source LLM, such as DeepSeek or Llama, as an OpenAI-compatible API endpoint in the cloud.

Inference Engines
12.3k stars · 805 forks · Python · Apache-2.0 · updated 1d ago

mistral-inference

mistralai/mistral-inference
6.9

Official inference library for Mistral models

Inference Engines
10.8k stars · 1.0k forks · Jupyter Notebook · Apache-2.0 · updated 1mo ago

openvino

openvinotoolkit/openvino
8.2

OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

Inference Engines
10.1k stars · 3.2k forks · C++ · Apache-2.0 · updated today

PowerInfer

Tiiny-AI/PowerInfer
6.8

High-speed Large Language Model Serving for Local Deployment

Inference Engines
9.3k stars · 561 forks · C++ · MIT · updated 2mo ago

BentoML

bentoml/BentoML
8.0

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

Inference Engines
8.6k stars · 950 forks · Python · Apache-2.0 · updated 1d ago

lmdeploy

InternLM/lmdeploy
7.5

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Inference Engines
7.8k stars · 684 forks · Python · Apache-2.0 · updated today

plano

katanemo/plano
7.4

Plano is an AI-native proxy server and data plane for agentic apps - centralizing orchestration, safety, observability, and smart LLM routing so you can deliver agents faster.

Inference Engines
6.3k stars · 399 forks · Rust · Apache-2.0 · updated today

openevolve

algorithmicsuperintelligence/openevolve
6.8

Open-source implementation of AlphaEvolve

Inference Engines
6.0k stars · 949 forks · Python · Apache-2.0 · updated 27d ago

flashinfer

flashinfer-ai/flashinfer
7.5

FlashInfer: Kernel Library for LLM Serving

Inference Engines
5.4k stars · 896 forks · Python · Apache-2.0 · updated today

kserve

kserve/kserve
7.7

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

Inference Engines
5.3k stars · 1.4k forks · Go · Apache-2.0 · updated 1d ago

Awesome-LLM-Inference

xlite-dev/Awesome-LLM-Inference
6.6

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Inference Engines
5.1k stars · 360 forks · Python · GPL-3.0 · updated 5d ago

eko

FellouAI/eko
7.3

Eko (Eko Keeps Operating) - Build Production-ready Agentic Workflow with Natural Language - eko.fellou.ai

Inference Engines
4.9k stars · 436 forks · TypeScript · MIT · updated 1mo ago

gpustack

gpustack/gpustack
7.0

Performance-optimized AI inference on your GPUs. Unlock superior throughput by selecting and tuning engines like vLLM or SGLang.

Inference Engines
4.8k stars · 497 forks · Python · Apache-2.0 · updated today

shimmy

Michael-A-Kuykendall/shimmy
6.2

⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.

Inference Engines
4.0k stars · 343 forks · Rust · Apache-2.0 · updated 19d ago

RuVector

ruvnet/RuVector
6.7

RuVector is a High Performance, Real-Time, Self-Learning, Vector Graph Neural Network, and Database built in Rust.

Inference Engines
3.8k stars · 463 forks · Rust · MIT · updated today

lorax

predibase/lorax
6.1

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Inference Engines
3.7k stars · 312 forks · Python · Apache-2.0 · updated 10mo ago

lemonade

lemonade-sdk/lemonade
7.0

Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our Discord: https://discord.gg/5xXzkMu8Zk

Inference Engines
3.5k stars · 261 forks · C++ · Apache-2.0 · updated today

optillm

algorithmicsuperintelligence/optillm
6.5

Optimizing inference proxy for LLMs

Inference Engines
3.4k stars · 268 forks · Python · Apache-2.0 · updated 26d ago

deepsparse

neuralmagic/deepsparse
6.1

Sparsity-aware deep learning inference runtime for CPUs

Inference Engines
3.2k stars · 190 forks · Python · license not recognized (NOASSERTION) · updated 10mo ago

distributed-llama

b4rtaz/distributed-llama
6.3

Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.

Inference Engines
2.9k stars · 225 forks · C++ · MIT · updated today

spiceai

spiceai/spiceai
6.9

A portable accelerated SQL query, search, and LLM-inference engine, written in Rust, for data-grounded AI apps and agents.

Inference Engines
2.9k stars · 185 forks · Rust · Apache-2.0 · updated today

Medusa

FasterDecoding/Medusa
5.4

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Inference Engines
2.7k stars · 197 forks · Jupyter Notebook · Apache-2.0 · updated 1y ago

ZhiLight

zhihu/ZhiLight
5.5

A highly optimized LLM inference acceleration engine for Llama and its variants.

Inference Engines
904 stars · 102 forks · C++ · Apache-2.0 · updated 27d ago

kvcached

ovg-project/kvcached
5.6

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Inference Engines
852 stars · 98 forks · Python · Apache-2.0 · updated 7d ago

nobodywho

nobodywho-ooo/nobodywho
6.2

NobodyWho is an inference engine that lets you run LLMs locally and efficiently on any device.

Inference Engines
790 stars · 55 forks · Rust · EUPL-1.2 · updated today

yalm

andrewkchan/yalm
3.8

Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O

Inference Engines
570 stars · 59 forks · C++ · updated 7mo ago

KuiperLLama

zjhellofss/KuiperLLama
4.1

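A strong project for campus recruiting (autumn and spring hiring seasons) and internships: build, hands-on and from scratch, an LLM inference framework that supports Llama2/3 and Qwen2.5.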

Inference Engines
527 stars · 137 forks · C++ · updated 5mo ago

mlxstudio

jjang-ai/mlxstudio
4.8

MLX Studio: home of JANG_Q. Image generation/editing plus chat/code, all in one, with OpenClaw (Anthropic API).

Inference Engines
477 stars · 32 forks · updated today

swiftLLM

interestingLSY/swiftLLM
3.9

A tiny yet powerful LLM inference system tailored for research purposes, with vLLM-equivalent performance in only 2k lines of code (2% of vLLM).

Inference Engines
323 stars · 37 forks · Python · Apache-2.0 · updated 10mo ago