Inference Engines

High-performance model inference and serving runtimes

41 repos

llama.cpp

LLM inference in C/C++.

★ 120.8k · ◇ 20.7k · C++

vLLM

A high-throughput and memory-efficient inference and serving engine for LLMs.

★ 86.6k · ◇ 19.6k · Python

nomic-ai/gpt4all

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

★ 77.4k · ◇ 8.3k · C++

ray-project/ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

★ 43.3k · ◇ 7.8k · Python

gitleaks/gitleaks

Find secrets with Gitleaks 🔑

★ 28.2k · ◇ 2.2k · Go

liguodongiot/llm-action

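This project aims to share the technical principles behind large models along with hands-on experience (large-model engineering and putting large-model applications into production).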

★ 24.7k · ◇ 2.8k · HTML

Lightning-AI/litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

★ 13.5k · ◇ 1.5k · Python

halfrost/Halfrost-Field

✍🏻 Source Code Deep Dives, System Design & Engineering Blogs | Halfrost-Field (冰霜之地, "Land of Frost"): notes on source code analysis, system design, and engineering practice

★ 13.2k · ◇ 1.9k · Go

bentoml/OpenLLM

Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI-compatible API endpoints in the cloud.

★ 12.4k · ◇ 824 · Python

mistralai/mistral-inference

Official inference library for Mistral models

★ 10.8k · ◇ 1.1k · Jupyter Notebook

openvinotoolkit/openvino

OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

★ 10.5k · ◇ 3.3k · C++

Tiiny-AI/PowerInfer

High-speed Large Language Model Serving for Local Deployment

★ 9.6k · ◇ 589 · C++

bentoml/BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

★ 8.7k · ◇ 985 · Python

InternLM/lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

★ 8.0k · ◇ 707 · Python

ai-dynamo/dynamo

A Datacenter Scale Distributed Inference Serving Framework

★ 7.5k · ◇ 1.3k · Rust

katanemo/plano

Plano is an AI-native proxy server and data plane for agentic apps - centralizing orchestration, safety, observability, and smart LLM routing so you can deliver agents faster.

★ 6.9k · ◇ 470 · Rust

algorithmicsuperintelligence/openevolve

Open-source implementation of AlphaEvolve

★ 6.7k · ◇ 1.1k · Python

flashinfer-ai/flashinfer

FlashInfer: Kernel Library for LLM Serving

★ 6.0k · ◇ 1.2k · Python

kserve/kserve

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

★ 5.7k · ◇ 1.6k · Go

Michael-A-Kuykendall/shimmy

⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.

★ 5.6k · ◇ 543 · Rust

xlite-dev/Awesome-LLM-Inference

📚 A curated list of Awesome LLM/VLM Inference Papers with Code: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. 🎉

★ 5.4k · ◇ 426 · Python

gpustack/gpustack

Performance-optimized AI inference on your GPUs. Unlock superior throughput by selecting and tuning engines like vLLM or SGLang.

★ 5.3k · ◇ 576 · Python

lemonade-sdk/lemonade

Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our Discord: https://discord.gg/5xXzkMu8Zk

★ 5.0k · ◇ 410 · C++

FellouAI/eko

Eko (Eko Keeps Operating) - Build Production-ready Agentic Workflows with Natural Language - eko.fellou.ai

★ 4.9k · ◇ 439 · TypeScript