AI/LLM Ecosystem Directory — Repository Ratings — StackQuadrant

Directory (37)

All Repos (37) · LLM Frameworks (58) · Agent Frameworks (113) · Fine-tuning Tools (55) · RAG Libraries (57) · Vector Databases (44) · Inference Engines (37) · Prompt Engineering (48) · AI DevOps (40) · Model Serving (38) · Evaluation & Testing (36)

llama.cpp

ggml-org/llama.cpp

llama.cpp: LLM inference in C/C++.

Inference Engines

★ 114.4k · ◇ 19.1k · C++ · MIT · updated today

vLLM

vllm-project/vllm

vLLM: a high-throughput and memory-efficient inference and serving engine for LLMs.

Inference Engines

★ 81.8k · ◇ 17.6k · Python · Apache-2.0 · updated today

gpt4all

nomic-ai/gpt4all

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

Inference Engines

★ 77.4k · ◇ 8.3k · C++ · MIT · updated 1y ago

ray

ray-project/ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Inference Engines

★ 42.8k · ◇ 7.6k · Python · Apache-2.0 · updated today

gitleaks

gitleaks/gitleaks

Find secrets with Gitleaks 🔑

Inference Engines

★ 27.5k · ◇ 2.1k · Go · MIT · updated 1d ago

llm-action

liguodongiot/llm-action

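This project shares the technical principles behind large models along with hands-on experience (large-model engineering and putting large-model applications into production).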

Inference Engines

★ 24.4k · ◇ 2.8k · HTML · Apache-2.0 · updated 9d ago

litgpt

Lightning-AI/litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Inference Engines

★ 13.4k · ◇ 1.4k · Python · Apache-2.0 · updated 1d ago

OpenLLM

bentoml/OpenLLM

Run any open-source LLMs, such as DeepSeek and Llama, as an OpenAI-compatible API endpoint in the cloud.

Inference Engines

★ 12.3k · ◇ 811 · Python · Apache-2.0 · updated 1d ago

mistral-inference

mistralai/mistral-inference

Official inference library for Mistral models

Inference Engines

★ 10.8k · ◇ 1.1k · Jupyter Notebook · Apache-2.0 · updated 1mo ago

openvino

openvinotoolkit/openvino

OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

Inference Engines

★ 10.3k · ◇ 3.2k · C++ · Apache-2.0 · updated today

PowerInfer

Tiiny-AI/PowerInfer

High-speed Large Language Model Serving for Local Deployment

Inference Engines

★ 9.5k · ◇ 579 · C++ · MIT · updated 23d ago

BentoML

bentoml/BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

Inference Engines

★ 8.7k · ◇ 968 · Python · Apache-2.0 · updated today

lmdeploy

InternLM/lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Inference Engines

★ 7.9k · ◇ 701 · Python · Apache-2.0 · updated today

plano

Plano is an AI-native proxy server and data plane for agentic apps - centralizing orchestration, safety, observability, and smart LLM routing so you can deliver agents faster.

Inference Engines

★ 6.6k · ◇ 427 · Rust · Apache-2.0 · updated 1d ago

openevolve

algorithmicsuperintelligence/openevolve

Open-source implementation of AlphaEvolve

Inference Engines

★ 6.5k · ◇ 1.0k · Python · Apache-2.0 · updated 2mo ago

flashinfer

flashinfer-ai/flashinfer

FlashInfer: Kernel Library for LLM Serving

Inference Engines

★ 5.7k · ◇ 1.0k · Python · Apache-2.0 · updated today

kserve

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

Inference Engines

★ 5.5k · ◇ 1.5k · Go · Apache-2.0 · updated today

shimmy

Michael-A-Kuykendall/shimmy

⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.

Inference Engines

★ 5.3k · ◇ 503 · Rust · Apache-2.0 · updated 1d ago

Awesome-LLM-Inference

xlite-dev/Awesome-LLM-Inference

📚 A curated list of Awesome LLM/VLM Inference Papers with Code: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. 🎉

Inference Engines

★ 5.3k · ◇ 381 · Python · GPL-3.0 · updated 1mo ago

gpustack

gpustack/gpustack

Performance-optimized AI inference on your GPUs. Unlock superior throughput by selecting and tuning engines like vLLM or SGLang.

Inference Engines

★ 5.1k · ◇ 540 · Python · Apache-2.0 · updated today

eko

Eko (Eko Keeps Operating): build production-ready agentic workflows with natural language. eko.fellou.ai

Inference Engines

★ 4.9k · ◇ 439 · TypeScript · MIT · updated 3mo ago

lemonade

lemonade-sdk/lemonade

Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our discord: https://discord.gg/5xXzkMu8Zk

Inference Engines

★ 4.2k · ◇ 330 · C++ · Apache-2.0 · updated today

ruvector

ruvnet/ruvector

RuVector is a high-performance, real-time, self-learning vector graph neural network and database built in Rust.

Inference Engines

★ 4.2k · ◇ 544 · Rust · MIT · updated today

optillm

algorithmicsuperintelligence/optillm

Optimizing inference proxy for LLMs

Inference Engines

★ 4.1k · ◇ 355 · Python · Apache-2.0 · updated 27d ago

lorax

predibase/lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Inference Engines

★ 3.8k · ◇ 316 · Python · Apache-2.0 · updated 5d ago

deepsparse

neuralmagic/deepsparse

Sparsity-aware deep learning inference runtime for CPUs

Inference Engines

★ 3.2k · ◇ 192 · Python · NOASSERTION (license not identified by GitHub) · updated 1y ago

spiceai

spiceai/spiceai

A portable accelerated SQL query, search, and LLM-inference engine, written in Rust, for data-grounded AI apps and agents.

Inference Engines

★ 2.9k · ◇ 197 · Rust · Apache-2.0 · updated today

distributed-llama

b4rtaz/distributed-llama

Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.

Inference Engines

★ 2.9k · ◇ 232 · C++ · MIT · updated 1mo ago

Medusa

FasterDecoding/Medusa

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Inference Engines

★ 2.7k · ◇ 201 · Jupyter Notebook · Apache-2.0 · updated 1y ago

kvcached

ovg-project/kvcached

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Inference Engines

★ 1.1k · ◇ 118 · Python · Apache-2.0 · updated today

nobodywho

nobodywho-ooo/nobodywho

NobodyWho is an inference engine that lets you run LLMs locally and efficiently on any device.

Inference Engines

★ 944 · ◇ 66 · Rust · EUPL-1.2 · updated today

ZhiLight

A highly optimized LLM inference acceleration engine for Llama and its variants.

Inference Engines

★ 905 · ◇ 102 · C++ · Apache-2.0 · updated 2mo ago

mlxstudio

jjang-ai/mlxstudio

MLX Studio: home of JANG_Q. Image generation/editing plus chat/code, all in one, with OpenClaw (Anthropic API).

Inference Engines

★ 763 · ◇ 49 · updated today

yalm

andrewkchan/yalm

Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O

Inference Engines

★ 584 · ◇ 62 · C++ · updated 8mo ago

KuiperLLama

zjhellofss/KuiperLLama

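A strong project for campus recruiting (autumn and spring hiring seasons) and internships: walks you through implementing, from scratch, an LLM inference framework that supports LLama2/3 and Qwen2.5.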

Inference Engines

★ 548 · ◇ 142 · C++ · updated 7mo ago

swiftLLM

interestingLSY/swiftLLM

A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).

Inference Engines

★ 329 · ◇ 38 · Python · Apache-2.0 · updated 11mo ago