AI/LLM Ecosystem Directory — Repository Ratings

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Model Serving

★ 4.1k · ◇ 332 · Python · Apache-2.0 · today

AI-Infra-from-Zero-to-Hero

HuaizhengZhang/AI-Infra-from-Zero-to-Hero

6.2

🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.

Model Serving

★ 4.1k · ◇ 393 · MIT · 10mo ago

chitu

thu-pacman/chitu

6.8

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

Model Serving

★ 3.1k · ◇ 266 · Python · Apache-2.0 · today

ramalama

containers/ramalama

7.5

RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.

Model Serving

★ 2.9k · ◇ 340 · Python · MIT · 1d ago

inference

roboflow/inference

7.3

Turn any computer or edge device into a command center for your computer vision projects.

Model Serving

★ 2.3k · ◇ 269 · Python · NOASSERTION · today

envd

tensorchord/envd

6.9

🏕️ Reproducible development environment for humans and agents

Model Serving

★ 2.2k · ◇ 168 · Go · Apache-2.0 · 12d ago

vllm-ascend

vllm-project/vllm-ascend

7.2

Community-maintained hardware plugin for vLLM on Ascend

Model Serving

★ 2.2k · ◇ 1.3k · C++ · Apache-2.0 · today

aici

microsoft/aici

4.9

AICI: Prompts as (Wasm) Programs

Model Serving

★ 2.1k · ◇ 84 · Rust · MIT · 1y ago

sie

superlinked/sie

6.6

Superlinked Inference Engine is an open-source inference server and production cluster for embeddings, reranking, and extraction.

Model Serving

★ 2.0k · ◇ 177 · Python · Apache-2.0 · 4d ago

mlrun

mlrun/mlrun

7.2

MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.

Model Serving

★ 1.7k · ◇ 305 · Python · Apache-2.0 · today

kitops

kitops-ml/kitops

6.9

An open source DevOps tool from the CNCF for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI Artifact.

Model Serving

★ 1.3k · ◇ 170 · Go · Apache-2.0 · today

hopsworks

logicalclocks/hopsworks

5.8

Hopsworks - Data-Intensive AI platform with a Feature Store

Model Serving

★ 1.3k · ◇ 158 · Java · AGPL-3.0 · 1y ago

rtp-llm

alibaba/rtp-llm

6.0

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

Model Serving

★ 1.2k · ◇ 204 · Cuda · Apache-2.0 · today

truss

basetenlabs/truss

6.7

The simplest way to serve AI/ML models in production

Model Serving

★ 1.2k · ◇ 107 · Python · MIT · today

Nanoflow

efeslab/Nanoflow

4.7

A throughput-oriented high-performance serving framework for LLMs

Model Serving

★ 962 · ◇ 49 · Jupyter Notebook · 2mo ago

mosec

mosecorg/mosec

6.5

A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine

Model Serving

★ 900 · ◇ 72 · Python · Apache-2.0 · 2d ago

model_server

openvinotoolkit/model_server

6.5

A scalable inference server for models optimized with OpenVINO™

Model Serving

★ 880 · ◇ 253 · C++ · Apache-2.0 · today

pipeless

pipeless-ai/pipeless

4.9

An open-source computer vision framework to build and deploy apps in minutes

Model Serving

★ 850 · ◇ 52 · Rust · Apache-2.0 · 2y ago

Yatai

bentoml/Yatai

6.2

Model Deployment at Scale on Kubernetes 🦄️

Model Serving

★ 845 · ◇ 76 · TypeScript · NOASSERTION · 4d ago

ServerlessLLM

ServerlessLLM/ServerlessLLM

5.9

Serverless LLM Serving for Everyone.

Model Serving

★ 685 · ◇ 73 · Python · Apache-2.0 · 29d ago

timber

kossisoroyce/timber

5.5

Ollama for classical ML models. AOT compiler that turns XGBoost, LightGBM, scikit-learn, CatBoost & ONNX models into native C99 inference code. One command to load, one command to serve. 336x faster than Python inference.

Model Serving

★ 682 · ◇ 23 · Python · NOASSERTION · 1mo ago

fastapi-ml-skeleton

eightBEC/fastapi-ml-skeleton

4.6

A production-ready FastAPI skeleton app for serving machine learning models.

Model Serving

★ 601 · ◇ 91 · Python · Apache-2.0 · 4mo ago

pinferencia

underneathall/pinferencia

4.7

Python + Inference - Model Deployment library in Python. Simplest model inference server ever.

Model Serving

★ 544 · ◇ 83 · Python · Apache-2.0 · 3y ago

ome

ome-projects/ome

6.0

Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, TensorRT-LLM, and Triton

Model Serving

★ 461 · ◇ 81 · Go · Apache-2.0 · today

JetStream

AI-Hypercomputer/JetStream

4.9

JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).

Model Serving

★ 442 · ◇ 65 · Python · Apache-2.0 · 4mo ago

xFasterTransformer

intel/xFasterTransformer

4.4

xFasterTransformer — open-source AI/LLM project.

Model Serving

★ 435 · ◇ 75 · C++ · Apache-2.0 · 8mo ago

gpu-rest-engine

NVIDIA/gpu-rest-engine

3.7

A REST API for Caffe using Docker and Go

Model Serving

★ 423 · ◇ 93 · C++ · BSD-3-Clause · 7y ago

stable-diffusion-deploy

Lightning-Universe/stable-diffusion-deploy

4.6

Learn to serve Stable Diffusion models on cloud infrastructure at scale. This Lightning App shows load balancing, orchestration, pre-provisioning, dynamic batching, GPU inference, and microservices working together via the Lightning Apps framework.

Model Serving

★ 391 · ◇ 39 · Python · Apache-2.0 · 2y ago