Model Serving

Platforms for deploying and serving ML/AI models at scale

38 repos

ServerlessLLM/ServerlessLLM

5.9

Serverless LLM Serving for Everyone.

★ 685 ◇ 73 Python

kossisoroyce/timber

5.5

Ollama for classical ML models. AOT compiler that turns XGBoost, LightGBM, scikit-learn, CatBoost & ONNX models into native C99 inference code. One command to load, one command to serve. 336x faster than Python inference.

★ 682 ◇ 23 Python

eightBEC/fastapi-ml-skeleton

4.6

A production-ready FastAPI skeleton app for serving machine learning models.

★ 601 ◇ 91 Python

underneathall/pinferencia

4.7

Python + Inference - Model Deployment library in Python. Simplest model inference server ever.

★ 544 ◇ 83 Python

ome-projects/ome

6.0

Open Model Engine (OME): Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, TensorRT-LLM, and Triton.

★ 461 ◇ 81 Go

AI-Hypercomputer/JetStream

4.9

JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future; PRs welcome).

★ 442 ◇ 65 Python

intel/xFasterTransformer

4.4

xFasterTransformer — open-source AI/LLM project.

★ 435 ◇ 75 C++

NVIDIA/gpu-rest-engine

3.7

A REST API for Caffe using Docker and Go

★ 423 ◇ 93 C++

Lightning-Universe/stable-diffusion-deploy

4.6

Learn to serve Stable Diffusion models on cloud infrastructure at scale. This Lightning App shows load balancing, orchestration, pre-provisioning, dynamic batching, GPU inference, and microservices working together via the Lightning Apps framework.

★ 391 ◇ 39 Python