STACKQUADRANT

Model Serving

Platforms for deploying and serving ML/AI models at scale

38 repos

ServerlessLLM/ServerlessLLM

5.9

Serverless LLM Serving for Everyone.

68573 Python

kossisoroyce/timber

5.5

Ollama for classical ML models. AOT compiler that turns XGBoost, LightGBM, scikit-learn, CatBoost & ONNX models into native C99 inference code. One command to load, one command to serve. 336x faster than Python inference.

68223 Python

eightBEC/fastapi-ml-skeleton

4.6

FastAPI skeleton app for serving machine learning models in a production-ready setup.

60191 Python

underneathall/pinferencia

4.7

Python + Inference - Model Deployment library in Python. Simplest model inference server ever.

54483 Python

ome-projects/ome

6.0

Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, TensorRT-LLM, and Triton

46181 Go

AI-Hypercomputer/JetStream

4.9

JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).

44265 Python

intel/xFasterTransformer

4.4

xFasterTransformer — open-source AI/LLM project.

43575 C++

NVIDIA/gpu-rest-engine

3.7

A REST API for Caffe using Docker and Go

42393 C++

Lightning-Universe/stable-diffusion-deploy

4.6

Learn to serve Stable Diffusion models on cloud infrastructure at scale. This Lightning App shows load-balancing, orchestrating, pre-provisioning, dynamic batching, GPU-inference, micro-services working together via the Lightning Apps framework.

39139 Python

Epistates/pmetal

5.1

PMetal: high-performance Apple Silicon framework for local LLM inference, LoRA/QLoRA fine-tuning, serving, quantization, and MLX/Metal acceleration.

29320 Rust

containers/podman-desktop-extension-ai-lab

5.9

Work with LLMs on a local environment using containers

29182 TypeScript

aiptimizer/TurboOCR

5.1

Fast GPU OCR server. 270 img/s on FUNSD. TensorRT FP16, PP-OCRv5, HTTP + gRPC.

28435 C++

BMW-InnovationLab/BMW-YOLOv4-Inference-API-GPU

4.1

This is a repository for a no-code object detection inference API using the YOLOv3 and YOLOv4 Darknet framework.

27967 Python

BMW-InnovationLab/BMW-YOLOv4-Inference-API-CPU

3.9

This is a repository for a no-code object detection inference API using YOLOv4 and YOLOv3 with OpenCV.

21958 Python