STACKQUADRANT

Model Serving

Platforms for deploying and serving ML/AI models at scale

34 repos

kossisoroyce/timber

5.5

An "Ollama for classical ML models": an AOT compiler that turns XGBoost, LightGBM, scikit-learn, CatBoost, and ONNX models into native C99 inference code. One command to load, one command to serve; 336x faster than Python inference.

66720 · Python
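The core idea behind AOT-compiling a tree model is unrolling the tree into straight-line branch code, removing per-node data-structure lookups at prediction time. A toy sketch of that idea (an assumption for illustration only — timber's real code generator targets C99; this emits Python source for portability, and the tree layout is invented):

```python
# Toy ahead-of-time compilation of a decision tree: instead of walking a
# node dictionary at prediction time, emit branch-only source code once.

# A tiny decision tree: internal nodes split on x[feature] <= threshold,
# leaves are plain numbers.
TREE = {
    "feature": 0, "threshold": 0.5,
    "left": {"feature": 1, "threshold": 1.5, "left": -1.0, "right": 2.0},
    "right": 3.5,
}

def interpret(node, x):
    """Reference implementation: walk the tree dict at prediction time."""
    while isinstance(node, dict):
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node

def emit(node, indent="    "):
    """Emit straight-line if/else source for one tree node."""
    if not isinstance(node, dict):
        return f"{indent}return {node!r}\n"
    return (
        f"{indent}if x[{node['feature']}] <= {node['threshold']!r}:\n"
        + emit(node["left"], indent + "    ")
        + f"{indent}else:\n"
        + emit(node["right"], indent + "    ")
    )

def compile_tree(tree):
    """Compile the tree once; the returned function contains only branches."""
    src = "def predict(x):\n" + emit(tree)
    namespace = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["predict"], src

predict, src = compile_tree(TREE)
```

The compiled `predict` agrees with the interpreted traversal on every input; a real compiler like timber applies the same unrolling across whole ensembles, which is where the speedup over interpreted Python inference comes from.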

eightBEC/fastapi-ml-skeleton

4.7

A production-ready FastAPI skeleton app for serving machine learning models.

60393 · Python
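The pattern such skeletons implement is "load the model once at startup, expose it behind a `/predict` endpoint." A dependency-free sketch of that pattern using only the standard library (the skeleton itself uses FastAPI; the sum-based stand-in "model" and the endpoint shape here are illustrative assumptions):

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for a real model loaded once at startup; this toy "model"
# just sums the submitted feature vector.
MODEL = lambda features: {"prediction": sum(features)}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(MODEL(payload["features"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

def serve(port=0):
    """Start the server on a background thread; return (server, bound port)."""
    server = HTTPServer(("127.0.0.1", port), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]
```

A FastAPI version replaces the handler class with a decorated async function and gets request validation and OpenAPI docs for free, which is the main draw of the skeleton.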

underneathall/pinferencia

4.7

Python + Inference: a model-deployment library in Python that aims to be the simplest possible model inference server.

54582 · Python

intel/xFasterTransformer

4.5

An optimized inference solution for large language models on Intel Xeon (x86) CPU platforms.

43674 · C++

AI-Hypercomputer/JetStream

5.0

JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (GPUs in the future; PRs welcome).

42463 · Python

NVIDIA/gpu-rest-engine

3.9

A REST API for Caffe using Docker and Go

42393 · C++

Lightning-Universe/stable-diffusion-deploy

4.7

Learn to serve Stable Diffusion models on cloud infrastructure at scale. This Lightning App shows load balancing, orchestration, pre-provisioning, dynamic batching, GPU inference, and microservices working together via the Lightning Apps framework.

39139 · Python
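Dynamic batching, one of the techniques this app demonstrates, means grouping requests that arrive close together into a single model call so GPU inference can amortize per-call overhead. A minimal stdlib sketch of the mechanism (the class name, `max_batch`/`max_wait` parameters, and doubling "model" are illustrative assumptions, not the Lightning App's actual API):

```python
import queue
import threading

class DynamicBatcher:
    """Collect requests into batches of up to max_batch items, or whatever
    has arrived within max_wait seconds of the first request."""

    def __init__(self, model_fn, max_batch=4, max_wait=0.05):
        self.model_fn = model_fn        # takes a list of inputs, returns a list of outputs
        self.max_batch = max_batch
        self.max_wait = max_wait
        self.requests = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, item):
        """Enqueue one request; returns a slot whose 'done' event fires
        when 'result' has been filled in."""
        slot = {"input": item, "done": threading.Event()}
        self.requests.put(slot)
        return slot

    def _worker(self):
        while True:
            batch = [self.requests.get()]  # block for the first request
            try:
                while len(batch) < self.max_batch:
                    batch.append(self.requests.get(timeout=self.max_wait))
            except queue.Empty:
                pass  # timed out: run with whatever we have
            outputs = self.model_fn([slot["input"] for slot in batch])
            for slot, out in zip(batch, outputs):
                slot["result"] = out
                slot["done"].set()
```

Callers submit independently but the model sees batches; the `max_wait` timeout bounds the latency cost of waiting for a fuller batch.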

containers/podman-desktop-extension-ai-lab

5.9

Work with LLMs in a local environment using containers.

29180 · TypeScript

BMW-InnovationLab/BMW-YOLOv4-Inference-API-GPU

4.1

A repository for a no-code object-detection inference API using YOLOv3 and YOLOv4 with the Darknet framework.

27867 · Python

BMW-InnovationLab/BMW-YOLOv4-Inference-API-CPU

3.9

A repository for a no-code object-detection inference API using YOLOv4 and YOLOv3 with OpenCV.

21858 · Python