STACKQUADRANT

Model Serving

Platforms for deploying and serving ML/AI models at scale

34 repos

NVIDIA/TensorRT-LLM

7.3

TensorRT-LLM: NVIDIA's open-source library for optimizing and running large language model inference on NVIDIA GPUs.

13.4k 2.3k Python

jundot/omlx

7.6

LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar

10.0k 862 Python

ahkarami/Deep-Learning-in-Production

4.5

In this repository, I will share some useful notes and references about deploying deep learning-based models in production.

4.4k 692

beclab/Olares

7.0

Olares: An Open-Source Personal Cloud to Reclaim Your Data

4.3k 244 Go

vllm-project/vllm-omni

7.2

A framework for efficient model inference with omni-modality models

4.3k 757 Python

ModelTC/LightLLM

6.5

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

4.0k 319 Python

HuaizhengZhang/AI-Infra-from-Zero-to-Hero

6.3

🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.

3.9k 377

thu-pacman/chitu

6.9

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

3.4k 354 Python

containers/ramalama

7.4

RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.

2.7k 330 Python

roboflow/inference

7.2

Turn any computer or edge device into a command center for your computer vision projects.

2.3k 252 Python

tensorchord/envd

7.0

🏕️ Reproducible development environment for humans and agents

2.2k 167 Go

microsoft/aici

4.9

AICI: Prompts as (Wasm) Programs

2.1k 83 Rust

vllm-project/vllm-ascend

7.2

Community-maintained hardware plugin for vLLM on Ascend

1.9k 1.1k Python

mlrun/mlrun

7.2

MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.

1.7k 301 Python

kitops-ml/kitops

6.9

An open source DevOps tool from the CNCF for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI Artifact.

1.3k 173 Go

logicalclocks/hopsworks

5.8

Hopsworks - Data-Intensive AI platform with a Feature Store

1.3k 156 Java

basetenlabs/truss

6.7

The simplest way to serve AI/ML models in production

1.1k 98 Python

alibaba/rtp-llm

6.1

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

1.1k 169 Cuda

efeslab/Nanoflow

4.9

A throughput-oriented high-performance serving framework for LLMs

952 48 Jupyter Notebook

mosecorg/mosec

6.4

A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine

898 72 Python

openvinotoolkit/model_server

6.5

A scalable inference server for models optimized with OpenVINO™

856 248 C++

pipeless-ai/pipeless

4.9

An open-source computer vision framework to build and deploy apps in minutes

850 53 Rust

bentoml/Yatai

5.3

Model Deployment at Scale on Kubernetes 🦄️

838 77 TypeScript

ServerlessLLM/ServerlessLLM

5.9

Serverless LLM Serving for Everyone.

674 70 Python