STACKQUADRANT

Model Serving

Platforms for deploying and serving ML/AI models at scale

38 repos

jundot/omlx

7.8

LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar

15.7k · 1.3k · Python

TensorRT-LLM

7.3

TensorRT-LLM: NVIDIA's open-source library for optimizing large language model inference on NVIDIA GPUs.

13.8k · 2.4k · Python

vllm-project/vllm-omni

7.3

A framework for efficient model inference with omni-modality models

4.9k · 1.0k · Python

beclab/Olares

7.0

Olares: An Open-Source Personal Cloud to Reclaim Your Data

4.6k · 266 · Go

ahkarami/Deep-Learning-in-Production

4.5

In this repository, I will share some useful notes and references about deploying deep learning-based models in production.

4.4k · 687

ModelTC/LightLLM

6.5

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

4.1k · 332 · Python

HuaizhengZhang/AI-Infra-from-Zero-to-Hero

6.2

🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.

4.1k · 393

thu-pacman/chitu

6.8

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

3.1k · 266 · Python

containers/ramalama

7.5

RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.

2.9k · 340 · Python

roboflow/inference

7.3

Turn any computer or edge device into a command center for your computer vision projects.

2.3k · 269 · Python

tensorchord/envd

6.9

🏕️ Reproducible development environment for humans and agents

2.2k · 168 · Go

vllm-project/vllm-ascend

7.2

Community-maintained hardware plugin for vLLM on Ascend

2.2k · 1.3k · C++

microsoft/aici

4.9

AICI: Prompts as (Wasm) Programs

2.1k · 84 · Rust

superlinked/sie

6.6

Superlinked Inference Engine is an open-source inference server and production cluster for embeddings, reranking, and extraction.

2.0k · 177 · Python

mlrun/mlrun

7.2

MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.

1.7k · 305 · Python

kitops-ml/kitops

6.9

An open source DevOps tool from the CNCF for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI Artifact.

1.3k · 170 · Go

logicalclocks/hopsworks

5.8

Hopsworks - Data-Intensive AI platform with a Feature Store

1.3k · 158 · Java

alibaba/rtp-llm

6.0

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

1.2k · 204 · Cuda

basetenlabs/truss

6.7

The simplest way to serve AI/ML models in production

1.2k · 107 · Python

efeslab/Nanoflow

4.7

A throughput-oriented high-performance serving framework for LLMs

962 · 49 · Jupyter Notebook

mosecorg/mosec

6.5

A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine

900 · 72 · Python

openvinotoolkit/model_server

6.5

A scalable inference server for models optimized with OpenVINO™

880 · 253 · C++

pipeless-ai/pipeless

4.9

An open-source computer vision framework to build and deploy apps in minutes

850 · 52 · Rust

bentoml/Yatai

6.2

Model Deployment at Scale on Kubernetes 🦄️

845 · 76 · TypeScript