STACKQUADRANT

TensorRT-LLM

NVIDIA/TensorRT-LLM
7.3

TensorRT-LLM: NVIDIA's open-source library for optimizing and serving large language model inference on NVIDIA GPUs.

Model Serving
13.4k stars · 2.3k forks · Python · NOASSERTION · updated today

omlx

jundot/omlx
7.6

LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar

Model Serving
9.9k stars · 857 forks · Python · Apache-2.0 · updated today

Deep-Learning-in-Production

ahkarami/Deep-Learning-in-Production
4.5

Notes and references on deploying deep-learning models in production.

Model Serving
4.4k stars · 692 forks · updated 1y ago

Olares

beclab/Olares
7.0

Olares: An Open-Source Personal Cloud to Reclaim Your Data

Model Serving
4.3k stars · 244 forks · Go · AGPL-3.0 · updated today

vllm-omni

vllm-project/vllm-omni
7.2

A framework for efficient inference with omni-modal models.

Model Serving
4.3k stars · 753 forks · Python · Apache-2.0 · updated today

LightLLM

ModelTC/LightLLM
6.5

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Model Serving
4.0k stars · 319 forks · Python · Apache-2.0 · updated today

AI-Infra-from-Zero-to-Hero

HuaizhengZhang/AI-Infra-from-Zero-to-Hero
6.3

🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.

Model Serving
3.9k stars · 377 forks · MIT · updated 8mo ago

chitu

thu-pacman/chitu
6.9

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

Model Serving
3.4k stars · 354 forks · Python · Apache-2.0 · updated today

ramalama

containers/ramalama
7.4

RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.

Model Serving
2.7k stars · 330 forks · Python · MIT · updated today

inference

roboflow/inference
7.2

Turn any computer or edge device into a command center for your computer vision projects.

Model Serving
2.3k stars · 252 forks · Python · NOASSERTION · updated today

envd

tensorchord/envd
7.0

🏕️ Reproducible development environment for humans and agents

Model Serving
2.2k stars · 167 forks · Go · Apache-2.0 · updated 4d ago

aici

microsoft/aici
4.9

AICI: Prompts as (Wasm) Programs

Model Serving
2.1k stars · 83 forks · Rust · MIT · updated 1y ago

vllm-ascend

vllm-project/vllm-ascend
7.2

Community-maintained hardware plugin for running vLLM on Ascend NPUs.

Model Serving
1.9k stars · 1.1k forks · Python · Apache-2.0 · updated today

mlrun

mlrun/mlrun
7.2

MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.

Model Serving
1.7k stars · 301 forks · Python · Apache-2.0 · updated today

kitops

kitops-ml/kitops
6.9

An open source DevOps tool from the CNCF for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI Artifact.

Model Serving
1.3k stars · 173 forks · Go · Apache-2.0 · updated 1d ago

hopsworks

logicalclocks/hopsworks
5.8

Hopsworks: a data-intensive AI platform with a feature store.

Model Serving
1.3k stars · 156 forks · Java · AGPL-3.0 · updated 1y ago

truss

basetenlabs/truss
6.7

The simplest way to serve AI/ML models in production

Model Serving
1.1k stars · 98 forks · Python · MIT · updated today

rtp-llm

alibaba/rtp-llm
6.1

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

Model Serving
1.1k stars · 168 forks · Cuda · Apache-2.0 · updated today

Nanoflow

efeslab/Nanoflow
4.9

A throughput-oriented high-performance serving framework for LLMs

Model Serving
952 stars · 48 forks · Jupyter Notebook · updated 16d ago

mosec

mosecorg/mosec
6.4

A high-performance ML model serving framework offering dynamic batching and CPU/GPU pipelines to fully exploit your compute resources.

Model Serving
898 stars · 72 forks · Python · Apache-2.0 · updated today

model_server

openvinotoolkit/model_server
6.5

A scalable inference server for models optimized with OpenVINO™

Model Serving
856 stars · 248 forks · C++ · Apache-2.0 · updated today

pipeless

pipeless-ai/pipeless
4.9

An open-source computer vision framework to build and deploy apps in minutes

Model Serving
850 stars · 53 forks · Rust · Apache-2.0 · updated 1y ago

Yatai

bentoml/Yatai
5.3

Model Deployment at Scale on Kubernetes 🦄️

Model Serving
838 stars · 77 forks · TypeScript · NOASSERTION · updated 1y ago

ServerlessLLM

ServerlessLLM/ServerlessLLM
5.9

Serverless LLM Serving for Everyone.

Model Serving
674 stars · 70 forks · Python · Apache-2.0 · updated 1mo ago

timber

kossisoroyce/timber
5.5

Ollama for classical ML models: an AOT compiler that turns XGBoost, LightGBM, scikit-learn, CatBoost, and ONNX models into native C99 inference code. One command to load, one command to serve; claims up to 336x speedup over Python inference.

Model Serving
667 stars · 20 forks · Python · NOASSERTION · updated 1mo ago

fastapi-ml-skeleton

eightBEC/fastapi-ml-skeleton
4.7

A FastAPI skeleton app for serving machine learning models in production.

Model Serving
603 stars · 93 forks · Python · Apache-2.0 · updated 3mo ago

pinferencia

underneathall/pinferencia
4.7

Python + Inference: a model deployment library in Python, billed as the simplest model inference server ever.

Model Serving
545 stars · 82 forks · Python · Apache-2.0 · updated 3y ago

xFasterTransformer

intel/xFasterTransformer
4.5

xFasterTransformer: Intel's optimized inference solution for large language models on x86 (Xeon) CPUs.

Model Serving
436 stars · 74 forks · C++ · Apache-2.0 · updated 6mo ago

JetStream

AI-Hypercomputer/JetStream
5.0

JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (GPUs in the future; PRs welcome).

Model Serving
424 stars · 63 forks · Python · Apache-2.0 · updated 3mo ago

gpu-rest-engine

NVIDIA/gpu-rest-engine
3.9

A REST API for Caffe using Docker and Go

Model Serving
423 stars · 93 forks · C++ · BSD-3-Clause · updated 7y ago

stable-diffusion-deploy

Lightning-Universe/stable-diffusion-deploy
4.7

Learn to serve Stable Diffusion models on cloud infrastructure at scale. This Lightning App demonstrates load balancing, orchestration, pre-provisioning, dynamic batching, GPU inference, and microservices working together via the Lightning Apps framework.

Model Serving
391 stars · 39 forks · Python · Apache-2.0 · updated 2y ago

podman-desktop-extension-ai-lab

containers/podman-desktop-extension-ai-lab
5.9

Work with LLMs in a local environment using containers.

Model Serving
291 stars · 80 forks · TypeScript · Apache-2.0 · updated today

BMW-YOLOv4-Inference-API-GPU

BMW-InnovationLab/BMW-YOLOv4-Inference-API-GPU
4.1

A no-code object detection inference API using the YOLOv3 and YOLOv4 Darknet framework, running on GPU.

Model Serving
278 stars · 67 forks · Python · BSD-3-Clause · updated 3y ago

BMW-YOLOv4-Inference-API-CPU

BMW-InnovationLab/BMW-YOLOv4-Inference-API-CPU
3.9

A no-code object detection inference API for YOLOv4 and YOLOv3 using OpenCV, running on CPU.

Model Serving
218 stars · 58 forks · Python · NOASSERTION · updated 3y ago