STACKQUADRANT

AI-Hypercomputer/JetStream

Model Serving

JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future -- PRs welcome).

GitHub Metrics
Stars: 415
Forks: 58
Open Issues: 26
Watchers: 22
Contributors: 38
Weekly Commits: 0
Language: Python
License: Apache-2.0
Last Commit: Jan 5, 2026
Created: Mar 1, 2024
Latest Release: v0.3
Release Date: Dec 18, 2024
Synced: Mar 3, 2026
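Metrics like the ones above can be pulled from the public GitHub REST API's repository endpoint. A minimal sketch using only Python's standard library (the field names `stargazers_count`, `forks_count`, `open_issues_count`, and `subscribers_count` are from the documented GitHub API; the `parse_repo_metrics` helper is ours):

```python
import json
import urllib.request


def parse_repo_metrics(repo: dict) -> dict:
    """Extract the card's metrics from a GitHub /repos/{owner}/{name} payload."""
    return {
        "Stars": repo["stargazers_count"],
        "Forks": repo["forks_count"],
        "Open Issues": repo["open_issues_count"],
        # GitHub's watchers are exposed as subscribers_count;
        # watchers_count mirrors the star count for legacy reasons.
        "Watchers": repo.get("subscribers_count"),
        "Language": repo.get("language"),
        "License": repo["license"]["spdx_id"] if repo.get("license") else None,
    }


def fetch_repo_metrics(owner: str, name: str) -> dict:
    """Fetch repository metadata from the GitHub REST API (unauthenticated)."""
    url = f"https://api.github.com/repos/{owner}/{name}"
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        return parse_repo_metrics(json.load(resp))


# Usage (hits the network): fetch_repo_metrics("AI-Hypercomputer", "JetStream")
```

Contributors and weekly commits come from separate endpoints (`/contributors`, `/stats/participation`) and are not included in this sketch.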
Quality Scores
Documentation Quality (weight: 20%): 0.0
Community Health (weight: 20%): 0.0
Maintenance Velocity (weight: 15%): 0.0
API Design & DX (weight: 20%): 0.0
Production Readiness (weight: 15%): 0.0
Ecosystem Integration (weight: 10%): 0.0
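The six weights above sum to 100%, which suggests an overall score computed as a weighted average of the per-dimension scores. A minimal sketch under that assumption (the site does not specify its combination rule):

```python
# Weights as displayed on the card; each dimension score is a float.
WEIGHTS = {
    "Documentation Quality": 0.20,
    "Community Health": 0.20,
    "Maintenance Velocity": 0.15,
    "API Design & DX": 0.20,
    "Production Readiness": 0.15,
    "Ecosystem Integration": 0.10,
}


def overall_score(scores: dict) -> float:
    """Weighted average of per-dimension scores; weights must sum to 1.0."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    # Missing dimensions default to 0.0, matching the card's current state.
    return sum(w * scores.get(dim, 0.0) for dim, w in WEIGHTS.items())
```

With every dimension at 0.0, as shown here, the overall score is 0.0, consistent with the empty radar below.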
Tags
gemma, gpt, gpu, inference, jax, large-language-models, llama, llama2, llm, llm-inference
Radar: no scores yet