STACKQUADRANT

raketenkater/llm-server

Model Serving

Auto-tuned launcher for GGUF models on llama.cpp / ik_llama.cpp — OpenAI-compatible server with multi-GPU tensor-split, MoE expert placement, measured flag tuning (AI Tune), hardware-matched HuggingFace downloads, and crash recovery. An Ollama alternative for multi-GPU rigs.

4.8
GitHub Metrics
Stars
237
Forks
12
Open Issues
Watchers
5
Contributors
3
Weekly Commits
0
Language
Go
License
MIT
Last Commit
Jun 25, 2026
Created
Mar 11, 2026
Latest Release
v3.1.0
Release Date
Jun 22, 2026
Synced: Jun 29, 2026
Quality Scores
Documentation Qualityw: 20%
3.9

No dedicated docs site. Description: 275 chars. Stars signal: 237. Contributors: 3. Score: 3.9/10

Community Healthw: 20%
2.7

Stars: 237. Contributors: 3. Watchers: 5. Forks: 12. Issue ratio: 0.0%. Score: 2.7/10

Maintenance Velocityw: 15%
7.5

Last commit: 3d ago. Weekly commits: 0. Latest release: v3.1.0. Score: 7.5/10

API Design & DXw: 20%
7.0

Stars/issues ratio: 237. Typed language: Go. No dedicated API docs. Permissive license: MIT. Popularity signal: 237 stars. Score: 7/10

Production Readinessw: 15%
3.5

Battle-tested: 237 stars. Peer review: 3 contributors. Versioned: v3.1.0. Licensed: MIT. Age: 0.3 years. Maintenance: last commit 3d ago. Score: 3.5/10

Ecosystem Integrationw: 10%
4.6

Fork interest: 12. Major ecosystem: Go. Integration-friendly: MIT. Adoption: 237 stars. Score: 4.6/10

Tags
cudaggufgolanginference-serverllama-cppllamacppllmlocal-llmlocalllamametal
Radar
Documentation Quality
Community Health
Maintenance Velocity
API Design & DX
Production Readiness
Ecosystem Integration