raketenkater/llm-server

Model Serving

Auto-tuned launcher for GGUF models on llama.cpp / ik_llama.cpp — OpenAI-compatible server with multi-GPU tensor-split, MoE expert placement, measured flag tuning (AI Tune), hardware-matched HuggingFace downloads, and crash recovery. An Ollama alternative for multi-GPU rigs.

GitHub →

4.8

GitHub Metrics

Stars

237

Forks

Open Issues

—

Watchers

Contributors

Weekly Commits

Language

License

MIT

Last Commit

Jun 25, 2026

Created

Mar 11, 2026

Latest Release

v3.1.0

Release Date

Jun 22, 2026

Synced: Jun 29, 2026

Quality Scores

Documentation Qualityw: 20%

3.9

No dedicated docs site. Description: 275 chars. Stars signal: 237. Contributors: 3. Score: 3.9/10

Community Healthw: 20%

2.7

Stars: 237. Contributors: 3. Watchers: 5. Forks: 12. Issue ratio: 0.0%. Score: 2.7/10

Maintenance Velocityw: 15%

7.5

Last commit: 3d ago. Weekly commits: 0. Latest release: v3.1.0. Score: 7.5/10

API Design & DXw: 20%

7.0

Stars/issues ratio: 237. Typed language: Go. No dedicated API docs. Permissive license: MIT. Popularity signal: 237 stars. Score: 7/10

Production Readinessw: 15%

3.5

Battle-tested: 237 stars. Peer review: 3 contributors. Versioned: v3.1.0. Licensed: MIT. Age: 0.3 years. Maintenance: last commit 3d ago. Score: 3.5/10

Ecosystem Integrationw: 10%

4.6

Fork interest: 12. Major ecosystem: Go. Integration-friendly: MIT. Adoption: 237 stars. Score: 4.6/10