High-performance model inference and serving runtimes
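llama.cpp — an open-source C/C++ library and toolset for running LLM inference locally, using quantized models in the GGUF format on CPUs and on GPU backends such as CUDA and Metal.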
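vLLM — an open-source LLM inference and serving engine built for high throughput, using PagedAttention for KV-cache memory management and continuous batching of requests, with an OpenAI-compatible API server.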