Alternatives to llama.cpp | Tossom

FastChat
FastChat is an open-source platform for training, serving, and evaluating large language models. It powers Chatbot Arena and provides OpenAI-compatible APIs and a distributed multi-model serving system.

vLLM
vLLM is an open-source high-throughput LLM inference library supporting GPU, CPU, and TPU backends with an OpenAI-compatible API, PagedAttention, and production deployment tools.

ONNX Runtime
ONNX Runtime is Microsoft's open-source AI engine for accelerated machine learning inference and training across cloud, edge, mobile, and web platforms.

Qdrant
Qdrant is a high-performance, open-source vector search engine and database written in Rust. It is used to build production-ready RAG pipelines, recommendation systems, and semantic search at scale.

ZenML
ZenML is an open-source AI control plane for orchestrating ML pipelines and LLM agent workflows with automated versioning, infrastructure abstraction, and governance from local to Kubernetes.

Quivr
Quivr is an opinionated open-source RAG framework supporting any LLM (GPT-4, Groq, Llama) and vector store (PGVector, Faiss). It lets you build AI-powered apps faster without reinventing retrieval pipelines.

Ray AI Framework
Ray is an open-source, Python-native framework for scaling and orchestrating distributed AI, ML, and GenAI workloads across CPUs and GPUs at any scale.

Text Generation WebUI
Text Generation WebUI runs large language models locally, with support for text, vision, tool-calling, and fine-tuning. It is 100% offline, open source, and private.

Ray Serve
Ray Serve is an open-source, scalable model serving framework built on Ray for deploying ML models, LLMs, and multi-model pipelines in production.

MLC LLM
MLC LLM is an open-source ML compiler and high-performance LLM inference engine. It deploys large language models natively on web, iOS, and Android, and serves them via a REST API.
