FastChat: FastChat is an open-source platform for training, serving, and evaluating large language models. It powers Chatbot Arena and offers OpenAI-compatible APIs and a distributed multi-model serving system.
vLLM: vLLM is an open-source high-throughput LLM inference library supporting GPU, CPU, and TPU backends with an OpenAI-compatible API, PagedAttention, and production deployment tools.
ONNX Runtime: ONNX Runtime is Microsoft's open-source AI engine for accelerated machine learning inference and training across cloud, edge, mobile, and web platforms.
Qdrant: Qdrant is a high-performance, open-source vector search engine and database written in Rust. Build production-ready RAG pipelines, recommendation systems, and semantic search at scale.
ZenML: ZenML is an open-source AI control plane for orchestrating ML pipelines and LLM agent workflows with automated versioning, infrastructure abstraction, and governance from local to Kubernetes.
Quivr: Quivr is an opinionated open-source RAG framework supporting any LLM (GPT-4, Groq, Llama) and vectorstore (PGVector, Faiss). Build AI-powered apps faster without reinventing retrieval pipelines.
Ray AI Framework: Ray is an open-source, Python-native framework for scaling and orchestrating distributed AI, ML, and GenAI workloads across CPUs and GPUs at any scale.
Text Generation WebUI: Run large language models locally with Text Generation WebUI, which supports text, vision, tool-calling, and fine-tuning. 100% offline, open source, and private.
Ray Serve: Ray Serve is an open-source, scalable model serving framework built on Ray for deploying ML models, LLMs, and multi-model pipelines in production.
MLC LLM: MLC LLM is an open-source ML compiler and high-performance LLM inference engine. Deploy large language models natively on web, iOS, Android, and via REST API.