FastChat: An open-source platform for training, serving, and evaluating large language models. It powers Chatbot Arena and provides OpenAI-compatible APIs and a distributed multi-model serving system.
llama.cpp: Run large language models locally with llama.cpp, a high-performance, open-source C/C++ inference engine supporting CUDA, Metal, Vulkan, and GGUF quantization for 50+ model architectures.
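GGUF quantization stores weights in low-bit integer formats with per-block scales to shrink memory use. As a rough illustration of the idea (not llama.cpp's actual Q8_0 block layout or kernels), a symmetric 8-bit quantize/dequantize round trip looks like:

```python
# Illustrative sketch of symmetric 8-bit weight quantization, the idea
# behind llama.cpp's GGUF quant formats (NOT the real Q8_0 block layout).

def quantize_q8(weights):
    """Map floats to int8-range values plus a shared scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_q8(q, scale):
    """Recover approximate floats from quantized values and the scale."""
    return [v * scale for v in q]

w = [0.12, -0.5, 0.03, 0.25]
q, s = quantize_q8(w)
w_hat = dequantize_q8(q, s)  # close to w, within one quantization step
```

Real GGUF formats apply this per block (e.g. 32 weights sharing one scale) and pack the integers tightly; lower-bit variants trade more reconstruction error for smaller files.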
ZenML: An open-source AI control plane for orchestrating ML pipelines and LLM agent workflows, with automated versioning, infrastructure abstraction, and governance from local runs to Kubernetes.
MT Bench: An open-source multi-turn benchmark for evaluating large language models, using GPT-4 as an automated judge. Part of the FastChat ecosystem by lm-sys.
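In LLM-as-judge evaluation, the judge emits a verdict per comparison and the verdicts are aggregated into per-model scores. A toy aggregation of pairwise verdicts into win rates, in the spirit of MT Bench's GPT-4-as-judge setup (the verdict labels and tie handling here are assumptions, not FastChat's exact scheme):

```python
from collections import Counter

# Toy aggregation of pairwise judge verdicts into win rates.
# Verdict labels 'A'/'B'/'tie' and half-credit for ties are assumptions.
def win_rates(verdicts):
    """verdicts: iterable of (model_a, model_b, winner),
    winner in {'A', 'B', 'tie'}. Returns {model: win_rate}."""
    wins, games = Counter(), Counter()
    for a, b, w in verdicts:
        games[a] += 1
        games[b] += 1
        if w == "A":
            wins[a] += 1
        elif w == "B":
            wins[b] += 1
        else:  # a tie counts as half a win for each side
            wins[a] += 0.5
            wins[b] += 0.5
    return {m: wins[m] / games[m] for m in games}

rates = win_rates([("m1", "m2", "A"), ("m1", "m2", "tie"), ("m2", "m1", "B")])
```

MT Bench additionally swaps the A/B positions of each pair to control for the judge's position bias; the aggregation step stays the same.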
ONNX Runtime: Microsoft's open-source AI engine for accelerated machine learning inference and training across cloud, edge, mobile, and web platforms.
vLLM: An open-source, high-throughput LLM inference library supporting GPU, CPU, and TPU backends, with an OpenAI-compatible API, PagedAttention, and production deployment tools.
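An OpenAI-compatible API means any OpenAI-style client can talk to the server by POSTing the standard chat-completions JSON body. A sketch of such a request against a locally running server (the base URL, port, and model name below are placeholder assumptions, not fixed by vLLM):

```python
import json

# Sketch of an OpenAI-style chat-completions request body, as accepted by
# OpenAI-compatible servers such as vLLM. URL and model name are placeholders.
BASE_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "my-served-model",          # placeholder model identifier
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize PagedAttention in one line."},
    ],
    "temperature": 0.2,
    "max_tokens": 128,
}
body = json.dumps(payload)
# A client would POST `body` to BASE_URL with Content-Type: application/json,
# e.g. via urllib.request or the official openai client with base_url set.
```

Because the wire format matches OpenAI's, existing SDKs and tools work against the server by only changing the base URL.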
Qdrant: A high-performance, open-source vector search engine and database written in Rust. Build production-ready RAG pipelines, recommendation systems, and semantic search at scale.
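At its core, a vector search engine ranks stored embeddings by similarity to a query embedding. A minimal brute-force cosine-similarity search in pure Python shows the conceptual operation (Qdrant itself uses an approximate HNSW index in Rust rather than a linear scan):

```python
import math

# Brute-force nearest-neighbor search by cosine similarity: a toy version
# of the lookup a vector database like Qdrant performs (Qdrant uses an
# HNSW index; this linear scan is only the conceptual operation).

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(query, points, top_k=2):
    """Return the top_k (id, score) pairs most similar to query."""
    scored = [(pid, cosine(query, vec)) for pid, vec in points.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:top_k]

points = {"doc1": [1.0, 0.0], "doc2": [0.7, 0.7], "doc3": [0.0, 1.0]}
hits = search([1.0, 0.1], points)  # doc1 ranks first, then doc2
```

In a RAG pipeline, `points` would hold document-chunk embeddings and the query vector would come from embedding the user's question.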
Quivr: An opinionated open-source RAG framework supporting any LLM (GPT-4, Groq, Llama) and any vector store (PGVector, Faiss). Build AI-powered apps faster without reinventing retrieval pipelines.
Text Generation WebUI: Run large language models locally with Text Generation WebUI, which supports text, vision, tool calling, and fine-tuning. 100% offline, open source, and private.
Ray Serve: An open-source, scalable model-serving framework built on Ray for deploying ML models, LLMs, and multi-model pipelines in production.