DeepEval: An open-source LLM evaluation framework with 50+ research-backed metrics, Pytest integration, and support for single/multi-turn and multi-modal AI testing.
Argilla Data Platform: Argilla is an open-source platform where AI engineers and domain experts collaborate to build high-quality datasets for LLM fine-tuning, RLHF, and NLP model evaluation.
MLRun: An open-source AI orchestration framework for managing ML and generative AI pipelines from development to production, with auto-scaling, monitoring, and multi-cloud support.
Ray Serve: An open-source, scalable model serving framework built on Ray for deploying ML models, LLMs, and multi-model pipelines in production.
NeMo Guardrails: NVIDIA's open-source toolkit for adding programmable topical, safety, and dialog guardrails to LLM-based conversational AI systems using the Colang DSL.
Langtrace AI: An open-source observability and evaluations platform for AI agents and LLM applications. Track token usage, cost, latency, and accuracy with minimal setup.
Milvus AI Vector DB: Milvus is an open-source vector database built for GenAI applications. Perform high-speed similarity searches and scale to tens of billions of vectors with minimal performance loss.
MLflow: An open-source AI engineering platform with 30M+ monthly downloads, billed as the largest of its kind. Debug, evaluate, monitor, and deploy AI agents, LLMs, and ML models.
OpenAI Evals: An open-source framework for evaluating large language models and LLM systems, featuring a community registry of benchmarks and support for custom private evals.
LocalAI: A free, open-source alternative to OpenAI and Anthropic. Run LLMs, image generation, audio, and autonomous agents locally on your own hardware with complete privacy.