Arize Phoenix AI

Arize Phoenix is an open-source LLM observability platform built on OpenTelemetry. Instrument, trace, evaluate, and optimize AI applications in real time, with no framework dependencies and no vendor lock-in.

About

Arize Phoenix is a fully open-source, self-hostable LLM tracing and evaluation platform designed for AI engineers and developers building production-grade AI applications. Built on top of OpenTelemetry (OTEL), Phoenix is framework-agnostic and free of vendor lock-in, giving teams full transparency and the freedom to instrument, scale, or migrate without restriction.

Phoenix provides automatic instrumentation for LLM applications, collecting traces that record the full path of requests as they propagate through multi-step workflows. Engineers can pinpoint exactly where their LLM pipeline breaks, whether during retrieval, tool execution, or model inference. The platform includes an interactive Prompt Playground for rapid prompt and model iteration: compare outputs side by side, visualize responses, and debug failures without leaving your workflow. Its Evaluations & Annotations module offers an ergonomic eval library with pre-built templates that can be customized to any task, plus support for incorporating human feedback directly into the evaluation loop.

Dataset Clustering & Visualization leverages embedding representations to surface semantically similar queries, document chunks, and responses, making it easy to isolate clusters of poor performance at scale. Phoenix integrates natively with popular LLM frameworks including LlamaIndex and LangChain, and is actively maintained as an open-source project on GitHub under the Arize AI organization. It is well suited to ML engineers, AI researchers, and development teams who need deep observability into LLM-powered applications.

Key Features

  • Application Tracing: Automatically or manually instrument LLM apps to collect detailed traces, recording every step of a request through your AI pipeline to pinpoint failures and bottlenecks.
  • Interactive Prompt Playground: A fast, flexible sandbox for iterating on prompts and models—compare outputs side-by-side, visualize responses, and debug failures without leaving your workflow.
  • Evaluations & Annotations: An ergonomic eval library with pre-built templates for tasks like relevance, toxicity, and quality scoring, fully customizable and supporting human feedback loops.
  • Dataset Clustering & Visualization: Use embeddings to semantically cluster questions, document chunks, and responses, making it easy to uncover and isolate regions of poor model performance.
  • OpenTelemetry-Based Architecture: Built natively on OTEL, Phoenix is framework-agnostic and free of vendor lock-in, working seamlessly with LlamaIndex, LangChain, and other major LLM tools.
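The tracing model behind the first feature can be sketched in plain Python (a conceptual illustration only, not Phoenix's actual API; all names here are hypothetical): a trace is a tree of timed spans, and finding a bottleneck amounts to finding the slowest leaf.

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    """One timed step in a trace (a conceptual stand-in for an OTEL span)."""
    name: str
    duration_ms: float
    children: list = field(default_factory=list)

    def slowest_leaf(self):
        """Walk the span tree and return the leaf with the longest duration."""
        if not self.children:
            return self
        return max((c.slowest_leaf() for c in self.children),
                   key=lambda s: s.duration_ms)

# A hypothetical RAG request traced as a span tree.
trace = Span("rag_request", 1450.0, [
    Span("retrieval", 900.0, [
        Span("embed_query", 40.0),
        Span("vector_search", 860.0),   # <- the bottleneck
    ]),
    Span("llm_generation", 510.0),
])

print(trace.slowest_leaf().name)  # vector_search
```

In a real deployment the spans are emitted automatically by instrumentation rather than built by hand; the tree walk is the same idea Phoenix's trace view presents visually.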

Use Cases

  • Tracing multi-step LLM application requests to identify exactly where a pipeline breaks or degrades in production.
  • Evaluating model responses for relevance, toxicity, and quality using pre-built or custom evaluation templates.
  • Debugging RAG (retrieval-augmented generation) pipelines by inspecting retrieval steps, context passing, and model outputs.
  • Iterating rapidly on prompts and comparing model outputs in an interactive playground during development.
  • Clustering and visualizing embedding spaces to detect groups of poorly performing queries or document chunks at scale.
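The clustering use case above can be illustrated with a tiny pure-Python sketch (a greedy threshold grouping over cosine similarity; this is an assumption-laden toy, not Phoenix's internal algorithm):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def greedy_cluster(embeddings, threshold=0.9):
    """Assign each vector to the first cluster whose seed vector is
    similar enough; otherwise start a new cluster."""
    clusters = []  # each cluster is a list of indices into embeddings
    for i, vec in enumerate(embeddings):
        for cluster in clusters:
            if cosine(vec, embeddings[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Toy query embeddings: two near-duplicates and one outlier.
embs = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
print(greedy_cluster(embs))  # [[0, 1], [2]]
```

Once similar queries are grouped, attaching per-query eval scores to each cluster is what surfaces "regions of poor performance" rather than isolated bad examples.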

Pros

  • Fully Open Source & Self-Hostable: No feature gates, no paywalls—Phoenix is 100% open source with full self-hosting support, giving teams complete control over their data and infrastructure.
  • No Vendor Lock-In: Built on OpenTelemetry, Phoenix is framework and language agnostic, so teams can adopt, scale, or migrate without being tied to any specific vendor or stack.
  • Rich Ecosystem Integration: Works out of the box with popular LLM frameworks like LlamaIndex and LangChain, with one-click integrations and broad community support.
  • Strong Community Traction: With 8.8k+ GitHub stars and 2.5M+ monthly downloads, Phoenix has proven adoption and an active community contributing to its development.

Cons

  • Self-Hosting Requires Infrastructure Setup: Teams choosing to self-host must provision and manage their own infrastructure, which can add operational overhead compared to fully managed SaaS alternatives.
  • Developer-Centric Tool: Phoenix is primarily designed for ML engineers and developers—non-technical stakeholders may find the interface and configuration challenging without support.
  • Learning Curve for Advanced Features: Getting the most out of custom evaluations, embedding visualizations, and OTEL instrumentation requires familiarity with LLM observability concepts and tooling.

Frequently Asked Questions

Is Arize Phoenix free to use?

Yes. Phoenix is fully open source and free to use with no feature gates or restrictions. You can self-host it on your own infrastructure at no cost.
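Getting a self-hosted instance running is typically a two-step process (a sketch assuming the `arize-phoenix` PyPI package and its bundled CLI; verify the exact commands against the project README):

```shell
# Install the open-source package
pip install arize-phoenix

# Start the Phoenix server; the UI is served locally (port 6006 by default)
phoenix serve
```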

What frameworks and languages does Phoenix support?

Phoenix is framework-agnostic and language-agnostic, built on OpenTelemetry. It integrates natively with popular frameworks like LlamaIndex, LangChain, and others, and supports any language with OTEL instrumentation.
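Wiring a framework to Phoenix usually amounts to registering an OTEL tracer provider and applying an auto-instrumentor. The sketch below assumes the `arize-phoenix` and `openinference-instrumentation-langchain` packages and a Phoenix server already running locally; the names follow their documented APIs but should be checked against current docs:

```python
from phoenix.otel import register
from openinference.instrumentation.langchain import LangChainInstrumentor

# Point an OTEL tracer provider at the running Phoenix collector.
tracer_provider = register(
    project_name="my-llm-app",
    endpoint="http://localhost:6006/v1/traces",
)

# Auto-instrument LangChain so every chain, tool, and LLM call emits spans.
LangChainInstrumentor().instrument(tracer_provider=tracer_provider)
```

Because this is plain OTEL under the hood, the same traces could be exported to any OTEL-compatible backend instead, which is what "no vendor lock-in" means in practice.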

How does Phoenix help debug RAG pipelines?

Phoenix traces every step of a RAG request—including retrieval, context injection, and model generation—so you can identify exactly where failures occur, whether in document retrieval or model response quality.
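The step-by-step inspection described above can be illustrated with a plain-Python stand-in for a trace (hypothetical names and statuses, not Phoenix's data model):

```python
# Each step of a hypothetical RAG trace: (name, status, detail)
rag_trace = [
    ("embed_query",    "ok",      "768-dim vector"),
    ("vector_search",  "ok",      "4 chunks retrieved"),
    ("context_inject", "error",   "context exceeded token limit"),
    ("llm_generation", "skipped", None),
]

def first_failure(trace):
    """Return (name, detail) for the first step whose status is not 'ok'."""
    for name, status, detail in trace:
        if status != "ok":
            return name, detail
    return None

print(first_failure(rag_trace))
# ('context_inject', 'context exceeded token limit')
```

Here the retrieval steps succeeded and the failure is in context assembly, which is exactly the kind of distinction a trace view makes visible at a glance.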

Can I incorporate human feedback into evaluations?

Yes. Phoenix's evaluation and annotation module supports human feedback natively, allowing you to combine automated LLM-based evaluations with manual human annotations for a more comprehensive quality assessment.
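One common pattern for combining the two signals is letting a human label, when present, override the automated eval score (a minimal sketch with hypothetical field names, not Phoenix's annotation schema):

```python
def final_score(auto_score, human_label=None):
    """Combine an automated eval score (0.0-1.0) with an optional
    human annotation; the human label, when present, wins."""
    if human_label is not None:
        return 1.0 if human_label == "good" else 0.0
    return auto_score

evals = [
    {"auto": 0.82, "human": None},    # automated eval only
    {"auto": 0.91, "human": "bad"},   # human overrides a false positive
]
scores = [final_score(e["auto"], e["human"]) for e in evals]
print(scores)  # [0.82, 0.0]
```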

What is OpenTelemetry and why does Phoenix use it?

OpenTelemetry (OTEL) is an open-source observability framework for collecting traces, metrics, and logs. Phoenix uses OTEL to ensure seamless setup, full data transparency, and freedom from vendor lock-in—you own your data and can move it anywhere.
