Instructor

Instructor is the #1 open-source library for extracting structured, validated data from LLMs. Supports 15+ providers across 6 languages with automatic retries and streaming.

About

Instructor is an open-source library designed to make working with LLMs reliable and predictable by enforcing structured, validated outputs. With over 3 million monthly downloads and 11k GitHub stars, it is the go-to solution for developers who need consistent, schema-first data extraction from AI models. By defining Pydantic models, developers specify exactly what shape of data they expect, and Instructor handles validation, automatic retries on failure, and streaming of partial responses, all without manual error handling.

Instructor supports 15+ LLM providers, including OpenAI, Anthropic Claude, Google Gemini, Mistral, Cohere, Ollama, and DeepSeek, making it easy to switch between providers using the same codebase. It is available in six languages (Python, TypeScript, Go, Ruby, Elixir, and Rust), making it suitable for polyglot engineering teams, and it also supports local open-source models via Ollama, llama-cpp-python, and vLLM.

For developers building data pipelines, information extraction systems, AI-powered APIs, or any application where LLM outputs must conform to a strict schema, Instructor is a lightweight but powerful layer that complements agent frameworks like PydanticAI while excelling at fast, schema-first extraction tasks.

Key Features

  • Structured Outputs via Pydantic: Define Pydantic models to precisely specify the shape and types of data you want from any LLM, with full IDE autocompletion and type inference.
  • Automatic Validation & Retries: Built-in retry logic automatically re-prompts the LLM when validation fails, eliminating manual error handling and improving output reliability.
  • 15+ LLM Provider Support: Works seamlessly with OpenAI, Anthropic, Google Gemini, Mistral, Cohere, Ollama, DeepSeek, and more using a unified interface.
  • Streaming Support: Process partial responses in real time, enabling low-latency applications and large list extractions.
  • Multi-Language Availability: Available in Python, TypeScript, Go, Ruby, Elixir, and Rust, making it accessible to polyglot teams across the stack.
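The streaming feature can be illustrated with a stdlib-only sketch: rather than waiting for the full response, each completed object in a streamed JSON array is parsed as soon as its closing brace arrives. `stream_objects` and the token chunks are hypothetical, and the brace-depth scan deliberately ignores braces inside strings for brevity:

```python
import json
from typing import Iterable, Iterator

def stream_objects(chunks: Iterable[str]) -> Iterator[dict]:
    """Yield each completed JSON object from a streamed JSON array
    as soon as it closes, via a simple brace-depth scan."""
    buf, depth, start = "", 0, None
    for chunk in chunks:
        for ch in chunk:
            buf += ch
            if ch == "{":
                if depth == 0:
                    start = len(buf) - 1   # first brace of a new object
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0 and start is not None:
                    yield json.loads(buf[start:])
                    start = None

# Hypothetical token chunks, arriving as an LLM might stream them:
chunks = ['[{"name": "Ada"', '}, {"na', 'me": "Alan"}]']
print(list(stream_objects(chunks)))  # [{'name': 'Ada'}, {'name': 'Alan'}]
```

The first object is usable while the second is still being generated, which is the property that makes streamed list extraction low-latency.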

Use Cases

  • Extracting structured entities (names, dates, addresses) from unstructured text documents using a defined schema
  • Building AI-powered APIs that must return consistent JSON or typed objects regardless of which LLM is used
  • Creating data pipelines that ingest LLM outputs and feed them into databases or downstream services with strict format requirements
  • Automating information extraction from PDFs, emails, or web content with automatic validation and retry on malformed outputs
  • Prototyping multi-provider LLM applications where switching between OpenAI, Anthropic, and local models requires minimal code changes
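As a sketch of the first use case, schema-first entity extraction can be approximated with a stdlib dataclass; `Contact` and `parse_contact` are hypothetical names, with a hand-rolled date check standing in for the field validators a Pydantic model would provide:

```python
import json
import re
from dataclasses import dataclass

@dataclass
class Contact:
    """Target schema for entities extracted from unstructured text."""
    name: str
    email: str
    date: str  # expected as YYYY-MM-DD

def parse_contact(raw: str) -> Contact:
    """Validate a raw model response against the Contact schema."""
    data = json.loads(raw)
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", data["date"]):
        raise ValueError("date must be YYYY-MM-DD")
    return Contact(**data)

# A well-formed model response validates cleanly into a typed object:
contact = parse_contact(
    '{"name": "Grace Hopper", "email": "grace@navy.mil", "date": "1952-05-02"}'
)
print(contact.name, contact.date)
```

Downstream services then consume typed `Contact` objects instead of parsing free-form text, which is the core of the pipeline use cases above.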

Pros

  • Broad Provider & Language Coverage: Supports 15+ LLM providers and 6 programming languages, reducing lock-in and making adoption easy across diverse tech stacks.
  • Production-Ready Reliability: Automatic retries and Pydantic validation ensure LLM outputs always conform to your schema, making it safe to use in production pipelines.
  • Massive Community & Adoption: With 3M+ monthly downloads and 100+ contributors, Instructor is well-maintained with extensive documentation, examples, and community support.
  • Local & Open-Source Model Support: Integrates with Ollama, llama-cpp-python, and vLLM to run structured extraction on locally hosted open-source models.

Cons

  • Pydantic Knowledge Required: Developers need familiarity with Pydantic and Python typing to fully leverage the library, which may present a learning curve for newcomers.
  • Not a Full Agent Framework: Instructor is focused on structured extraction, not agentic workflows. Teams needing built-in tool use, memory, or orchestration will need a complementary framework like PydanticAI.
  • Retry Costs Can Add Up: Automatic retries on validation failures can increase LLM API token consumption and costs in scenarios with complex or frequently failing schemas.

Frequently Asked Questions

What is Instructor and what problem does it solve?

Instructor is an open-source library that makes LLM outputs structured and reliable. Instead of parsing raw text from LLMs, you define a Pydantic schema and Instructor ensures the LLM returns data that matches it, with automatic retries if validation fails.

Which LLM providers does Instructor support?

Instructor supports 15+ providers including OpenAI, Anthropic Claude, Google Gemini, Mistral, Cohere, Ollama, DeepSeek, and more. You can also use it with local open-source models via Ollama, llama-cpp-python, or vLLM.

Do I need to know Pydantic to use Instructor?

Basic knowledge of Pydantic is recommended since schemas are defined as Pydantic models. However, Instructor's documentation includes beginner-friendly guides to help developers get up to speed quickly.
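For orientation, a minimal Pydantic model of the kind passed to Instructor's `response_model` parameter looks like this (the `User` schema is illustrative; the snippet uses the Pydantic v2 API):

```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    name: str
    age: int

# Well-typed JSON parses into a typed object:
user = User.model_validate_json('{"name": "Ada", "age": 36}')
print(user.name, user.age)

# Ill-typed JSON raises a ValidationError, which Instructor uses
# to drive its automatic retries:
try:
    User.model_validate_json('{"name": "Ada", "age": "unknown"}')
except ValidationError as exc:
    print("rejected:", exc.error_count(), "error")
```

Defining the model is most of the Pydantic knowledge needed to get started; validators and nested models can be layered in later.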

Which programming languages does Instructor support?

Instructor is available in Python, TypeScript, Go, Ruby, Elixir, and Rust, making it suitable for a wide variety of teams and tech stacks.

How does Instructor differ from an agent framework like PydanticAI?

Instructor is optimized for fast, schema-first data extraction from LLMs without the overhead of agent orchestration. PydanticAI extends this with typed tools, observability dashboards, and dataset replays for full agentic workflows. The two are complementary.
