About
Gorilla LLM is an open-source research project from UC Berkeley focused on connecting large language models with real-world APIs and services. At its core, Gorilla OpenFunctions v2 is a 6.91B-parameter model fine-tuned for tool use. It can generate parallel function calls (multiple calls from a single prompt), select among multiple candidate functions, and natively supports Python, Java, JavaScript, and REST API calling with extended data types.

The project also maintains the Berkeley Function-Calling Leaderboard (BFCL), a comprehensive benchmark of over 2,000 question-function-answer pairs designed to rigorously evaluate how well different LLMs handle function calling across diverse domains and complexity levels, including function relevance detection. RAFT (Retrieval-Augmented Fine-Tuning) introduces a recipe for fine-tuning base LLMs specifically for domain-specific RAG scenarios, drawing an analogy between open-book and closed-book exam preparation to produce more robust retrieval-augmented models. GoEX (Gorilla Execution Engine) provides a runtime for safely executing LLM-generated code and API calls, featuring "post-facto validation," undo abstractions, and damage confinement to manage unintended actions in autonomous LLM pipelines.

Licensed under Apache 2.0, Gorilla can be used commercially. It is accessible via a web demo, a CLI tool (pip install gorilla-cli), and HuggingFace models, and is well suited to developers, researchers, and enterprises building LLM-powered agents and API-integrated applications.
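As a sketch of how an application might query an OpenFunctions-style model, the snippet below assembles an OpenAI-style chat request carrying candidate function schemas. The model name, function definition, and payload layout here are illustrative assumptions rather than official values; consult the Gorilla docs for the exact endpoint and schema.

```python
# Sketch: building an OpenAI-style function-calling request for an
# OpenFunctions-style model. Model name and schema below are placeholders.
import json

def build_request(query, functions, model="gorilla-openfunctions-v2"):
    """Assemble a chat-completions payload carrying candidate functions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": query}],
        "functions": functions,  # the model selects among these candidates
    }

weather_fn = {
    "name": "get_current_weather",
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

payload = build_request("What's the weather in Berkeley?", [weather_fn])
print(json.dumps(payload, indent=2))
```

The same payload shape extends naturally to multiple candidate functions, which is where OpenFunctions' function-selection and parallel-call abilities come into play.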
Key Features
- Gorilla OpenFunctions v2: A 6.91B parameter open-source model that supports parallel function calls, multiple function selection, and natively handles Python, Java, JavaScript, and REST APIs with extended data types.
- Berkeley Function-Calling Leaderboard (BFCL): A rigorous benchmark of 2,000+ question-function-answer pairs evaluating LLM function-calling performance across multiple languages, domains, and complexity levels including function relevance detection.
- RAFT (Retrieval-Augmented Fine-Tuning): A domain-specific fine-tuning recipe that prepares LLMs to excel at RAG tasks by training them to reason over specific document sets, outperforming standard supervised fine-tuning approaches.
- GoEX Execution Engine: A runtime for safely executing LLM-generated code and API calls, featuring post-facto validation, undo abstractions, and damage confinement for autonomous LLM agent pipelines.
- CLI & Spotlight Integration: Gorilla-powered command-line tool (pip install gorilla-cli) and Spotlight Search integration allow users to invoke API-backed commands directly from their terminal or desktop search.
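The CLI mentioned above can be tried in two commands. The exact behavior may vary by version, but the basic pattern from the project's README is a plain-English request, after which gorilla-cli proposes candidate commands and asks for confirmation before running anything:

```shell
pip install gorilla-cli
# Describe the task in plain English; the CLI suggests shell commands
# and waits for you to pick one before executing.
gorilla generate 100 random characters into a file called test.txt
```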
Use Cases
- Building AI agents that autonomously interact with external APIs and cloud services without hand-crafted prompt engineering
- Evaluating and comparing LLM function-calling capabilities using the Berkeley Function-Calling Leaderboard (BFCL) benchmark
- Fine-tuning LLMs for domain-specific RAG applications using the RAFT methodology to improve retrieval-grounded answer quality
- Creating CLI tools and developer utilities that leverage LLM-powered API invocation directly from the terminal
- Safely deploying autonomous LLM pipelines with GoEX, using undo and damage confinement abstractions to reduce the risk of unintended side effects
Pros
- Apache 2.0 Licensed for Commercial Use: Fully open-source with a permissive license, allowing developers and businesses to use Gorilla in commercial products without licensing fees.
- GPT-4-Level Function Calling Performance: Gorilla OpenFunctions v2 reports performance on par with GPT-4 on function-calling benchmarks while being open-source and self-hostable, providing a cost-effective alternative.
- Multi-Language API Support: Uniquely supports Python, Java, JavaScript, and REST API function generation in a single model, making it highly versatile for diverse backend environments.
- Comprehensive Ecosystem: Goes beyond just a model — BFCL, RAFT, and GoEX form a full suite of tools for evaluating, training, and safely deploying LLM-powered API integrations.
Cons
- Requires Technical Expertise: Primarily a research and developer-oriented project; setting up, fine-tuning, and deploying the model requires significant ML and engineering knowledge.
- Narrow Focus on Function Calling: Gorilla is optimized for API and function-calling tasks rather than general-purpose conversational AI, limiting its applicability for broader chatbot use cases.
- Self-Hosting Infrastructure Needed: Full capabilities require self-hosting the model, which demands GPU infrastructure and operational overhead not needed with managed API services.
Frequently Asked Questions
What is Gorilla LLM?
Gorilla LLM is an open-source large language model from UC Berkeley designed to accurately invoke API calls across thousands of services. It specializes in function calling and tool use, supporting Python, Java, JavaScript, and REST APIs.
What is the Berkeley Function-Calling Leaderboard (BFCL)?
BFCL is a benchmark created by the Gorilla team to evaluate how well different LLMs perform on function-calling tasks. It consists of over 2,000 question-function-answer pairs across multiple programming languages, application domains, and complexity levels.
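BFCL scores a model's generated call against a ground-truth call by parsing it rather than string-matching. As a toy illustration of that idea (this is not the benchmark's actual harness, and it handles keyword arguments only), one might compare parsed calls like this:

```python
# Toy AST-style function-call matcher, loosely inspired by BFCL's
# parse-then-compare evaluation. Keyword arguments only, for simplicity.
import ast

def parse_call(source):
    """Parse a single Python function call into (name, {kwarg: value})."""
    node = ast.parse(source, mode="eval").body
    if not isinstance(node, ast.Call):
        raise ValueError("not a function call")
    name = ast.unparse(node.func)
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return name, kwargs

def calls_match(expected, generated):
    """Match on function name and keyword arguments, ignoring formatting."""
    return parse_call(expected) == parse_call(generated)

print(calls_match(
    'get_weather(city="Berkeley", unit="celsius")',
    'get_weather(unit="celsius", city="Berkeley")',
))  # argument order does not matter once parsed
```

Parsing before comparison is what lets an evaluator accept semantically identical calls that differ in argument order or whitespace.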
What is RAFT?
RAFT (Retrieval-Augmented Fine-Tuning) is a fine-tuning methodology that trains LLMs to better handle domain-specific Retrieval-Augmented Generation (RAG) tasks. By teaching models to reason over specific document sets — like students studying for an open-book exam — RAFT produces more accurate and grounded answers than standard fine-tuning.
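A RAFT-style training example pairs a question with a mix of "oracle" (relevant) and distractor documents, so the model must learn to ground its answer in the right context. As a rough sketch of such a record (the field names here are illustrative, not the paper's exact schema):

```python
# Sketch of assembling one RAFT-style training record. Field names are
# illustrative; see the RAFT paper for the actual data format.
import random

def make_raft_example(question, oracle_doc, distractor_docs, answer, rng=random):
    """Mix the oracle document with distractors and attach a grounded answer."""
    docs = [oracle_doc] + list(distractor_docs)
    rng.shuffle(docs)          # oracle hidden among distractors
    return {
        "question": question,
        "context": docs,
        "oracle": oracle_doc,  # kept separately for supervision/evaluation
        "cot_answer": answer,  # chain-of-thought answer citing the oracle
    }

example = make_raft_example(
    "What license is Gorilla released under?",
    "Gorilla is licensed under Apache 2.0.",
    ["BFCL contains 2,000+ test pairs.", "GoEX adds undo abstractions."],
    "The docs state Gorilla uses Apache 2.0, so the answer is Apache 2.0.",
)
print(len(example["context"]))  # 3
```

Training on records like this — including some where the oracle is absent entirely — is what pushes the model to distinguish relevant retrieved context from noise.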
What is GoEX?
GoEX (Gorilla Execution Engine) is a runtime for executing LLM-generated actions such as code and API calls. It introduces "post-facto validation," undo abstractions, and damage confinement to safely manage and roll back unintended actions in autonomous LLM agent systems.
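The undo abstraction can be pictured as pairing every LLM-proposed action with a compensating action, so a failed post-facto check can revert everything. The toy sketch below conveys the idea only; it is not GoEX's actual API.

```python
# Toy undo abstraction in the spirit of GoEX's post-facto validation.
# Not the real GoEX API: class and method names are invented for illustration.
class ReversibleAction:
    def __init__(self, do, undo, description):
        self.do, self.undo, self.description = do, undo, description

class Executor:
    """Run actions eagerly, but keep their undos so a failed validation
    can roll everything back in reverse order."""
    def __init__(self):
        self._undo_stack = []

    def run(self, action):
        action.do()
        self._undo_stack.append(action)

    def rollback(self):
        while self._undo_stack:
            self._undo_stack.pop().undo()

store = {}
ex = Executor()
ex.run(ReversibleAction(
    do=lambda: store.update(key="value"),
    undo=lambda: store.pop("key", None),
    description="write key",
))
assert store == {"key": "value"}
ex.rollback()   # validation failed after the fact: revert the write
print(store)    # {}
```

Real-world actions (API calls, file writes) need genuinely compensating operations — delete the created resource, restore the old file — which is the hard part GoEX's abstractions are designed to manage.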
Can Gorilla be used commercially?
Yes. Gorilla LLM is licensed under the Apache 2.0 license, which permits commercial use. You can deploy and build products with it freely.