Mistral on Edge

Mistral's Ministral 3B and 8B are state-of-the-art edge language models for on-device, privacy-first AI inference with 128k context and low-latency performance.

About

Mistral on Edge presents les Ministraux (Ministral 3B and Ministral 8B), two state-of-the-art small language models purpose-built for edge and on-device deployment. Released on the first anniversary of Mistral 7B, they set a new benchmark in the sub-10B category for knowledge, commonsense reasoning, function-calling, and inference efficiency.

Both models support a context length of up to 128k tokens and are designed for latency-sensitive, privacy-first applications. Ministral 8B additionally features an interleaved sliding-window attention pattern for faster, more memory-efficient inference. In their respective size classes, both models outperform peers such as Gemma 2 and Llama 3.2. Key use cases include on-device translation, internet-less smart assistants, local analytics, and autonomous robotics.

Les Ministraux are also designed to serve as efficient intermediaries in multi-step agentic workflows alongside larger models like Mistral Large, handling input parsing, task routing, and API calls at very low latency and cost. Both models are available via Mistral's API (la Plateforme) at $0.04/M and $0.10/M tokens respectively, and Ministral 8B weights are available for research use. They are ideal for developers, enterprises, and researchers seeking compute-efficient AI at the edge.
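Access via la Plateforme follows the familiar OpenAI-style chat completions pattern. The sketch below is a minimal example using only the standard library; the endpoint URL and the model identifier `ministral-3b-latest` are assumptions to verify against Mistral's current API documentation:

```python
import json
import urllib.request  # stdlib only; Mistral's official Python SDK is an alternative

# Assumed endpoint and model ID; confirm against Mistral's API docs.
API_URL = "https://api.mistral.ai/v1/chat/completions"

def build_request(prompt: str, model: str = "ministral-3b-latest") -> dict:
    """Build an OpenAI-style chat completion payload for la Plateforme."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

def complete(prompt: str, api_key: str) -> str:
    """Send the request; requires a valid la Plateforme API key."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Payload construction works offline; complete() needs a real key.
payload = build_request("Translate 'bonjour' to English.")
print(payload["model"])
```

At $0.04/M tokens, high-volume calls to the 3B model through this endpoint remain inexpensive even for chatty multi-turn workloads.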

Key Features

  • Sub-10B State-of-the-Art Performance: Ministral 3B and 8B consistently outperform competitors like Gemma 2 and Llama 3.2 across knowledge, reasoning, and function-calling benchmarks in their size class.
  • 128k Context Length: Both models support up to 128k tokens of context, enabling long document understanding and complex multi-turn conversations on edge devices.
  • Memory-Efficient Inference: Ministral 8B features an interleaved sliding-window attention pattern for faster and more memory-efficient inference, ideal for resource-constrained environments.
  • Agentic Workflow Integration: Les Ministraux serve as efficient intermediaries in multi-step agentic pipelines, handling input parsing, task routing, and API calls alongside larger models.
  • Privacy-First On-Device Deployment: Designed for local, offline-capable inference supporting critical applications like on-device translation, internet-less assistants, and autonomous robotics.

Use Cases

  • On-device translation for offline or privacy-sensitive mobile and desktop applications
  • Internet-less smart assistants running locally on edge hardware without cloud dependency
  • Local data analytics and business intelligence on-premise without sending data to the cloud
  • Autonomous robotics requiring low-latency, real-time decision-making at the edge
  • Efficient task routing and API orchestration in multi-step agentic AI workflows alongside larger models

Pros

  • Best-in-Class Edge Performance: Outperforms competing models such as Llama 3.2 3B and Gemma 2 across multiple benchmarks while staying within the sub-10B parameter budget.
  • Ultra-Low API Pricing: At $0.04/M tokens for Ministral 3B and $0.10/M for Ministral 8B, these models offer exceptional cost efficiency for high-volume or latency-sensitive workloads.
  • Research Access to Model Weights: Ministral 8B Instruct weights are available for research use, enabling fine-tuning and experimentation without full commercial licensing.
  • Long Context Support: 128k context length is rare in the sub-10B space and enables sophisticated use cases like document summarization and multi-turn agentic tasks.

Cons

  • Commercial License Required for Self-Deployment: Using the models in production outside of la Plateforme requires a commercial license, which must be negotiated directly with Mistral AI.
  • Limited vLLM Context Window: On vLLM, the context length is currently capped at 32k rather than the full 128k, which may be limiting for open-source self-hosted deployments.
  • Ministral 3B Weights Not Publicly Available: Unlike Ministral 8B, the Ministral 3B model weights are not released for research use, restricting fine-tuning and local experimentation.

Frequently Asked Questions

What are Ministral 3B and Ministral 8B?

They are two state-of-the-art small language models from Mistral AI, optimized for on-device and edge computing. They excel in reasoning, knowledge, function-calling, and efficiency within the sub-10B parameter class.

What is the pricing for les Ministraux via the API?

Ministral 3B is priced at $0.04 per million tokens (input and output), and Ministral 8B at $0.10 per million tokens; both are available on Mistral's la Plateforme.

Can I download and self-host these models?

Ministral 8B Instruct weights are available for research use. For commercial self-deployment of either model, you need to contact Mistral AI for a commercial license.

What context length do these models support?

Both models support up to 128k context length. Note that on vLLM, context is currently limited to 32k tokens.
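For self-hosted experimentation with the research weights, a vLLM launch might look like the following command fragment. The Hugging Face model ID and flags are assumptions to check against current vLLM documentation; `--max-model-len` reflects the 32k cap noted above:

```shell
# Serve Ministral 8B (research weights) with vLLM, capped at 32k context.
# Model ID and flags are illustrative; confirm against current docs.
vllm serve mistralai/Ministral-8B-Instruct-2410 \
  --tokenizer-mode mistral \
  --max-model-len 32768
```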

What are the primary use cases for les Ministraux?

They are designed for on-device translation, internet-less smart assistants, local analytics, autonomous robotics, and as efficient function-calling intermediaries in multi-step agentic workflows.
