LiteRT

Deploy ML and GenAI models on billions of edge devices with LiteRT, Google's high-performance on-device AI framework featuring GPU and NPU acceleration.

About

LiteRT (formerly TensorFlow Lite) is Google's unified, high-performance framework for on-device machine learning and generative AI inference. Designed to run on billions of edge devices, LiteRT lets developers deploy ML and GenAI models directly on Android, iOS, macOS, Windows, Linux, web browsers, and embedded/IoT hardware without requiring a cloud connection. The framework supports GPU and NPU hardware acceleration across major chipsets, including Qualcomm, MediaTek, and Google Tensor, enabling fast, efficient inference on mobile and edge devices. LiteRT-LM extends the framework for deploying large language models on-device, making private, low-latency AI applications possible.

Developers can convert and optimize models from popular frameworks including PyTorch, TensorFlow, and JAX. Post-training quantization techniques, including dynamic range, integer, and float16 quantization, help reduce model size and improve performance on constrained hardware.

LiteRT is ideal for AI engineers building privacy-preserving applications, developers targeting mobile and embedded platforms, and teams seeking to reduce inference latency and cloud costs. With tools like Model Explorer, the AI Edge Portal, and MediaPipe integration, LiteRT provides a comprehensive ecosystem for end-to-end on-device AI development.
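
As a rough illustration of that conversion workflow, the sketch below converts a TensorFlow SavedModel to a LiteRT flatbuffer with post-training dynamic range quantization, using the long-standing tf.lite converter API that LiteRT inherits from TensorFlow Lite; the file paths are placeholders.

```python
import tensorflow as tf

# Load a trained model from a SavedModel directory (placeholder path).
converter = tf.lite.TFLiteConverter.from_saved_model("exported_model/")

# Enable post-training dynamic range quantization: weights are stored as
# 8-bit integers, shrinking the model and speeding up CPU inference.
converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_model = converter.convert()

# Write the flatbuffer that the on-device runtime will load.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```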

Key Features

  • On-Device ML & GenAI Inference: Run machine learning and generative AI models directly on edge devices without requiring a cloud connection, enabling low-latency and privacy-preserving applications (a minimal inference sketch follows this list).
  • LiteRT-LM for On-Device LLMs: A dedicated runtime for deploying large language models on mobile and edge hardware, supporting chipsets from Qualcomm, MediaTek, and Google Tensor.
  • GPU & NPU Hardware Acceleration: Leverage hardware accelerators across Android, iOS, and desktop platforms to maximize inference speed and energy efficiency on supported devices.
  • Multi-Framework Model Conversion: Convert and optimize models from PyTorch, TensorFlow, and JAX, with post-training quantization options to reduce model size and boost performance.
  • Broad Platform Support: Deploy across Android, iOS, macOS, Windows, Linux, Web (via LiteRT.js), and embedded/IoT devices from a single unified framework.
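
As referenced in the first feature above, here is a minimal sketch of running a converted model on-device, using the tf.lite.Interpreter API that LiteRT inherits from TensorFlow Lite (LiteRT also ships standalone runtime packages per platform); the model file and input are placeholders.

```python
import numpy as np
import tensorflow as tf

# Load the converted flatbuffer and allocate input/output buffers.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_info = interpreter.get_input_details()[0]
output_info = interpreter.get_output_details()[0]

# Feed a dummy input matching the model's expected shape and dtype.
x = np.random.rand(*input_info["shape"]).astype(input_info["dtype"])
interpreter.set_tensor(input_info["index"], x)

# Run inference entirely on-device; no network call is involved.
interpreter.invoke()
y = interpreter.get_tensor(output_info["index"])
print(y.shape)
```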

Use Cases

  • Deploying real-time image classification or object detection models in mobile apps without cloud connectivity.
  • Running on-device large language models for private AI assistants, text summarization, or code completion on Android and iOS devices.
  • Enabling low-latency AI inference on IoT and embedded devices such as smart cameras, robotics, and wearables.
  • Building offline-capable AI features in consumer apps to reduce cloud costs and improve response times.
  • Optimizing and quantizing existing TensorFlow or PyTorch models for edge deployment across diverse hardware platforms (a conversion sketch follows this list).
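
For the last use case, a hedged sketch of re-exporting an existing PyTorch model, assuming the ai-edge-torch companion package and its convert/export entry points; the torchvision model is only a stand-in for a production model.

```python
import ai_edge_torch
import torch
import torchvision

# A pretrained PyTorch model standing in for an existing production model.
model = torchvision.models.mobilenet_v2(weights="IMAGENET1K_V1").eval()

# Example inputs define the traced input signature for conversion.
sample_inputs = (torch.randn(1, 3, 224, 224),)

# Convert to a LiteRT model and serialize the flatbuffer for edge deployment.
edge_model = ai_edge_torch.convert(model, sample_inputs)
edge_model.export("mobilenet_v2.tflite")
```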

Pros

  • Massive Device Coverage: Designed to run on billions of devices across all major platforms and chipsets, making it one of the most widely deployable ML runtimes available.
  • Privacy-Preserving Inference: On-device execution means user data never leaves the device, enabling AI features in sensitive applications without cloud data exposure.
  • Comprehensive Acceleration Support: First-class GPU and NPU acceleration across Qualcomm, MediaTek, and Google Tensor chipsets reduces latency and power consumption significantly.
  • Rich Ecosystem & Tooling: Integrates with MediaPipe, Model Explorer, and the AI Edge Portal, providing end-to-end tooling for model development, conversion, and deployment.

Cons

  • Model Size & Complexity Constraints: Edge deployment requires models to be optimized and quantized for constrained hardware, which can reduce accuracy or limit model complexity compared to cloud-based inference.
  • Hardware Acceleration Setup Complexity: Enabling GPU or NPU delegates requires platform-specific configuration and testing across different device chipsets, which can be time-consuming (illustrated after this list).
  • Migration Effort from TensorFlow Lite: Teams with existing TFLite integrations may need to update APIs and tooling to fully migrate to the LiteRT ecosystem despite backward compatibility efforts.
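
As a sketch of the platform-specific setup mentioned above: in desktop Python a delegate is loaded from a shared library whose name and build differ per platform and chipset, so the path below is only a placeholder (on Android, delegates are typically configured through the Java/Kotlin or C++ interpreter options instead).

```python
import tensorflow as tf

# The delegate library must be built or bundled for the target platform;
# the filename here is a placeholder and varies by OS and chipset.
gpu_delegate = tf.lite.experimental.load_delegate("libtensorflowlite_gpu_delegate.so")

interpreter = tf.lite.Interpreter(
    model_path="model.tflite",
    experimental_delegates=[gpu_delegate],
)
interpreter.allocate_tensors()
```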

Frequently Asked Questions

What is the difference between LiteRT and TensorFlow Lite?

LiteRT is the rebranded and evolved successor to TensorFlow Lite (TFLite). It retains TFLite's on-device inference capabilities while adding support for generative AI models, improved hardware acceleration, and a broader platform ecosystem including LiteRT-LM for LLMs.

Can I run large language models (LLMs) with LiteRT?

Yes. LiteRT includes LiteRT-LM, a specialized runtime for deploying LLMs on-device. It supports chipsets from Qualcomm, MediaTek, and Google Tensor, enabling private, low-latency LLM inference directly on mobile and edge hardware.

What platforms does LiteRT support?

LiteRT supports Android, iOS, macOS, Windows, Linux, web browsers (via LiteRT.js), and embedded/IoT devices. It provides C++, Java/Kotlin, Swift, Python, and JavaScript APIs depending on the target platform.

Which model formats and frameworks are compatible with LiteRT?

LiteRT supports model conversion from PyTorch, TensorFlow, and JAX. Models can be optimized using post-training quantization techniques including dynamic range, integer (int8), float16, and int16 activation quantization.
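
To make those options concrete, here is a hedged sketch of two of the listed quantization modes, again using the tf.lite converter API that LiteRT builds on; the SavedModel path and calibration data are placeholders.

```python
import numpy as np
import tensorflow as tf

def representative_dataset():
    # A small sample of realistic inputs used to calibrate int8 ranges
    # (random placeholder data here; use real samples in practice).
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

# Full-integer (int8) quantization with a representative dataset.
converter = tf.lite.TFLiteConverter.from_saved_model("exported_model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
int8_model = converter.convert()

# Float16 quantization: halves weight size while keeping float execution.
converter = tf.lite.TFLiteConverter.from_saved_model("exported_model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
fp16_model = converter.convert()
```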

Is LiteRT free to use?

Yes. LiteRT is free and open-source, developed by Google and available to developers at no cost. It is part of the Google AI Edge ecosystem and can be accessed via ai.google.dev.
