TensorFlow Lite

LiteRT (formerly TensorFlow Lite) is Google's open-source framework for deploying ML and GenAI models on Android, iOS, web, desktop, and IoT devices with GPU/NPU acceleration.

About

LiteRT, rebranded from TensorFlow Lite, is Google's next-generation on-device AI runtime designed to deploy ML and generative AI models across billions of edge devices without relying on cloud connectivity. Part of the Google AI Edge ecosystem, it provides a unified, high-performance framework supporting Android, iOS, macOS, web (via LiteRT.js), Linux, Windows, and embedded/IoT platforms.

The framework enables hardware acceleration through GPU and NPU delegates, with dedicated support for Qualcomm, MediaTek, and Google Tensor chipsets, ensuring low-latency, power-efficient inference. Developers can convert existing PyTorch and TensorFlow models using built-in conversion tools and optimize them with post-training quantization techniques including dynamic range, integer (int8), float16, and int16-activation quantization.

For generative AI workloads, LiteRT-LM provides a dedicated runtime for running large language models locally on mobile and edge hardware. The Model Explorer and AI Edge Portal tooling streamline model inspection and deployment pipelines. Multiple language APIs are supported—Kotlin, Java, Swift, C++, Python, and JavaScript—making LiteRT accessible across mobile, native, and web development stacks.

LiteRT is ideal for mobile developers, embedded engineers, and AI researchers building privacy-first applications where data must stay on-device. As a fully open-source project with strong Google backing and a large community, it is one of the most widely adopted edge AI frameworks in production today.

Key Features

  • On-Device Inference: Run ML and GenAI models locally on mobile, desktop, and IoT devices without requiring cloud connectivity, ensuring low latency and privacy.
  • GPU & NPU Hardware Acceleration: Leverage GPU and NPU delegates for fast, power-efficient inference on Qualcomm, MediaTek, and Google Tensor chipsets.
  • Model Quantization & Optimization: Reduce model size and improve performance using post-training quantization techniques including dynamic range, int8, float16, and int16 activation quantization.
  • LiteRT-LM for Large Language Models: A dedicated runtime for deploying and running LLMs on-device, enabling generative AI features in mobile and edge applications.
  • Broad Multi-Platform API Support: Provides APIs in Kotlin, Java, Swift, C++, Python, and JavaScript for seamless integration across Android, iOS, web, Linux, Windows, and embedded systems.
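As a minimal sketch of the core workflow, the following Python example converts a model to the FlatBuffer format and runs it with the interpreter. It uses the `tf.lite` APIs, which predate the LiteRT rebrand and remain available in TensorFlow; the tiny Keras model is a toy stand-in, not a real network:

```python
import numpy as np
import tensorflow as tf

# Toy Keras model as a stand-in for a real network.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(2),
])

# Convert to the FlatBuffer format used by LiteRT / TensorFlow Lite.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Run on-device-style inference with the interpreter.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

x = np.random.rand(1, 4).astype(np.float32)
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()
result = interpreter.get_tensor(out["index"])
print(result.shape)  # (1, 2)
```

On Android or iOS the same converted model file would be loaded through the Kotlin/Java or Swift APIs instead; the conversion step is identical.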

Use Cases

  • Running real-time image classification, object detection, and segmentation models on Android and iOS mobile apps without internet access.
  • Deploying on-device NLP and text processing models for translation, sentiment analysis, or smart reply features with full user privacy.
  • Running large language models locally on smartphones for generative AI features such as summarization or Q&A using LiteRT-LM.
  • Embedding ML inference into IoT and embedded Linux devices such as smart cameras, industrial sensors, or robotics platforms.
  • Building privacy-preserving AI applications in healthcare, finance, or enterprise where sensitive data must remain on-device and never leave the hardware.

Pros

  • Comprehensive Platform Coverage: Supports virtually every major platform — Android, iOS, web, macOS, Windows, Linux, and embedded IoT — from a single framework.
  • Fully Open Source with Google Backing: Free to use under an open-source license with active Google development, a large community, and extensive documentation.
  • Powerful Hardware Acceleration: Native NPU and GPU delegate support enables production-grade inference speeds on modern mobile hardware.
  • Flexible Model Compatibility: Convert models from PyTorch and TensorFlow, with quantization and optimization tools to meet edge hardware constraints.

Cons

  • Steep Learning Curve: Converting, quantizing, and deploying models requires solid ML engineering knowledge, making it challenging for beginners.
  • Quantization Accuracy Trade-offs: Aggressive model compression techniques can introduce accuracy degradation that requires careful tuning and validation.
  • Evolving GenAI Support: LLM and generative AI deployment on-device is a newer addition and still maturing compared to the well-established traditional ML inference pipeline.

Frequently Asked Questions

What is the difference between LiteRT and TensorFlow Lite?

LiteRT is the rebranded name for TensorFlow Lite. Google renamed it to reflect its evolution into a broader, next-generation on-device AI runtime that supports both traditional ML and modern generative AI workloads.

Which platforms does LiteRT support?

LiteRT supports Android, iOS, macOS, web (via LiteRT.js), Linux, Windows, and embedded/IoT devices, with platform-specific APIs in Kotlin, Java, Swift, C++, Python, and JavaScript.

Can LiteRT run large language models (LLMs) on mobile devices?

Yes. LiteRT-LM is a dedicated runtime within the LiteRT ecosystem for deploying and running LLMs on-device, enabling generative AI features in mobile applications with hardware acceleration support.

How do I convert my existing model to run with LiteRT?

LiteRT provides conversion tools for both PyTorch and TensorFlow models. You can use the LiteRT converter to produce an optimized model, then apply post-training quantization to reduce size and improve inference speed.
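For the TensorFlow path, the conversion plus quantization step described above is a few lines of Python. A hedged sketch using the `tf.lite` converter APIs (these predate the LiteRT rebrand; the model here is a toy with enough weights for the size difference to show):

```python
import tensorflow as tf

# Toy model; ~36k parameters so quantization savings are visible.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(10),
])

# Plain float32 conversion.
baseline = tf.lite.TFLiteConverter.from_keras_model(model).convert()

# Same model with post-training dynamic range quantization enabled:
# weights are stored as int8 instead of float32.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized = converter.convert()

print(len(baseline), len(quantized))  # quantized FlatBuffer is markedly smaller
```

Dynamic range quantization needs no calibration data; full integer (int8) quantization additionally requires a representative dataset so activations can be calibrated.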

Is LiteRT free to use?

Yes. LiteRT is fully open source and free to use for any purpose, including commercial applications. It is maintained by Google and the open-source community.
