Clips AI

Open source

Clips AI is a free, open-source Python library that automatically converts long-form videos into short clips using transcript analysis and dynamically reframes aspect ratios for social media.

Coding & Development

Video Editors

Short Form Video Tools

About

Clips AI is an open-source Python library built for developers who need to automate the repurposing of long-form video content into shorter, shareable clips. With just a few lines of code, it handles the full pipeline: transcription, clip detection, and intelligent video resizing.

The library uses WhisperX, an enhanced wrapper around OpenAI Whisper, to transcribe audio and detect precise word-level timestamps. Its ClipFinder algorithm then analyzes the transcript to identify the most compelling segments, outputting start and end times for each clip. This approach is optimized for audio-centric, narrative-driven content such as podcasts, interviews, speeches, and sermons.

For resizing, Clips AI integrates with Pyannote for speaker diarization, enabling it to dynamically reframe the video to focus on whoever is currently speaking. This makes it straightforward to convert horizontal (16:9) video into vertical (9:16) format suitable for TikTok, Instagram Reels, or YouTube Shorts.

Clips AI is designed for developers and engineers looking to build video repurposing pipelines or content automation tools. It is available on GitHub, installable via pip, and free to use under an open-source license. The library requires Python, ffmpeg, and libmagic as system dependencies.
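The pipeline described above can be sketched roughly as follows. The class, function, and parameter names (Transcriber, ClipFinder, resize, audio_file_path, pyannote_auth_token, and so on) follow the project's README as best recalled and should be checked against the current documentation. The wrapper function find_clips_and_crops is a hypothetical helper, not part of the library.

```python
def find_clips_and_crops(video_path: str, hf_token: str):
    """Transcribe a video, find clip boundaries, and compute 9:16 crops.

    Hypothetical wrapper; the clipsai names below are assumed from the
    project's README and may differ between versions.
    """
    # Imported lazily so this module still loads where clipsai is absent.
    from clipsai import ClipFinder, Transcriber, resize

    # WhisperX-backed transcription with word-level timestamps.
    transcription = Transcriber().transcribe(audio_file_path=video_path)

    # Each clip is expected to expose start_time and end_time in seconds.
    clips = ClipFinder().find_clips(transcription=transcription)

    # Speaker-aware crops; Pyannote needs a Hugging Face access token.
    crops = resize(
        video_file_path=video_path,
        pyannote_auth_token=hf_token,
        aspect_ratio=(9, 16),
    )

    clip_times = [(clip.start_time, clip.end_time) for clip in clips]
    return clip_times, crops.segments
```

Calling this requires clipsai, WhisperX, ffmpeg, and libmagic to be installed, and transcription of a long video can take a while without a GPU.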

Key Features

Transcript-Based Clip Detection: Analyzes WhisperX-generated transcripts to automatically identify the most meaningful segments, outputting precise start and end timestamps for each clip.
Dynamic Speaker Reframing: Uses Pyannote speaker diarization to intelligently reframe video around the active speaker when converting aspect ratios (e.g., 16:9 to 9:16).
Aspect Ratio Conversion: Resizes video to any target aspect ratio, making it easy to prepare content for vertical social media formats like TikTok, Reels, and Shorts.
Simple Python API: Provides a clean, minimal API installable via pip — clip finding and resizing can each be accomplished in just a few lines of code.
WhisperX Transcription Integration: Leverages WhisperX for high-accuracy transcription with word-level timestamps, which powers the clipping algorithm.
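To make the aspect ratio conversion concrete, here is a minimal sketch of the geometry behind cropping a 16:9 frame to 9:16 around a speaker. It is generic arithmetic, not Clips AI's internal implementation. The function crop_window and its center_x parameter (the speaker's horizontal position in pixels) are hypothetical names introduced for illustration.

```python
def crop_window(src_w, src_h, ratio_w, ratio_h, center_x):
    """Return (x, y, width, height) of the largest ratio_w:ratio_h crop
    inside a src_w x src_h frame, centered horizontally on center_x."""
    # Compare aspect ratios with integer cross-multiplication.
    if src_w * ratio_h > src_h * ratio_w:
        # Source is wider than the target: keep full height, cut the sides.
        crop_h = src_h
        crop_w = src_h * ratio_w // ratio_h
    else:
        # Source is taller or equal: keep full width, cut top and bottom.
        crop_w = src_w
        crop_h = src_w * ratio_h // ratio_w

    # Many encoders expect even dimensions, so round down to even values.
    crop_w -= crop_w % 2
    crop_h -= crop_h % 2

    # Center on the speaker, then clamp so the window stays in the frame.
    x = int(center_x - crop_w / 2)
    x = max(0, min(x, src_w - crop_w))
    y = (src_h - crop_h) // 2
    return x, y, crop_w, crop_h


# A 1920x1080 frame cropped to 9:16 with the speaker at the center.
print(crop_window(1920, 1080, 9, 16, center_x=960))  # (657, 0, 606, 1080)
```

Clamping matters when the speaker sits near the edge of the frame: the window slides as far as it can without leaving the picture, so the speaker ends up off-center instead of the crop running out of bounds.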

Use Cases

Automatically repurpose full podcast episodes into short highlight clips for distribution on YouTube Shorts, TikTok, or Instagram Reels.
Build an automated video processing pipeline that transcribes, clips, and resizes interview footage with minimal human intervention.
Convert conference talk recordings or keynote speeches into bite-sized clips for social media marketing.
Reframe existing 16:9 YouTube content into 9:16 vertical videos for mobile-first platforms without manual editing.
Integrate into a content automation SaaS product to offer AI-powered video clipping as a feature for creators.
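For pipeline use cases like these, the clip times and crop window eventually have to be turned into an actual video file. The sketch below builds an ffmpeg command line for that step. It is an assumption about how one might glue the pieces together with plain ffmpeg, not a description of how Clips AI trims or crops internally. The function ffmpeg_clip_args is hypothetical; the ffmpeg options used (-ss, -t, -vf crop=w:h:x:y, -y) are standard.

```python
def ffmpeg_clip_args(src, dst, start, end, crop=None):
    """Build an ffmpeg argument list that cuts [start, end] seconds from
    src and optionally crops each frame to crop=(x, y, width, height)."""
    if end <= start:
        raise ValueError("end must be greater than start")

    args = [
        "ffmpeg", "-y",            # overwrite dst if it already exists
        "-ss", f"{start:.3f}",     # seek to the clip start in the input
        "-i", src,
        "-t", f"{end - start:.3f}",  # keep only the clip duration
    ]
    if crop is not None:
        x, y, w, h = crop
        # ffmpeg's crop filter takes width:height:x:y, in that order.
        args += ["-vf", f"crop={w}:{h}:{x}:{y}"]
    args.append(dst)
    return args


args = ffmpeg_clip_args("in.mp4", "out.mp4", 12.5, 42.0, crop=(657, 0, 606, 1080))
print(" ".join(args))
```

The list can be passed directly to subprocess.run(args, check=True). Building a list instead of a shell string avoids quoting problems with file names that contain spaces.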

Pros

Fully Open Source: Available on GitHub with no licensing cost, making it accessible for individual developers and teams building automated video pipelines.
Minimal Code Required: The entire clip detection and resizing workflow can be implemented in under 10 lines of Python, lowering the barrier to automation.
Smart Speaker-Aware Reframing: Automatically tracks and centers the active speaker during resizing, producing professional-looking vertical video without manual editing.

Cons

Developer-Only Tool: Requires Python knowledge and manual environment setup (pip, ffmpeg, libmagic); there is no graphical or no-code interface.
Dependency Complexity: Relies on multiple external dependencies including WhisperX, Pyannote, ffmpeg, and libmagic, which can make installation and version management cumbersome.
Narrative Video Focus: The transcript-driven clipping algorithm is optimized for speech-heavy content and may underperform on music videos, sports, or other non-narrative formats.
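Because setup problems tend to surface only at run time, a small preflight check can help. The sketch below reports which of the dependencies named above appear to be available. The function check_environment is hypothetical, and the module names checked (clipsai, whisperx, and magic, the import name of the python-magic binding for libmagic) are assumptions about how these packages are installed.

```python
import importlib.util
import shutil


def check_environment():
    """Return a dict mapping each dependency name to True if it was found."""
    # ffmpeg is a command-line tool, so look for it on the PATH.
    report = {"ffmpeg": shutil.which("ffmpeg") is not None}

    # Python packages: check that they can be located without importing
    # them, since importing WhisperX can be slow.
    for module in ("clipsai", "whisperx", "magic"):
        report[module] = importlib.util.find_spec(module) is not None
    return report


for name, found in check_environment().items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```

Note that finding the magic module does not prove the underlying libmagic shared library is installed; that is only confirmed when the module is actually imported.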

Frequently Asked Questions

What types of video is Clips AI designed for?
Clips AI is designed for audio-centric, narrative-based videos such as podcasts, interviews, speeches, and sermons, where transcript analysis can identify meaningful segments.

Is Clips AI free to use?
Yes, Clips AI is fully open source and free to use. It is installable via pip and hosted on GitHub.

Do I need an access token to resize videos?
Yes, video resizing requires a free Hugging Face access token to use Pyannote for speaker diarization. You will not be charged for using Pyannote.

What aspect ratios does Clips AI support?
Clips AI supports conversion to any target aspect ratio. A common use case is converting 16:9 horizontal video to 9:16 vertical format for social media platforms.

How does Clips AI decide where to cut clips?
The ClipFinder algorithm analyzes the video's transcript, generated by WhisperX, to identify naturally coherent segments and determine optimal clip boundaries.
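To illustrate why word-level timestamps matter for clip boundaries, here is a minimal sketch of how a span of transcript words maps to start and end times, and how candidate clips might then be filtered by length. This is not the ClipFinder algorithm; it only shows the timestamp bookkeeping that any transcript-based approach relies on. The Word class, both functions, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the video
    end: float


def segment_to_clip(words, first, last):
    """Return (start, end) in seconds for words[first..last], inclusive."""
    if not 0 <= first <= last < len(words):
        raise ValueError("segment indices out of range")
    return words[first].start, words[last].end


def keep_by_duration(clips, min_s, max_s):
    """Keep only clips whose length in seconds lies in [min_s, max_s]."""
    return [(s, e) for s, e in clips if min_s <= e - s <= max_s]


# Made-up transcript with word-level timestamps.
words = [
    Word("Welcome", 0.0, 0.4), Word("back.", 0.45, 0.8),
    Word("Today", 1.5, 1.9), Word("we", 1.95, 2.05), Word("begin.", 2.1, 2.6),
]

# Two candidate segments, expressed as word index ranges.
clips = [segment_to_clip(words, 0, 1), segment_to_clip(words, 2, 4)]
print(clips)                               # [(0.0, 0.8), (1.5, 2.6)]
print(keep_by_duration(clips, 1.0, 60.0))  # [(1.5, 2.6)]
```

Because each boundary comes from an individual word's timestamp, a clip starts exactly when its first word begins and stops when its last word ends, so cuts avoid landing mid-word.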