MagicDance

Open Source

MagicDance uses identity-aware diffusion to transfer human poses and facial expressions onto any target identity in video, with zero fine-tuning required. Open-source ICML 2024 research.

About

MagicDance, published at ICML 2024 under the name MagicPose, is a cutting-edge AI research model developed by researchers at the University of Southern California and ByteDance. It enables realistic human dance video generation by transferring novel pose sequences and facial expressions onto any target identity, without the need for additional fine-tuning on new data.

At its core, MagicPose employs a two-stage training strategy: it first pretrains an Appearance Control Block to disentangle human motion from appearance (e.g., identity, skin tone, clothing), then jointly fine-tunes an Appearance-Pose-Joint-Control Block, with an AnimateDiff motion module providing temporal consistency. The model is designed as a plug-in extension to Stable Diffusion, leaving the original pre-trained weights intact.

Key capabilities include vivid and accurate human motion and facial expression transfer driven by pose skeletons and face landmarks, robust appearance control with consistent upper-body and background rendering, and zero-shot 2D animation generation, including cartoon-style stylization from pose inputs alone. MagicPose generalizes well across unseen human identities and complex motion sequences, demonstrating superior performance on the TikTok benchmark dataset. Ideal for researchers, creative technologists, content creators, and developers working on generative video, virtual avatars, or animation pipelines, MagicDance is a powerful and flexible building block for AI-driven human video synthesis.
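
The minimal PyTorch sketch below illustrates this plug-in layout under our own assumptions: a frozen Stable Diffusion UNet consumes identity features from a trainable appearance branch and residuals from a trainable pose branch. It is not the released implementation; in particular, `MagicPoseSketch` and the `extra_attn_features` / `down_block_residuals` keyword arguments are illustrative names for the two control pathways.

```python
import torch.nn as nn

class MagicPoseSketch(nn.Module):
    """Frozen SD UNet plus two trainable control branches (illustrative only)."""

    def __init__(self, unet, appearance_block, pose_controlnet):
        super().__init__()
        self.unet = unet                          # pre-trained Stable Diffusion UNet
        self.appearance_block = appearance_block  # trainable branch fed the reference image
        self.pose_controlnet = pose_controlnet    # trainable branch fed pose/face landmark maps
        for p in self.unet.parameters():          # original SD weights stay untouched
            p.requires_grad_(False)

    def forward(self, noisy_latents, timestep, text_emb, ref_latents, pose_map):
        # Appearance branch: identity features extracted from the reference latents.
        app_features = self.appearance_block(ref_latents, timestep, text_emb)
        # Pose branch: ControlNet-style residuals from the pose skeleton map.
        pose_residuals = self.pose_controlnet(noisy_latents, timestep, text_emb, pose_map)
        # The frozen UNet consumes both control signals when predicting noise.
        # (`extra_attn_features` / `down_block_residuals` are illustrative names.)
        return self.unet(noisy_latents, timestep, text_emb,
                         extra_attn_features=app_features,
                         down_block_residuals=pose_residuals)
```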

Key Features

  • Identity-Preserving Motion Transfer: Transfers pose sequences and facial expressions onto a target identity while consistently preserving their appearance, skin tone, clothing, and facial attributes across frames.
  • Zero-Shot Generation: Generates realistic human dance videos for unseen identities and complex motion sequences without any additional fine-tuning or extra training data.
  • Stable Diffusion Plug-in: Designed as a modular extension to Stable Diffusion and ControlNet — integrates seamlessly without modifying the original model's pre-trained weights.
  • Two-Stage Disentangled Training: Employs a novel two-stage pipeline that separates appearance from motion, enabling robust control over both independently for higher-quality, temporally consistent outputs (see the training sketch after this list).
  • Zero-Shot 2D Animation: Extends beyond real humans to support cartoon-style animation generation from pose inputs alone, enabling creative stylization with no domain-specific training.
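
To make the two-stage training idea concrete, here is a hedged objective sketch that builds on the architecture snippet above. It assumes a diffusers-style noise scheduler (`add_noise`, `config.num_train_timesteps`) and the hypothetical model interface from that snippet; zeroing out the pose map in Stage 1 stands in for pretraining the appearance branch alone.

```python
import torch
import torch.nn.functional as F

def diffusion_loss(model, batch, scheduler, use_pose: bool):
    """Epsilon-prediction loss; Stage 1 zeroes the pose input, Stage 2 uses it."""
    latents = batch["latents"]                       # VAE latents of a video frame
    noise = torch.randn_like(latents)
    t = torch.randint(0, scheduler.config.num_train_timesteps,
                      (latents.size(0),), device=latents.device)
    noisy = scheduler.add_noise(latents, noise, t)   # diffusers-style scheduler
    # Stage 1 ("appearance pretraining"): pose conditioning is switched off so
    # the appearance branch learns identity/clothing/background on its own.
    pose = batch["pose_map"] if use_pose else torch.zeros_like(batch["pose_map"])
    pred = model(noisy, t, batch["text_emb"], batch["ref_latents"], pose)
    return F.mse_loss(pred, noise)
```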

Use Cases

  • Animating a static photo of a person to perform a choreographed dance sequence using pose skeleton inputs (see the inference sketch after this list).
  • Transferring facial expressions and body movements from a reference video onto a different target identity for content creation.
  • Generating zero-shot cartoon or anime character animations driven by human pose sequences without domain-specific training.
  • Building virtual avatar or digital human pipelines that require identity-consistent motion retargeting.
  • Conducting research in controllable video generation, human motion synthesis, and diffusion model extensions.
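
As a concrete sketch of the first use case, the snippet below extracts per-frame pose skeletons and face landmarks with the real `OpenposeDetector` from the `controlnet_aux` package; the commented-out `MagicPosePipeline` call is a hypothetical stand-in for the repository's actual inference entry point.

```python
from PIL import Image
from controlnet_aux import OpenposeDetector

# Real preprocessor: OpenPose-based skeleton + face landmark extraction.
detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")

reference = Image.open("person.png")  # static photo of the target identity
driving = [Image.open(f"dance/{i:04d}.png") for i in range(120)]  # driving video frames
pose_maps = [detector(frame, include_face=True) for frame in driving]

# Hypothetical inference entry point; class name, checkpoint path, and call
# signature are placeholders for whatever the repository actually exposes:
# pipe = MagicPosePipeline.from_pretrained("checkpoints/magicpose")
# video = pipe(reference_image=reference, pose_sequence=pose_maps)
# video.save("animated_dance.mp4")
```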

Pros

  • No Fine-Tuning Required: Achieves strong generalization across unseen identities and motion sequences out of the box, significantly reducing the barrier to use in production pipelines.
  • Modular & Non-Destructive: Operates as a plug-in to existing Stable Diffusion setups, leaving pre-trained weights untouched and making it easy to integrate into existing workflows.
  • Peer-Reviewed Research: Published at ICML 2024 and validated on the TikTok dataset, providing academic credibility and benchmark-verified performance.
  • Supports Animation Stylization: Beyond realistic video, it can handle cartoon and animated references, broadening its creative application range significantly.

Cons

  • Research-Stage Maturity: As an academic research project, it may lack production-ready tooling, documentation, and support compared to commercial video AI platforms.
  • Compute-Intensive: Diffusion-based video generation requires significant GPU resources, making it less accessible for users without high-performance hardware.
  • Limited to 2D Human Video: The model is specifically designed for 2D human dance and expression transfer; it does not generalize to arbitrary objects, scenes, or 3D content.

Frequently Asked Questions

What is MagicDance (MagicPose)?

MagicDance, also known as MagicPose, is an open-source diffusion-based AI model that transfers human poses and facial expressions from a reference sequence onto a target identity in video, without requiring any fine-tuning. It was published at ICML 2024 by researchers from USC and ByteDance.

Does MagicDance require fine-tuning on new identities?

No. MagicDance is designed for zero-shot generalization, meaning it can generate identity-consistent videos for unseen individuals and novel motion sequences without any additional training data or fine-tuning steps.

How does MagicDance integrate with Stable Diffusion?

MagicDance is built as a plug-in extension to Stable Diffusion and ControlNet. It adds an Appearance Control Block and a Pose ControlNet on top of the frozen Stable Diffusion UNet, so the original model weights are never modified.
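
A reasonable mental model for how the Appearance Control Block can steer generation without touching the frozen weights is to concatenate the reference branch's hidden states into the UNet's self-attention as extra keys and values. The single-head sketch below illustrates that idea only; it is our assumption about the mechanism, not the repository's code.

```python
import torch
import torch.nn.functional as F

def attention_with_reference(q_proj, k_proj, v_proj, hidden, ref_hidden):
    """Single-head sketch: reference features join the UNet's self-attention."""
    q = q_proj(hidden)                                          # queries from the UNet's own tokens
    k = torch.cat([k_proj(hidden), k_proj(ref_hidden)], dim=1)  # keys: self + reference
    v = torch.cat([v_proj(hidden), v_proj(ref_hidden)], dim=1)  # values: self + reference
    return F.scaled_dot_product_attention(q, k, v)              # attend over both sources at once

# Toy usage: batch 2, 64 spatial tokens each, 320-dim features.
proj = torch.nn.Linear(320, 320)
out = attention_with_reference(proj, proj, proj,
                               torch.randn(2, 64, 320), torch.randn(2, 64, 320))
```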

Can MagicDance generate animations from cartoon images?

Yes. MagicDance supports zero-shot 2D animation generation, enabling pose-driven stylization from cartoon or non-photorealistic reference images, even though the model was trained exclusively on real human dance videos.

What datasets and benchmarks has MagicDance been evaluated on?

MagicDance was extensively evaluated on the TikTok dataset, where it demonstrated superior performance compared to prior methods such as TPS and DisCo in both quantitative metrics and qualitative comparisons.
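
For readers reproducing such comparisons, the snippet below shows how two commonly reported image-fidelity metrics (FID and SSIM) can be computed with `torchmetrics` on toy tensors; the paper's exact metric suite and evaluation protocol may differ.

```python
import torch
from torchmetrics.image import StructuralSimilarityIndexMeasure
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)             # expects uint8 [N, 3, H, W]
ssim = StructuralSimilarityIndexMeasure(data_range=1.0)  # expects floats in [0, 1]

real = torch.randint(0, 256, (8, 3, 256, 256), dtype=torch.uint8)  # toy ground-truth frames
fake = torch.randint(0, 256, (8, 3, 256, 256), dtype=torch.uint8)  # toy generated frames

fid.update(real, real=True)
fid.update(fake, real=False)
print("FID :", fid.compute().item())
print("SSIM:", ssim(fake.float() / 255, real.float() / 255).item())
```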
