Beat-It: Beat-Synchronized Multi-Condition 3D Dance Generation

Zikai Huang1, Xuemiao Xu1,3,4,*, Cheng Xu2,*, Huaidong Zhang1, Chenxi Zheng1, Jing Qin2, Shengfeng He5
1South China University of Technology, 2The Hong Kong Polytechnic University, 3Guangdong Engineering Center for Large Model and GenAI Technology, 4Guangdong Provincial Key Lab of Computational Intelligence and Cyberspace Information, 5Singapore Management University
Best viewed with audio 🎧

Beat-synchronized keyframe-controlled dance generation with Beat-It.

To demonstrate the superiority of our method in beat-synchronized keyframe-controlled dance generation, we feed our model the same music and beat conditions while varying the keyframes.

Arbitrary beat-controlled dance generation with Beat-It.

To illustrate the flexible beat controllability of our method, we present an example where the beat condition is not strictly aligned with the music beats.

Abstract

Dance, as an art form, fundamentally hinges on precise synchronization with musical beats. However, generating aesthetically pleasing dance sequences from music remains challenging, with existing methods often falling short in controllability and beat alignment.

To address these shortcomings, this paper introduces Beat-It, a novel framework for beat-specific, key pose-guided dance generation. Unlike prior approaches, Beat-It uniquely integrates explicit beat awareness and key pose guidance, effectively resolving two main issues: the misalignment of generated dance motions with musical beats, and the inability to map key poses to specific beats, a capability critical for practical choreography. Our approach disentangles beat conditions from music using a nearest beat distance representation and employs a hierarchical multi-condition fusion mechanism. This mechanism seamlessly integrates key pose, beat, and music features, mitigating condition conflicts and offering rich, multi-conditioned guidance for dance generation. Additionally, a specially designed beat alignment loss ensures the generated dance movements remain in sync with the musical beats. Extensive experiments confirm Beat-It's superiority over existing state-of-the-art methods in terms of beat alignment and motion controllability.
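To give an intuition for the nearest beat distance representation, the sketch below is our own hedged reading, not the paper's exact formulation: it encodes a beat condition by assigning each motion frame its distance (in frames) to the nearest beat, yielding a per-frame signal that is zero exactly on the beats. The function name and its arguments are illustrative assumptions.

```python
import numpy as np

def nearest_beat_distance(num_frames, beat_frames):
    """Illustrative sketch (assumed form, not the authors' exact definition):
    map each of `num_frames` motion frames to the absolute distance, in
    frames, to the nearest beat in `beat_frames`."""
    beats = np.asarray(beat_frames)
    frames = np.arange(num_frames)
    # Distance from every frame to every beat, then keep the minimum per frame.
    return np.abs(frames[:, None] - beats[None, :]).min(axis=1)

# Example: a 10-frame clip with beats at frames 2 and 7
# nearest_beat_distance(10, [2, 7]) -> [2 1 0 1 2 2 1 0 1 2]
```

A representation like this decouples the beat signal from the raw audio, so the beat condition can be edited freely (e.g., to beats not present in the music) while still giving the model a dense, frame-aligned target.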

Method

Comparison

We compare our method with the state-of-the-art methods EDGE, Bailando, and FACT on the AIST++ dataset. Note that, among these, only EDGE supports keyframe control.

Specifically, Bailando generates keypoints in Cartesian space, which necessitates resource-intensive post-processing to animate 3D characters. To ensure an equitable comparison, we visualize skeleton stick figures instead of using its original representation.

Ablation

Different Ratios of Keyframes Condition

Different Conditions Combinations

In the Wild

To illustrate its generalization ability, we also evaluate our method on in-the-wild music, using samples randomly selected from the AIOZ-GDANCE dataset. The left side showcases the input keyframes, while the generated results are displayed on the right.