Keywords: Motion generation, Latent diffusion model, Adversarial training, Guided generation
TL;DR: MoLA achieves fast, high-quality human motion generation from textual descriptions and handles multiple types of motion editing tasks in a single framework.
Abstract: In text-to-motion generation, controllability has become as critical as generation quality and speed.
The controllability challenges include generating a motion of a length that matches the given textual description and editing the generated motions according to control signals, such as the start-end positions and the pelvis trajectory.
In this paper, we propose MoLA, which provides fast, high-quality, variable-length motion generation and can also deal with multiple editing tasks in a single framework.
Our approach revisits the motion representation used as inputs and outputs in the model, incorporating an activation variable to enable variable-length motion generation.
Additionally, we integrate a variational autoencoder and a latent diffusion model, further enhanced through adversarial training, to achieve high-quality and fast generation.
Moreover, we apply a training-free guided generation framework to achieve various editing tasks with motion control inputs.
We quantitatively show the effectiveness of adversarial learning in text-to-motion generation, and demonstrate the applicability of our editing framework to multiple editing tasks in the motion domain.
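
As a rough illustration of how an activation variable could support variable-length generation, the sketch below assumes that each generated frame carries a scalar activation in [0, 1] and that the motion ends at the first frame whose activation drops below a threshold. The function name `trim_by_activation`, the threshold of 0.5, and the per-frame scalar layout are all assumptions made for this example; the abstract does not specify how MoLA encodes or decodes the activation variable.

```python
from typing import List, Sequence


def trim_by_activation(
    frames: Sequence[Sequence[float]],
    activations: Sequence[float],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> List[Sequence[float]]:
    """Keep frames up to (not including) the first inactive frame.

    `frames` is a fixed-length sequence of pose vectors and `activations`
    holds one assumed per-frame score; a score below `threshold` marks
    the end of the motion.
    """
    if len(frames) != len(activations):
        raise ValueError("frames and activations must have the same length")

    length = len(frames)  # default: every frame is active
    for index, score in enumerate(activations):
        if score < threshold:
            length = index
            break
    return list(frames[:length])


if __name__ == "__main__":
    poses = [[0.0], [0.1], [0.2], [0.3]]
    scores = [0.9, 0.8, 0.2, 0.1]
    print(len(trim_by_activation(poses, scores)))
```

Under this reading, the model always emits a fixed-size tensor, and the effective motion length is recovered in post-processing rather than being supplied as a separate input.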
Supplementary Material: zip
Submission Number: 9