RecMoDiffuse: Recurrent Flow Diffusion for Human Motion Generation

Mirgahney Mohamed, Jake Cunningham, Marc P. Deisenroth, Lourdes Agapito

University College London

Abstract

Human motion generation is of paramount importance in computer animation. It is a challenging generative temporal modelling task because of the vast range of possible human motions, the high sensitivity of humans to motion coherence, and the difficulty of accurately generating fine-grained motions. Recently, diffusion methods have been proposed for human motion generation due to their high sample quality and expressiveness. However, generated sequences still suffer from motion incoherence, are limited to short durations and simpler motions, and take considerable time at inference. To address these limitations, we propose RecMoDiffuse: Recurrent Flow Diffusion, a new recurrent diffusion formulation for temporal modelling. Previous work applies diffusion to the whole sequence without any temporal dependency, an approach that inherently makes temporal consistency hard to achieve. In contrast, our method explicitly enforces temporal constraints by means of normalizing flow models in the diffusion process, and thereby extends diffusion to the temporal dimension. We demonstrate the effectiveness of RecMoDiffuse in the temporal modelling of human motion. Our experiments show that RecMoDiffuse achieves results comparable to state-of-the-art methods while generating coherent motion sequences and reducing the computational overhead at inference.

Train & Inference

Text-driven Motion Generation

Quantitative Results

Qualitative Results

Inference Time Comparison

Quantitative Results

BibTeX

@article{mohamed2024recmodiffuse,
  title={RecMoDiffuse: Recurrent Flow Diffusion for Human Motion Generation},
  author={Mohamed, Mirgahney and Cunningham, Harry Jake and Deisenroth, Marc P and Agapito, Lourdes},
  journal={arXiv preprint arXiv:2406.07169},
  year={2024}
}

Acknowledgement

The research presented here has been supported by the UCL Centre for Doctoral Training in Foundational AI under UKRI grant number EP/S021566/1. The authors are also grateful to the Baskerville Tier 2 HPC service (https://www.baskerville.ac.uk/), which is funded by the EPSRC and UKRI through the World Class Labs scheme (EP/T022221/1) and the Digital Research Infrastructure programme (EP/W032244/1), and is operated by Advanced Research Computing at the University of Birmingham. We thank Shalini Maiti, Abdallah Basheir, Wonbong Jang, Oscar Key, and Waleed Dawood for their fruitful discussions and useful feedback.

We referred to the project page of Nerfies when creating this project page.