Last updated on September 19, 2024
MoRAG (Multi-Fusion Retrieval Augmented Generation) is a framework that improves text-based human motion generation by augmenting motion diffusion models with retrieval. Traditional models often struggle to generate realistic motion for complex or unfamiliar prompts; MoRAG addresses this by retrieving motion sequences for specific body parts, then fusing them into a cohesive full-body sequence.
MoRAG breaks a prompt down by body part (such as the torso, hands, and legs) and retrieves a motion tailored to each part. The retrieved part motions are then combined to generate a realistic, diverse motion sequence from the natural language description.
If a user inputs:
A person doing yoga.
MoRAG retrieves specific motions for the torso, arms, and legs from its database, and combines them to generate a realistic yoga pose sequence.
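The yoga example can be sketched in a few lines of Python. This is only a toy illustration of the retrieve-then-fuse idea, not the authors' implementation: the database, the joint values, the word-overlap scoring, and the names `DATABASE`, `word_overlap`, `retrieve`, and `fuse` are all assumptions made for the example. The part-level descriptions are written by hand here, and the diffusion model that would consume the fused sequence is left out.

```python
from typing import Dict, List

# A motion clip is a list of frames; each frame holds made-up joint values
# for one body part only.
Motion = List[List[float]]

# Hypothetical toy database: per body part, text descriptions mapped to clips.
DATABASE: Dict[str, Dict[str, Motion]] = {
    "torso": {
        "torso bends forward slowly": [[0.0], [0.2], [0.4]],
        "torso stays upright while walking": [[0.0], [0.0], [0.0]],
    },
    "arms": {
        "arms stretch overhead": [[0.1, 0.1], [0.5, 0.5], [0.9, 0.9]],
        "arms swing while walking": [[0.3, -0.3], [-0.3, 0.3], [0.3, -0.3]],
    },
    "legs": {
        "legs hold a wide stance": [[0.6, -0.6], [0.6, -0.6], [0.6, -0.6]],
        "legs step forward while walking": [[0.2, 0.0], [0.0, 0.2], [0.2, 0.0]],
    },
}


def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercase words (a stand-in for a learned retriever)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)


def retrieve(part: str, query: str) -> Motion:
    """Return the clip whose description best matches the query for one body part."""
    candidates = DATABASE[part]
    best = max(candidates, key=lambda desc: word_overlap(query, desc))
    return candidates[best]


def fuse(parts: Dict[str, Motion]) -> Motion:
    """Concatenate per-part joint values frame by frame into full-body frames."""
    ordered = [parts[name] for name in ("torso", "arms", "legs")]
    n_frames = min(len(clip) for clip in ordered)  # truncate to the shortest clip
    return [[v for clip in ordered for v in clip[t]] for t in range(n_frames)]


# Hand-written part-level descriptions for "A person doing yoga."
part_queries = {
    "torso": "torso bends forward",
    "arms": "arms stretch overhead",
    "legs": "legs hold a wide stance",
}
full_body = fuse({part: retrieve(part, q) for part, q in part_queries.items()})
print(len(full_body), "frames,", len(full_body[0]), "values per frame")
```

Each full-body frame here is simply the torso, arm, and leg values placed side by side. A real system would work with actual skeleton data and a learned text-to-motion retriever rather than word overlap.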
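MoRAG enhances motion generation by combining retrieval-based techniques with diffusion models. The process involves retrieving a motion for each body part, fusing the retrieved parts into a full-body sequence, and using that sequence to augment a motion diffusion model.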
MoRAG can be used in animation, virtual reality, and gaming for generating complex, realistic human motions from simple text descriptions.
To explore MoRAG further, check out the code and models released by the authors.
Shashank, K. S., Maheshwari, S., & Sarvadevabhatla, R. K. (2024). MoRAG – Multi-Fusion Retrieval Augmented Generation for Human Motion. https://arxiv.org/abs/2409.12140