DressRecon: Freeform 4D Human Reconstruction from Monocular Video

Published: 23 Mar 2025 · Last Modified: 24 Mar 2025 · 3DV 2025 Oral · CC BY 4.0
Keywords: Neural Rendering, 4D Reconstruction, Dynamic Avatars
TL;DR: We reconstruct freeform 4D humans from monocular videos in the wild, focusing on loose clothing and handheld objects, by designing a video-specific articulated bag-of-bones deformation model and making use of image-based priors.
Abstract: We present a method to reconstruct time-consistent human body models from monocular videos, focusing on extremely loose clothing or handheld object interactions. Prior work in human reconstruction is either limited to tight-fitting clothing with no object interactions, or requires calibrated multi-view captures or personalized template scans, which are costly to collect at scale. Our key insight for high-quality yet flexible reconstruction is the careful combination of generic human priors about articulated body shape (learned from large-scale training data) with video-specific articulated "bag-of-bones" deformation (fit to a single video via test-time optimization). We accomplish this by learning a neural implicit model that disentangles body versus clothing deformations as separate motion model layers. To capture subtle geometry of clothing, we leverage image-based priors such as human body pose, surface normals, and optical flow during optimization. The resulting neural fields can be extracted into time-consistent meshes, or further refined as explicit 3D Gaussians for high-fidelity interactive rendering. On datasets with highly challenging clothing deformations and object interactions, DressRecon yields higher-fidelity 3D reconstructions than prior art.
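To make the layered "bag-of-bones" idea concrete, below is a minimal PyTorch sketch of a hierarchical deformation model: a coarse body layer and a finer clothing layer, each a set of 3D Gaussian "bones" with per-frame rigid motion, composed so clothing deformation rides on top of body articulation. This is an illustration under assumed design choices (Gaussian-falloff linear blend skinning, axis-angle bone rotations); all names (`BoneLayer`, `num_bones`, etc.) are hypothetical and not the authors' actual implementation.

```python
# Hedged sketch of a two-layer articulated bag-of-bones warp.
# Assumptions (not from the paper): linear blend skinning with
# Gaussian-falloff weights, per-frame axis-angle bone transforms.
import torch
import torch.nn as nn


def axis_angle_to_matrix(v: torch.Tensor) -> torch.Tensor:
    """Rodrigues' formula: axis-angle vectors (B,3) -> rotation matrices (B,3,3)."""
    theta = v.norm(dim=-1, keepdim=True).clamp(min=1e-8)
    k = v / theta
    K = torch.zeros(v.shape[0], 3, 3, device=v.device)
    K[:, 0, 1], K[:, 0, 2] = -k[:, 2], k[:, 1]
    K[:, 1, 0], K[:, 1, 2] = k[:, 2], -k[:, 0]
    K[:, 2, 0], K[:, 2, 1] = -k[:, 1], k[:, 0]
    I = torch.eye(3, device=v.device).expand_as(K)
    s, c = theta.sin()[..., None], theta.cos()[..., None]
    return I + s * K + (1 - c) * (K @ K)


class BoneLayer(nn.Module):
    """One motion layer: B Gaussian bones, each with a per-frame rigid
    transform. Canonical points are warped by skinning weights given by
    each bone's Gaussian falloff."""

    def __init__(self, num_bones: int, num_frames: int):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_bones, 3) * 0.1)  # canonical bone centers
        self.log_scales = nn.Parameter(torch.zeros(num_bones, 3))     # Gaussian extents
        # Per-frame rigid motion of each bone (axis-angle + translation).
        self.rotvecs = nn.Parameter(torch.zeros(num_frames, num_bones, 3))
        self.transls = nn.Parameter(torch.zeros(num_frames, num_bones, 3))

    def skinning_weights(self, pts: torch.Tensor) -> torch.Tensor:
        """Soft assignment of points (N,3) to bones via Gaussian falloff -> (N,B)."""
        d = (pts[:, None, :] - self.centers[None]) / self.log_scales.exp()[None]
        return torch.softmax(-0.5 * (d ** 2).sum(-1), dim=-1)

    def forward(self, pts: torch.Tensor, frame: int) -> torch.Tensor:
        """Warp canonical points (N,3) to frame t by linear blend skinning."""
        w = self.skinning_weights(pts)                    # (N,B)
        R = axis_angle_to_matrix(self.rotvecs[frame])     # (B,3,3)
        # Rotate each point about its bone center, then translate.
        local = pts[:, None, :] - self.centers[None]      # (N,B,3)
        moved = (torch.einsum("bij,nbj->nbi", R, local)
                 + self.centers[None] + self.transls[frame][None])
        return (w[..., None] * moved).sum(dim=1)          # (N,3)


# Compose the layers: clothing deformation applied on top of body articulation.
body = BoneLayer(num_bones=25, num_frames=100)      # coarse, pose-driven layer
clothing = BoneLayer(num_bones=64, num_frames=100)  # fine, video-specific layer
x_canonical = torch.rand(4096, 3) - 0.5
x_t = clothing(body(x_canonical, frame=10), frame=10)
print(x_t.shape)  # torch.Size([4096, 3])
```

In a pipeline like the one the abstract describes, both layers would be fit per video by test-time optimization against rendering losses, with image-based priors (body pose, surface normals, optical flow) supervising the warp; at initialization the sketch above reduces to the identity, so optimization starts from an undeformed canonical shape.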
Supplementary Material: zip
Submission Number: 395