GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images

CVPR 2023

Jianchuan Chen, Wentao Yi, Liqian Ma, Xu Jia, Huchuan Lu

Dalian University of Technology, China · ZMO AI Inc.

Given sparse calibrated multi-view images of an arbitrary performer and the corresponding registered SMPL, GM-NeRF builds the generalizable model-based neural human radiance field for novel view synthesis.

GM-NeRF teaser.

Compared with other generalizable human NeRFs, GM-NeRF still yields reasonable results even when the SMPL estimation is imprecise.

Abstract

In this work, we focus on synthesizing high-fidelity novel view images for arbitrary human performers, given a set of sparse multi-view images. This is a challenging task due to the large variation among articulated body poses and heavy self-occlusions. To alleviate this, we introduce an effective generalizable framework, Generalizable Model-based Neural Radiance Fields (GM-NeRF), to synthesize free-viewpoint images. Specifically, we propose a geometry-guided attention mechanism to register the appearance code from multi-view 2D images to a geometry proxy, which alleviates the misalignment between the inaccurate geometry prior and pixel space. On top of that, we further conduct neural rendering and partial gradient backpropagation for efficient perceptual supervision and improved perceptual quality of the synthesis. To evaluate our method, we conduct experiments on the synthetic datasets THuman2.0 and Multi-garment and the real-world datasets GeneBody and ZJU-MoCap. The results demonstrate that our approach outperforms state-of-the-art methods in terms of novel view synthesis and geometric reconstruction.
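For concreteness, the snippet below is a minimal sketch of the partial gradient backpropagation idea, written under our own assumptions rather than taken from the released code: the full image is rendered so that a perceptual loss can be computed on it, but only a random subset of rays keeps its autograd graph, which bounds memory. The function names and the keep_ratio parameter are illustrative.

import torch


def render_image_with_partial_grad(render_rays, rays, keep_ratio=0.25):
    # render_rays(rays) -> (num_rays, C) colors/features; rays is (num_rays, R).
    n = rays.shape[0]
    keep = torch.rand(n, device=rays.device) < keep_ratio  # rays that keep gradients

    with torch.no_grad():                        # bulk of the rays: no autograd graph stored
        out_no_grad = render_rays(rays[~keep])
    out_grad = render_rays(rays[keep])           # small subset: full autograd graph

    # Reassemble the full image; gradients flow only through the kept rays.
    out = torch.zeros(n, out_grad.shape[-1], device=rays.device)
    out = out.masked_scatter(~keep.unsqueeze(-1), out_no_grad)
    out = out.masked_scatter(keep.unsqueeze(-1), out_grad)
    return out   # reshape to (H, W, C) and feed to a perceptual (e.g. VGG) loss

Rendering all rays but detaching most of them keeps the memory cost of backpropagation roughly proportional to the kept subset while still letting the perceptual loss see the full image.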

Video

Framework Overview

GM-NeRF architecture.

Given m calibrated multi-view images and the registered SMPL, we build the generalizable model-based neural human radiance field. First, we use an image encoder to extract multi-view image features, which provide both geometric and appearance information. To fully exploit the geometric prior, we propose a visibility-based attention mechanism that constructs a structured geometric body embedding, which is then diffused into a geometric feature volume. For any spatial point x, we trilinearly interpolate the feature volume G to obtain the geometric code g(x). In addition, we propose a geometry-guided attention mechanism to obtain the appearance code a(x, d) directly from the multi-view image features. We then feed the geometric code g(x) and the appearance code a(x, d) into an MLP to build the neural feature field (f, σ) = F(g(x), a(x, d)), as sketched below. Finally, we employ volume rendering and neural rendering to generate the novel view image.
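As an illustration of how the per-point feature field might be assembled, the following PyTorch-style sketch follows the description above; it is a simplified sketch under our own assumptions, not the authors' implementation. The class names (GeometryGuidedAttention, FeatureFieldMLP), the helper query_geometric_code, and all dimensions are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F


class GeometryGuidedAttention(nn.Module):
    # Cross-attention: the geometric code at x queries the multi-view pixel features
    # sampled at the projections of x into the m source views. In practice the
    # viewing direction d can additionally be encoded into the query.
    def __init__(self, geo_dim, feat_dim, n_heads=4):  # geo_dim must be divisible by n_heads
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=geo_dim, kdim=feat_dim, vdim=feat_dim,
                                          num_heads=n_heads, batch_first=True)

    def forward(self, geo_code, view_feats):
        # geo_code: (N, 1, geo_dim) queries; view_feats: (N, m, feat_dim) keys/values
        appearance, _ = self.attn(geo_code, view_feats, view_feats)
        return appearance.squeeze(1)                    # (N, geo_dim) appearance code a(x, d)


class FeatureFieldMLP(nn.Module):
    # Maps (g(x), a(x, d)) to a feature vector f and a density sigma.
    def __init__(self, geo_dim, app_dim, hidden=128, feat_out=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(geo_dim + app_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.feat_head = nn.Linear(hidden, feat_out)
        self.sigma_head = nn.Linear(hidden, 1)

    def forward(self, g, a):
        h = self.net(torch.cat([g, a], dim=-1))
        return self.feat_head(h), F.softplus(self.sigma_head(h))


def query_geometric_code(volume, x_normalized):
    # volume: (1, C, D, H, W) geometric feature volume G diffused from the SMPL embedding;
    # x_normalized: (N, 3) query points scaled to [-1, 1] volume coordinates.
    grid = x_normalized.view(1, -1, 1, 1, 3)
    g = F.grid_sample(volume, grid, align_corners=True)  # trilinear interpolation
    return g.view(volume.shape[1], -1).t()               # (N, C) geometric code g(x)


# Toy usage with illustrative sizes (N = 1024 points, m = 4 source views):
volume = torch.randn(1, 32, 64, 64, 64)
x = torch.rand(1024, 3) * 2 - 1
g = query_geometric_code(volume, x)                       # (1024, 32)
view_feats = torch.randn(1024, 4, 64)
a = GeometryGuidedAttention(geo_dim=32, feat_dim=64)(g.unsqueeze(1), view_feats)
f, sigma = FeatureFieldMLP(geo_dim=32, app_dim=32)(g, a)  # then volume-render (f, sigma)

In the full pipeline, the (f, σ) predictions are volume-rendered along each camera ray into a low-resolution feature map, which a 2D neural rendering network then decodes into the final novel-view image.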


Novel View Synthesis


3D Reconstruction

Geometry comparison.

Demos

BibTeX

@inproceedings{chen2023gmnerf,
  author    = {Jianchuan Chen and Wentao Yi and Liqian Ma and Xu Jia and Huchuan Lu},
  title     = {GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images},
  booktitle = {CVPR},
  year      = {2023}
}