Neural Capture & Synthesis

The main theme of my work is to capture and to (re-)synthesize the real world using commodity hardware. This includes modeling of the human body, tracking, as well as reconstruction of and interaction with the environment. Such digitization is needed for various applications in AR/VR as well as in movie (post-)production. Teleconferencing and working in VR are of high interest for many companies, ranging from social media platforms to car manufacturers. They enable remote interaction in VR, e.g., the inspection of 3D content such as CAD models or scans of real objects. A realistic reproduction of appearance and motion is key for such applications; capturing natural motions and expressions as well as photorealistically reproducing images from novel views remain challenging. With the rise of deep learning and, especially, neural rendering, we have seen immense progress on these challenges.

The goal of my work is to develop methods for AI-based image synthesis of humans, along with the underlying representations of appearance, geometry, and motion that allow for explicit and implicit control over the synthesis process. My work on 3D reconstruction, tracking, and rendering does not focus exclusively on humans but also covers the environment and the objects we interact with, thus enabling applications like 3D telepresence or collaborative working in VR. In both areas, reconstruction and rendering, hybrid approaches that combine novel findings in machine learning with classical computer graphics and computer vision techniques show promising results. Nevertheless, these methods still suffer from limitations regarding generalizability, controllability, and editability, which I tackle in my ongoing and future work.


SIGGRAPH 2021: Course on the Advances in Neural Rendering

This course covers the advances in neural rendering over the years 2020-2021.

1 minute read     [Video]  [Bibtex] 

TransformerFusion: Monocular RGB Scene Reconstruction using Transformers

We introduce TransformerFusion, a transformer-based 3D scene reconstruction approach. From an input monocular RGB video, the video frames are processed by a transformer network that fuses the observations into a volumetric feature grid representing the scene; this feature grid is then decoded into an implicit 3D scene representation.
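A minimal PyTorch sketch of the core idea, not the actual architecture: per-voxel image observations (features projected from the input frames) are fused by attention into a feature grid that is decoded into an implicit occupancy value. All names, dimensions, and the learned-query design are illustrative assumptions.

```python
import torch
import torch.nn as nn

class VoxelFusionTransformer(nn.Module):
    """Toy transformer-based fusion of multi-view observations into a feature grid."""
    def __init__(self, feat_dim=64, n_heads=4):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, feat_dim))   # learned per-voxel query
        self.attn = nn.MultiheadAttention(feat_dim, n_heads, batch_first=True)
        self.occupancy_mlp = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, view_feats):
        # view_feats: (num_voxels, num_views, feat_dim) image features projected
        # into each voxel from the observed RGB frames.
        q = self.query.expand(view_feats.shape[0], -1, -1)
        fused, _ = self.attn(q, view_feats, view_feats)           # fuse observations per voxel
        occ = torch.sigmoid(self.occupancy_mlp(fused.squeeze(1)))
        return occ  # implicit occupancy, later turned into a surface (e.g., marching cubes)

# usage: 1000 voxels, each observed in 8 frames with 64-d features
occ = VoxelFusionTransformer()(torch.randn(1000, 8, 64))
```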

1 minute read     [Paper]  [Video]  [Bibtex] 

NerFACE: Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction

We present dynamic neural radiance fields for modeling the appearance and dynamics of a human face. To handle the dynamics of the face, we combine our scene representation network with a low-dimensional morphable model which provides explicit control over pose and expressions. We use volumetric rendering to generate images from this hybrid representation and demonstrate that such a dynamic neural scene representation can be learned from monocular input data only, without the need of a specialized capture setup.
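A minimal sketch of the hybrid representation, assuming a PyTorch setup: a NeRF-style MLP is conditioned on the low-dimensional expression coefficients of the morphable model, so pose and expression remain explicitly controllable while appearance is learned. Dimensions and network sizes are placeholders.

```python
import torch
import torch.nn as nn

def positional_encoding(x, num_freqs=6):
    """Standard NeRF-style positional encoding."""
    freqs = 2.0 ** torch.arange(num_freqs) * torch.pi
    enc = [x] + [f(x * w) for w in freqs for f in (torch.sin, torch.cos)]
    return torch.cat(enc, dim=-1)

class DynamicRadianceField(nn.Module):
    """Toy dynamic NeRF conditioned on morphable-model expression codes."""
    def __init__(self, expr_dim=76, num_freqs=6, hidden=128):
        super().__init__()
        in_dim = 3 * (1 + 2 * num_freqs) + expr_dim
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4))  # RGB + density

    def forward(self, pts, expr):
        # pts: (num_rays, samples, 3) points along camera rays, expr: (expr_dim,) coefficients
        h = positional_encoding(pts)
        e = expr.expand(*pts.shape[:-1], -1)
        rgb_sigma = self.mlp(torch.cat([h, e], dim=-1))
        return torch.sigmoid(rgb_sigma[..., :3]), torch.relu(rgb_sigma[..., 3])

# volume rendering would alpha-composite (rgb, sigma) along each camera ray
rgb, sigma = DynamicRadianceField()(torch.rand(1024, 64, 3), torch.zeros(76))
```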

1 minute read     [Paper]  [Video]  [Bibtex] 

Neural Deformation Graphs for Globally-consistent Non-rigid Reconstruction

We introduce Neural Deformation Graphs for globally-consistent deformation tracking and 3D reconstruction of non-rigid objects. Specifically, we implicitly model a deformation graph via a deep neural network. This neural deformation graph does not rely on any object-specific structure and, thus, can be applied to general non-rigid deformation tracking.
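A rough, simplified sketch of the concept (not the paper's formulation): a network regresses per-node transforms of a deformation graph for a given frame, and surface points are warped by blending nearby node transforms. The frame code, node count, and the linearized rotation are illustrative shortcuts.

```python
import torch
import torch.nn as nn

class NeuralDeformationGraph(nn.Module):
    """Toy deformation graph: an MLP predicts per-node rigid transforms from a frame code."""
    def __init__(self, num_nodes=64, frame_dim=32):
        super().__init__()
        self.nodes = nn.Parameter(torch.randn(num_nodes, 3) * 0.1)   # graph node positions
        self.mlp = nn.Sequential(
            nn.Linear(frame_dim, 256), nn.ReLU(),
            nn.Linear(256, num_nodes * 6))  # per-node axis-angle rotation + translation

    def forward(self, points, frame_code):
        # points: (N, 3) canonical surface points, frame_code: (frame_dim,) per-frame embedding
        params = self.mlp(frame_code).view(-1, 6)
        rot_vec, trans = params[:, :3], params[:, 3:]
        # skinning weights from distance to graph nodes
        d = torch.cdist(points, self.nodes)                          # (N, num_nodes)
        w = torch.softmax(-d, dim=-1)
        # linearized rotation for brevity: x + omega x x + t
        offset = torch.cross(rot_vec[None, :, :].expand(points.shape[0], -1, -1),
                             points[:, None, :].expand(-1, self.nodes.shape[0], -1), dim=-1)
        deformed = points[:, None, :] + offset + trans[None, :, :]
        return (w[..., None] * deformed).sum(dim=1)                  # blended deformed points

warped = NeuralDeformationGraph()(torch.rand(500, 3), torch.zeros(32))
```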

1 minute read     [Paper]  [Video]  [Bibtex] 

Neural Non-Rigid Tracking

We introduce a novel, end-to-end learnable, differentiable non-rigid tracker that enables state-of-the-art non-rigid reconstruction. By enabling gradient back-propagation through a non-rigid as-rigid-as-possible optimization solver, we are able to learn correspondences in an end-to-end manner such that they are optimal for the task of non-rigid tracking.
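A toy illustration of why differentiability through the solver matters: if the non-rigid optimization is unrolled into differentiable steps, gradients of the tracking loss reach the correspondence predictions. This sketch uses plain gradient steps instead of the Gauss-Newton solver of the paper; all names are hypothetical.

```python
import torch

def unrolled_nonrigid_solver(src_pts, target_pts, corr_weights, edges, iters=10, lam=0.1):
    """Toy unrolled solver: optimizes per-point offsets against predicted correspondences
    plus an as-rigid-as-possible-style smoothness term; gradients flow to corr_weights."""
    offsets = torch.zeros_like(src_pts, requires_grad=True)
    lr = 0.5
    for _ in range(iters):
        warped = src_pts + offsets
        data = (corr_weights * (warped - target_pts).pow(2).sum(-1)).sum()
        i, j = edges[:, 0], edges[:, 1]
        reg = (offsets[i] - offsets[j]).pow(2).sum()             # neighboring points move alike
        energy = data + lam * reg
        grad, = torch.autograd.grad(energy, offsets, create_graph=True)
        offsets = offsets - lr * grad                            # differentiable update step
    return src_pts + offsets

# correspondences and their confidences would come from a learned network;
# the tracking loss on the returned points back-propagates into that network.
src = torch.rand(100, 3)
tgt = src + 0.05
w = torch.rand(100, requires_grad=True)                          # stand-in for predicted confidences
edges = torch.randint(0, 100, (300, 2))
loss = (unrolled_nonrigid_solver(src, tgt, w, edges) - tgt).pow(2).sum()
loss.backward()                                                  # gradients reach w through the solver
```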

1 minute read     [Paper]  [Video]  [Bibtex] 

CVPR 2020: Tutorial on Neural Rendering

Neural rendering is a new and rapidly emerging field that combines generative machine learning techniques with physical knowledge from computer graphics, e.g., by the integration of differentiable rendering into network training. This state-of-the-art report summarizes the recent trends and applications of neural rendering.

1 minute read     [Paper]  [Video]  [Bibtex] 

State of the Art on Neural Rendering

Neural rendering is a new and rapidly emerging field that combines generative machine learning techniques with physical knowledge from computer graphics, e.g., by the integration of differentiable rendering into network training. This state-of-the-art report summarizes the recent trends and applications of neural rendering.

2 minute read     [Paper]  [Bibtex] 

Adversarial Texture Optimization from RGB-D Scans

We present a novel approach for color texture generation using a conditional adversarial loss obtained from weakly-supervised views. Specifically, we propose an approach to produce photorealistic textures for approximate surfaces, even from misaligned images, by learning an objective function that is robust to these errors.
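A minimal sketch of the optimization setup, assuming PyTorch and stand-in rendering: the texture itself is the optimization variable, and a small patch discriminator judges rendered crops against captured photos, which tolerates misalignment better than a per-pixel loss. The render function and all sizes are placeholders.

```python
import torch
import torch.nn as nn

texture = torch.rand(1, 3, 256, 256, requires_grad=True)   # optimized color texture

disc = nn.Sequential(                                       # tiny PatchGAN-style discriminator
    nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 4, stride=2, padding=1))

opt_tex = torch.optim.Adam([texture], lr=1e-2)
opt_disc = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def render_patch(tex):
    """Stand-in for texture-mapped rendering of a 64x64 view crop."""
    return nn.functional.interpolate(tex[..., :128, :128], size=(64, 64))

real_patch = torch.rand(1, 3, 64, 64)                       # crop from a captured RGB-D frame

for _ in range(100):
    fake = render_patch(texture)
    # discriminator step: captured vs. rendered patches
    pred_real, pred_fake = disc(real_patch), disc(fake.detach())
    d_loss = bce(pred_real, torch.ones_like(pred_real)) + bce(pred_fake, torch.zeros_like(pred_fake))
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()
    # texture step: update the texture to fool the discriminator
    pred = disc(fake)
    g_loss = bce(pred, torch.ones_like(pred))
    opt_tex.zero_grad(); g_loss.backward(); opt_tex.step()
```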

1 minute read     [Paper]  [Video]  [Bibtex] 

Deferred Neural Rendering: Image Synthesis using Neural Textures

Deferred Neural Rendering is a new paradigm for image synthesis that combines the traditional graphics pipeline with learnable Neural Textures. Both the neural textures and the deferred neural renderer are trained end-to-end, enabling us to synthesize photo-realistic images even when the original 3D content was imperfect.
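A minimal PyTorch sketch of the pipeline: a learned feature texture is sampled with the UV coordinates rasterized from the (possibly imperfect) mesh, and a convolutional renderer translates the resulting feature image into RGB. The texture resolution, feature dimension, and the simple renderer are illustrative stand-ins.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralTextureRenderer(nn.Module):
    """Toy deferred neural rendering: sample a learned feature texture via the UV map,
    then translate the feature image to RGB with a small convolutional renderer."""
    def __init__(self, tex_res=512, feat_dim=16):
        super().__init__()
        self.neural_texture = nn.Parameter(torch.randn(1, feat_dim, tex_res, tex_res) * 0.01)
        self.renderer = nn.Sequential(            # stand-in for a U-Net
            nn.Conv2d(feat_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid())

    def forward(self, uv_map):
        # uv_map: (1, H, W, 2) per-pixel texture coordinates in [-1, 1],
        # produced by rasterizing the reconstructed mesh for the target view.
        feat_image = F.grid_sample(self.neural_texture, uv_map, align_corners=True)
        return self.renderer(feat_image)          # photo-realistic RGB image

rgb = NeuralTextureRenderer()(torch.rand(1, 256, 256, 2) * 2 - 1)
```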

2 minute read     [Paper]  [Video]  [Bibtex] 

DeepVoxels: Learning Persistent 3D Feature Embeddings

In this work, we address the lack of 3D understanding of generative neural networks by introducing a persistent 3D feature embedding for view synthesis. To this end, we propose DeepVoxels, a learned representation that encodes the view-dependent appearance of a 3D object without having to explicitly model its geometry.
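A minimal sketch of the representation, assuming PyTorch: a persistent, learned 3D feature volume is resampled along the rays of a novel camera and decoded into an image. The depth aggregation here is a crude mean rather than the occlusion-aware reasoning of the paper; all sizes are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepVoxelsToy(nn.Module):
    """Toy persistent 3D feature embedding: resample a learned volume into a novel view."""
    def __init__(self, res=32, feat_dim=16):
        super().__init__()
        self.volume = nn.Parameter(torch.randn(1, feat_dim, res, res, res) * 0.01)
        self.decoder = nn.Sequential(
            nn.Conv2d(feat_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid())

    def forward(self, sample_coords):
        # sample_coords: (1, D, H, W, 3) 3D points along the novel view's camera rays,
        # in normalized volume coordinates [-1, 1].
        feats = F.grid_sample(self.volume, sample_coords, align_corners=True)  # (1, C, D, H, W)
        projected = feats.mean(dim=2)             # crude aggregation along the depth dimension
        return self.decoder(projected)

img = DeepVoxelsToy()(torch.rand(1, 16, 64, 64, 3) * 2 - 1)
```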

1 minute read     [Paper]  [Video]  [Bibtex] 

Research Highlight: Face2Face

Research highlight of the Face2Face approach, featured on the cover of Communications of the ACM in January 2019. Face2Face is an approach for real-time facial reenactment of a monocular target video. The method had significant impact in the research community and far beyond: it won several awards, e.g., the SIGGRAPH ETech Best in Show Award, was featured in countless media articles, e.g., in the NYT, WSJ, Spiegel, etc., and had a massive reach on social media with millions of views.

1 minute read     [Paper]  [Video]  [Bibtex] 

HeadOn: Real-time Reenactment of Human Portrait Videos

HeadOn is the first real-time reenactment approach for complete human portrait videos that enables the transfer of torso and head motion, facial expression, and eye gaze. Given a short RGB-D video of the target actor, we automatically construct a personalized geometry proxy that embeds a parametric head, eye, and kinematic torso model. A novel reenactment algorithm employs this proxy to map the captured motion from the source to the target actor.

1 minute read     [Paper]  [Video]  [Bibtex] 

Deep Video Portraits

Our novel approach enables photo-realistic re-animation of portrait videos using only an input video. The core of our approach is a generative neural network with a novel space-time architecture. The network takes as input synthetic renderings of a parametric face model, based on which it predicts photo-realistic video frames for a given target actor.
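A minimal sketch of the conditioning idea, assuming PyTorch: a stack of synthetic renderings of the parametric face model over several consecutive frames is translated into a photo-realistic frame of the target actor by an encoder-decoder network. The channel counts, frame window, and the simple architecture are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RenderingToVideo(nn.Module):
    """Toy rendering-to-video translation: conditioning renderings in, target-actor frame out."""
    def __init__(self, num_cond_frames=3, cond_channels=3):
        super().__init__()
        c = num_cond_frames * cond_channels
        self.net = nn.Sequential(
            nn.Conv2d(c, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, cond):
        # cond: (B, num_cond_frames * cond_channels, H, W) synthetic renderings of the face model
        return self.net(cond)

frame = RenderingToVideo()(torch.rand(1, 9, 256, 256))   # predicted photo-realistic frame
```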

1 minute read     [Paper]  [Video]  [Bibtex] 

InverseFaceNet: Deep Monocular Inverse Face Rendering

We introduce InverseFaceNet, a deep convolutional inverse rendering framework for faces that jointly estimates facial pose, shape, expression, reflectance and illumination from a single input image. This enables advanced real-time editing of facial imagery, such as appearance editing and relighting.
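A minimal sketch of the regression setup, assuming PyTorch: a convolutional backbone maps a face image to the coefficient groups of a parametric face model. The parameter dimensions and the tiny backbone are placeholders, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class InverseFaceRegressor(nn.Module):
    """Toy inverse-rendering regressor: image in, parametric-model coefficients out."""
    def __init__(self):
        super().__init__()
        dims = {'pose': 6, 'shape': 80, 'expression': 64, 'reflectance': 80, 'illumination': 27}
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.heads = nn.ModuleDict({k: nn.Linear(64, d) for k, d in dims.items()})

    def forward(self, image):
        f = self.backbone(image)
        return {k: head(f) for k, head in self.heads.items()}

params = InverseFaceRegressor()(torch.rand(1, 3, 224, 224))
# params['expression'], params['illumination'], etc. feed a differentiable face-model
# renderer, which is what enables appearance editing and relighting.
```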

1 minute read     [Paper]  [Video]  [Bibtex] 

State of the Art on Monocular 3D Face Reconstruction, Tracking, and Applications

This report summarizes recent trends in monocular facial performance capture and discusses its applications, which range from performance-based animation to real-time facial reenactment. We focus on methods where the central task is to recover and track a three-dimensional model of the human face using optimization-based reconstruction algorithms.

1 minute read     [Paper]  [Bibtex] 

Eurographics 2018: State of the Art on Monocular 3D Face Reconstruction, Tracking, and Applications

This state-of-the-art report session summarizes recent trends in monocular facial performance capture and discusses its applications, which range from performance-based animation to real-time facial reenactment. We focus on methods where the central task is to recover and track a three-dimensional model of the human face using optimization-based reconstruction algorithms.

1 minute read     [Paper]  [Bibtex] 

FaceVR: Real-Time Facial Reenactment and Eye Gaze Control in Virtual Reality

We propose FaceVR, a novel image-based method that enables video teleconferencing in VR based on self-reenactment. The key component of FaceVR is a robust algorithm to perform real-time facial motion capture of an actor who is wearing a head-mounted display (HMD).

1 minute read     [Paper]  [Video]  [Bibtex] 

Dissertation: Face2Face - Facial Reenactment

This dissertation summarizes my work in the field of markerless motion tracking, face reconstruction, and their applications. In particular, it presents real-time facial reenactment, which enables the transfer of facial expressions from one video to another.

2 minute read     [Paper]  [Bibtex] 

SIGGRAPH Emerging Technologies: Demo of FaceVR

We demonstrate FaceVR, an image-based method that enables video teleconferencing in VR based on self-reenactment. The key component of FaceVR is a robust algorithm to perform real-time facial motion capture of an actor who is wearing a head-mounted display (HMD).

1 minute read     [Paper]  [Video]  [Bibtex] 

SIGGRAPH Emerging Technologies: Real-time Face Capture and Reenactment of RGB Videos

We show a demo of real-time facial reenactment of a monocular target video sequence (e.g., a YouTube video). Our goal is to animate the facial expressions of a target video by a source actor and re-render the manipulated output video in a photo-realistic fashion.

1 minute read     [Paper]  [Video]  [Bibtex] 

Face2Face: Real-time Face Capture and Reenactment of RGB Videos

We present a novel approach for real-time facial reenactment of a monocular target video sequence (e.g., a YouTube video). The source sequence is also a monocular video stream, captured live with a commodity webcam. Our goal is to animate the facial expressions of the target video by a source actor and re-render the manipulated output video in a photo-realistic fashion.
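A toy NumPy illustration of the transfer idea with a linear parametric face model: both actors are fitted to the model, the target keeps its identity and pose, and only the expression coefficients are driven by the source. The basis sizes and helper names are hypothetical, and the actual method additionally handles mouth interiors and photo-realistic compositing.

```python
import numpy as np

def transfer_expression(src_params, tgt_params):
    """Toy expression transfer: keep the target's identity, drive it with the source expression."""
    driven = dict(tgt_params)
    driven['expression'] = src_params['expression']
    return driven

def reconstruct_vertices(params, id_basis, expr_basis, mean_shape):
    # linear parametric face model: mean + identity blendshapes + expression blendshapes
    v = mean_shape + id_basis @ params['identity'] + expr_basis @ params['expression']
    return v.reshape(-1, 3)

# toy model with 1000 vertices, 80 identity and 76 expression coefficients
mean_shape = np.zeros(3000)
id_basis = np.random.randn(3000, 80) * 0.01
expr_basis = np.random.randn(3000, 76) * 0.01
src = {'identity': np.random.randn(80), 'expression': np.random.randn(76)}
tgt = {'identity': np.random.randn(80), 'expression': np.zeros(76)}
driven_mesh = reconstruct_vertices(transfer_expression(src, tgt), id_basis, expr_basis, mean_shape)
# the driven mesh is then re-rendered and composited photo-realistically into the target video
```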

1 minute read     [Paper]  [Video]  [Bibtex] 

Real-Time Pixel Luminance Optimization for Dynamic Multi-Projection Mapping

Using projection mapping enables us to bring virtual worlds into shared physical spaces. In this paper, we present a novel, adaptable and real-time projection mapping system, which supports multiple projectors and high quality rendering of dynamic content on surfaces of complex geometrical shape. Our system allows for smooth blending across multiple projectors using a new optimization framework that simulates the diffuse direct light transport of the physical world to continuously adapt the color output of each projector pixel.
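A toy NumPy illustration of the underlying per-pixel optimization, under the simplifying assumption of a known light-transport matrix: projector pixel intensities are chosen so that the combined, diffusely reflected output matches the desired content, which makes overlapping projectors blend smoothly. All sizes and the dense random matrix are placeholders for a precomputed, sparse transport model.

```python
import numpy as np

num_surface_pixels, num_projector_pixels = 500, 400
# T maps projector pixel intensities p to observed surface luminance (diffuse direct light transport)
T = np.abs(np.random.randn(num_surface_pixels, num_projector_pixels)) * 0.01
target = np.random.rand(num_surface_pixels)            # desired luminance of the virtual content

p, *_ = np.linalg.lstsq(T, target, rcond=None)          # least-squares projector intensities
p = np.clip(p, 0.0, 1.0)                                # physical projectors cannot emit negative light

residual = np.linalg.norm(T @ p - target)
# the real system solves such a problem continuously on the GPU as content and geometry move
```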

1 minute read     [Paper]  [Video]  [Bibtex] 

Interactive Model-based Reconstruction of the Human Head using an RGB-D Sensor

We present a novel method for the interactive markerless reconstruction of human heads using a single commodity RGB-D sensor. Our entire reconstruction pipeline is implemented on the graphics processing unit and allows us to obtain high-quality reconstructions of the human head using an interactive and intuitive reconstruction paradigm.

1 minute read     [Paper]  [Video]  [Bibtex]