Neural Capture & Synthesis

My work includes photo-realistic video synthesis and editing, which have a variety of useful applications (e.g., AR/VR telepresence, movie post-production, medical applications, virtual mirrors, virtual sightseeing). The development of algorithms for the photo-realistic creation or editing of image content comes with a certain responsibility, since the generation of photo-realistic imagery can be misused. With the rise of methods that are able to synthesize photo-realistic content, we already see that society is confronted with fake imagery used for malicious purposes (fake news, cyber mobbing). Thus, the automatic detection of synthetic or manipulated content is of paramount importance (e.g., via a browser plugin that automatically flags manipulated images and videos).

Knowledge about the creation process helps in designing forgery detection algorithms, and vice versa. Given the findings of the current literature, an omnipotent detection algorithm that is able to detect a wide variety of manipulations remains out of reach, since most methods do not generalize to unseen manipulation methods. (Online) self-supervised and few-shot learning methods show promising results. In particular, learning from real examples how a specific person behaves, looks, and talks could lead to detection methods that do not overfit to a specific manipulation method. Note that findings in the forensics community can in turn be used to improve synthesis, since they provide a measure of whether a manipulation is good (it deceives the detector) or bad (it is detected as a manipulation).

In contrast to 'passive' forensic methods that only get access to the image or video, one can also actively add cryptographic signatures or watermarks to images and videos. Digital signatures ideally ensure that the media was created by a specific person and was not modified afterwards by anyone else. While such digital signatures are standard elements for websites, encrypted emails, etc., they are rarely used for images and videos, which makes research on passive detection methods all the more important.
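
To make the 'active' signing idea concrete, here is a minimal sketch that signs and verifies a media file with Ed25519 using the Python `cryptography` package. The file name is a placeholder, and a real deployment would additionally need key distribution and robustness to benign re-encoding, which this sketch ignores.

```python
# Minimal sketch: signing and verifying a media file with Ed25519,
# illustrating the 'active' idea of cryptographic media signatures.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

private_key = Ed25519PrivateKey.generate()    # creator's signing key
public_key = private_key.public_key()         # published for verification

media_bytes = open("frame.png", "rb").read()  # placeholder image/video payload
signature = private_key.sign(media_bytes)     # distributed alongside the file

try:
    public_key.verify(signature, media_bytes)
    print("media is authentic and unmodified")
except InvalidSignature:
    print("media was modified after signing")
```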


MICA: Towards Metrical Reconstruction of Human Faces

Face reconstruction and tracking is a building block of numerous applications in AR/VR, human-machine interaction, and medical applications. Most of these applications rely on a metrically correct prediction of the shape, especially when the reconstructed subject is placed in a metrical context. Thus, we present MICA, a novel metrical face reconstruction method that combines face recognition with supervised face shape learning.
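
A minimal sketch of the core idea: regressing metrical shape parameters from a face-recognition identity embedding with direct supervision. All dimensions and the architecture are illustrative placeholders, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetricalShapeRegressor(nn.Module):
    """Maps a face-recognition identity embedding to metrical 3DMM shape
    parameters. Dimensions are illustrative, not the paper's exact values."""
    def __init__(self, identity_dim=512, num_shape_params=300):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(identity_dim, 300), nn.ReLU(),
            nn.Linear(300, 300), nn.ReLU(),
            nn.Linear(300, num_shape_params),
        )

    def forward(self, identity_embedding):
        return self.mlp(identity_embedding)  # coefficients of a shape basis

# Supervised training against metrically accurate ground-truth shapes:
model = MetricalShapeRegressor()
embedding = torch.randn(8, 512)  # e.g., features from a recognition network
gt_params = torch.randn(8, 300)  # from registered 3D scans
loss = F.l1_loss(model(embedding), gt_params)
loss.backward()
```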

2 minute read     [Paper]  [Video]  [Bibtex] 

Texturify: Generating Textures on 3D Shape Surfaces

Texturify learns to generate geometry-aware textures for untextured collections of 3D objects. Our method trains from only a collection of images and a collection of untextured shapes, which are both often available, without requiring any explicit 3D color supervision or shape-image correspondence. Textures are created directly on the surface of a given 3D shape, enabling generation of high-quality, compelling textured 3D shapes.

1 minute read     [Paper]  [Video]  [Bibtex] 

Neural Head Avatars from Monocular RGB Videos

We present Neural Head Avatars, a novel neural representation that explicitly models the surface geometry and appearance of an animatable human avatar using a deep neural network. Specifically, we propose a hybrid representation consisting of a morphable model for the coarse shape and expressions of the face, and two feed-forward networks, predicting vertex offsets of the underlying mesh as well as a view- and expression-dependent texture.
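
The hybrid representation can be sketched as follows: a morphable model provides coarse vertices, one network refines them with offsets, and a second network predicts view- and expression-dependent color for surface points. All dimensions here are illustrative placeholders.

```python
import torch
import torch.nn as nn

class HybridHeadAvatar(nn.Module):
    """Sketch of the described hybrid representation; layer sizes and input
    parameterizations are placeholders, not the paper's exact design."""
    def __init__(self, num_vertices=5023, expr_dim=100):
        super().__init__()
        self.num_vertices = num_vertices
        self.offset_net = nn.Sequential(       # refines the coarse geometry
            nn.Linear(expr_dim, 256), nn.ReLU(),
            nn.Linear(256, num_vertices * 3),
        )
        self.texture_net = nn.Sequential(      # query: (uv, view dir, expression)
            nn.Linear(2 + 3 + expr_dim, 256), nn.ReLU(),
            nn.Linear(256, 3),                 # RGB for the queried surface point
        )

    def forward(self, coarse_vertices, expression, uv, view_dir):
        offsets = self.offset_net(expression).view(-1, self.num_vertices, 3)
        vertices = coarse_vertices + offsets   # refined surface geometry
        rgb = self.texture_net(torch.cat([uv, view_dir, expression], dim=-1))
        return vertices, rgb
```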

1 minute read     [Paper]  [Video]  [Bibtex] 

Mover: Human-Aware Object Placement for Visual Environment Reconstruction

We demonstrate that human-scene interactions (HSIs) can be leveraged to improve the 3D reconstruction of a scene from a monocular RGB video. Our key idea is that, as a person moves through a scene and interacts with it, we accumulate HSIs across multiple input images and optimize the 3D scene layout to be consistent, physically plausible, and functional.
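
An illustrative sketch of HSI-style objective terms (not the paper's exact formulation): contacted body regions should lie on object surfaces, and the body should not penetrate scene geometry. `object_distance` and `scene_sdf` are assumed callables returning point-to-surface distance and signed distance, respectively.

```python
import torch

def hsi_losses(body_vertices, contact_ids, object_distance, scene_sdf):
    """Illustrative human-scene interaction terms; weights and the exact
    formulation are placeholders."""
    # contact: body regions observed to touch an object should lie on it
    contact_loss = object_distance(body_vertices[contact_ids]).abs().mean()
    # plausibility: body vertices should not penetrate scene geometry
    penetration_loss = torch.relu(-scene_sdf(body_vertices)).mean()
    return contact_loss + 0.5 * penetration_loss
```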

1 minute read     [Paper]  [Video]  [Bibtex] 

Advances in Neural Rendering

This state-of-the-art report on advances in neural rendering focuses on methods that combine classical rendering principles with learned 3D scene representations, often now referred to as neural scene representations. A key advantage of these methods is that they are 3D-consistent by design, enabling applications such as novel viewpoint synthesis of a captured scene.

2 minute read     [Paper]  [Bibtex] 

3DV 2021: Tutorial on the Advances in Neural Rendering

In this tutorial, we will talk about the advances in neural rendering, especially the underlying 2D and 3D representations that allow for novel viewpoint synthesis, controllability, and editability. Specifically, we will discuss neural rendering methods based on 2D GANs, techniques using 3D Neural Radiance Fields, and approaches with learnable sphere proxies. Besides methods that handle static content, we will also discuss methods for dynamic content.

1 minute read     [Video] 

SIGGRAPH 2021: Course on the Advances in Neural Rendering

This course covers the advances in neural rendering over the years 2020-2021.

1 minute read     [Video]  [Bibtex] 

TransformerFusion: Monocular RGB Scene Reconstruction using Transformers

We introduce TransformerFusion, a transformer-based 3D scene reconstruction approach. From an input monocular RGB video, the video frames are processed by a transformer network that fuses the observations into a volumetric feature grid representing the scene; this feature grid is then decoded into an implicit 3D scene representation.
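
The fusion step can be sketched as per-voxel attention over the frame observations of that voxel; the dimensions and the learned fusion query below are illustrative, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class VoxelFeatureFusion(nn.Module):
    """Sketch of transformer-based fusion: for each voxel, attend over the
    per-frame features observing it and produce one fused feature."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))  # learned fusion query
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, frame_features):
        # frame_features: (num_voxels, num_frames, dim) observations per voxel
        q = self.query.expand(frame_features.shape[0], -1, -1)
        fused, _ = self.attn(q, frame_features, frame_features)
        return fused.squeeze(1)  # (num_voxels, dim), decoded into 3D geometry
```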

1 minute read     [Paper]  [Video]  [Bibtex] 

NerFACE: Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction

We present dynamic neural radiance fields for modeling the appearance and dynamics of a human face. To handle the dynamics of the face, we combine our scene representation network with a low-dimensional morphable model which provides explicit control over pose and expressions. We use volumetric rendering to generate images from this hybrid representation and demonstrate that such a dynamic neural scene representation can be learned from monocular input data only, without the need for a specialized capture setup.
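
A minimal sketch of such a conditioned radiance field: a NeRF-style MLP that additionally takes morphable-model expression coefficients. The layer sizes and encoding dimensions below are illustrative placeholders.

```python
import torch
import torch.nn as nn

class DynamicRadianceField(nn.Module):
    """NeRF-style MLP conditioned on expression coefficients. `pos_dim` and
    `dir_dim` assume positionally encoded inputs; all sizes are illustrative."""
    def __init__(self, pos_dim=63, dir_dim=27, expr_dim=76):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(pos_dim + expr_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.sigma = nn.Linear(256, 1)      # view-independent density
        self.color = nn.Sequential(         # view-dependent color
            nn.Linear(256 + dir_dim, 128), nn.ReLU(),
            nn.Linear(128, 3),
        )

    def forward(self, x_enc, d_enc, expression):
        h = self.trunk(torch.cat([x_enc, expression], dim=-1))
        sigma = self.sigma(h)
        rgb = self.color(torch.cat([h, d_enc], dim=-1))
        return rgb, sigma  # composited along rays via volumetric rendering
```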

1 minute read     [Paper]  [Video]  [Bibtex] 

Neural Deformation Graphs for Globally-consistent Non-rigid Reconstruction

We introduce Neural Deformation Graphs for globally-consistent deformation tracking and 3D reconstruction of non-rigid objects. Specifically, we implicitly model a deformation graph via a deep neural network. This neural deformation graph does not rely on any object-specific structure and, thus, can be applied to general non-rigid deformation tracking.

1 minute read     [Paper]  [Video]  [Bibtex] 

Neural Non-Rigid Tracking

We introduce a novel, end-to-end learnable, differentiable non-rigid tracker that enables state-of-the-art non-rigid reconstruction. By enabling gradient back-propagation through a non-rigid as-rigid-as-possible optimization solver, we are able to learn correspondences in an end-to-end manner such that they are optimal for the task of non-rigid tracking.
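
To illustrate why gradients can flow through such a solver, here is a sketch of an as-rigid-as-possible energy written purely in differentiable tensor operations; the data layout is an assumption for illustration.

```python
import torch

def arap_energy(verts, verts_deformed, edges, rotations):
    """As-rigid-as-possible energy: deformed edges should match rigidly
    rotated rest-pose edges. `edges` is (E, 2) vertex indices; `rotations`
    is per-vertex (V, 3, 3). Being pure tensor ops, it back-propagates."""
    i, j = edges[:, 0], edges[:, 1]
    rest_edge = verts[j] - verts[i]                                # (E, 3)
    rotated = torch.bmm(rotations[i], rest_edge.unsqueeze(-1)).squeeze(-1)
    deformed_edge = verts_deformed[j] - verts_deformed[i]
    return ((deformed_edge - rotated) ** 2).sum(-1).mean()
```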

1 minute read     [Paper]  [Video]  [Bibtex] 

CVPR 2020: Tutorial on Neural Rendering

Neural rendering is a new and rapidly emerging field that combines generative machine learning techniques with physical knowledge from computer graphics, e.g., by the integration of differentiable rendering into network training. This tutorial summarizes the recent trends and applications of neural rendering.

1 minute read     [Paper]  [Video]  [Bibtex] 

State of the Art on Neural Rendering

Neural rendering is a new and rapidly emerging field that combines generative machine learning techniques with physical knowledge from computer graphics, e.g., by the integration of differentiable rendering into network training. This state-of-the-art report summarizes the recent trends and applications of neural rendering.

2 minute read     [Paper]  [Bibtex] 

Adversarial Texture Optimization from RGB-D Scans

We present a novel approach for color texture generation using a conditional adversarial loss obtained from weakly-supervised views. Specifically, we propose an approach to produce photorealistic textures for approximate surfaces, even from misaligned images, by learning an objective function that is robust to these errors.

1 minute read     [Paper]  [Video]  [Bibtex] 

Deferred Neural Rendering: Image Synthesis using Neural Textures

Deferred Neural Rendering is a new paradigm for image synthesis that combines the traditional graphics pipeline with learnable Neural Textures. Both the neural textures and the deferred neural renderer are trained end-to-end, enabling us to synthesize photo-realistic images even when the original 3D content was imperfect.
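
A minimal sketch of the pipeline: a learnable feature texture is sampled with rasterized UV coordinates, then decoded to RGB by a small CNN. Texture resolution, channel counts, and the two-layer decoder (a stand-in for a U-Net-style renderer) are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralTextureRenderer(nn.Module):
    """Sketch of deferred neural rendering with a learnable neural texture."""
    def __init__(self, channels=16, resolution=512):
        super().__init__()
        self.neural_texture = nn.Parameter(
            torch.randn(1, channels, resolution, resolution) * 0.01
        )
        self.decoder = nn.Sequential(  # stand-in for the deferred neural renderer
            nn.Conv2d(channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, uv):             # uv: (B, H, W, 2) in [-1, 1], rasterized
        features = F.grid_sample(
            self.neural_texture.expand(uv.shape[0], -1, -1, -1),
            uv, align_corners=False,
        )                              # (B, C, H, W) screen-space feature image
        return self.decoder(features)  # photo-realistic RGB output
```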

2 minute read     [Paper]  [Video]  [Bibtex] 

DeepVoxels: Learning Persistent 3D Feature Embeddings

In this work, we address the lack of 3D understanding of generative neural networks by introducing a persistent 3D feature embedding for view synthesis. To this end, we propose DeepVoxels, a learned representation that encodes the view-dependent appearance of a 3D object without having to explicitly model its geometry.

1 minute read     [Paper]  [Video]  [Bibtex] 

Research Highlight: Face2Face

Research highlight of the Face2Face approach, featured on the cover of Communications of the ACM in January 2019. Face2Face is an approach for real-time facial reenactment of a monocular target video. The method had significant impact in the research community and far beyond: it won several awards (e.g., the SIGGRAPH ETech Best in Show Award), was featured in countless media articles (e.g., NYT, WSJ, Spiegel), and had a massive reach on social media with millions of views.

1 minute read     [Paper]  [Video]  [Bibtex] 

Deep Video Portraits

Our novel approach enables photo-realistic re-animation of portrait videos using only an input video. The core of our approach is a generative neural network with a novel space-time architecture. The network takes as input synthetic renderings of a parametric face model, based on which it predicts photo-realistic video frames for a given target actor.

1 minute read     [Paper]  [Video]  [Bibtex] 

HeadOn: Real-time Reenactment of Human Portrait Videos

HeadOn is the first real-time reenactment approach for complete human portrait videos that enables transfer of torso and head motion, face expression, and eye gaze. Given a short RGB-D video of the target actor, we automatically construct a personalized geometry proxy that embeds a parametric head, eye, and kinematic torso model. A novel reenactment algorithm employs this proxy to map the captured motion from the source to the target actor.

1 minute read     [Paper]  [Video]  [Bibtex] 

InverseFaceNet: Deep Monocular Inverse Face Rendering

We introduce InverseFaceNet, a deep convolutional inverse rendering framework for faces that jointly estimates facial pose, shape, expression, reflectance and illumination from a single input image. This enables advanced real-time editing of facial imagery, such as appearance editing and relighting.
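
A minimal sketch of such joint regression: a CNN backbone outputs one stacked code that is split into the individual factors. The backbone choice and split sizes are illustrative placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class InverseRenderNet(nn.Module):
    """Sketch: regress pose, shape, expression, reflectance, and illumination
    (e.g., spherical-harmonics coefficients) from a single face image."""
    def __init__(self, code_dims=(6, 80, 64, 80, 27)):
        super().__init__()
        self.backbone = models.resnet18(weights=None)  # illustrative backbone
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, sum(code_dims))
        self.code_dims = code_dims

    def forward(self, image):
        code = self.backbone(image)
        # -> (pose, shape, expression, reflectance, illumination)
        return torch.split(code, list(self.code_dims), dim=-1)
```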

1 minute read     [Paper]  [Video]  [Bibtex] 

Eurographics 2018: State of the Art on Monocular 3D Face Reconstruction, Tracking, and Applications

This state-of-the-art report summarizes recent trends in monocular facial performance capture and discusses its applications, which range from performance-based animation to real-time facial reenactment. We focus on methods where the central task is to recover and track a three-dimensional model of the human face using optimization-based reconstruction algorithms.

1 minute read     [Paper]  [Bibtex] 

State of the Art on Monocular 3D Face Reconstruction, Tracking, and Applications

This report summarizes recent trends in monocular facial performance capture and discusses its applications, which range from performance-based animation to real-time facial reenactment. We focus on methods where the central task is to recover and track a three-dimensional model of the human face using optimization-based reconstruction algorithms.

1 minute read     [Paper]  [Bibtex] 

FaceVR: Real-Time Facial Reenactment and Eye Gaze Control in Virtual Reality

We propose FaceVR, a novel image-based method that enables video teleconferencing in VR based on self-reenactment. The key component of FaceVR is a robust algorithm to perform real-time facial motion capture of an actor who is wearing a head-mounted display (HMD).

1 minute read     [Paper]  [Video]  [Bibtex] 

Dissertation: Face2Face - Facial Reenactment

This dissertation summarizes work in the fields of markerless motion tracking and face reconstruction, and their applications. In particular, it presents real-time facial reenactment, which enables the transfer of facial expressions from one video to another.

2 minute read     [Paper]  [Bibtex] 

SIGGRAPH Emerging Technologies: Demo of FaceVR

We present a live demo of FaceVR, a novel image-based method that enables video teleconferencing in VR based on self-reenactment. The key component of FaceVR is a robust algorithm to perform real-time facial motion capture of an actor who is wearing a head-mounted display (HMD).

1 minute read     [Paper]  [Video]  [Bibtex] 

SIGGRAPH Emerging Technologies: Real-time Face Capture and Reenactment of RGB Videos

We show a demo of real-time facial reenactment of a monocular target video sequence (e.g., a YouTube video). Our goal is to animate the facial expressions of the target video by a source actor and re-render the manipulated output video in a photo-realistic fashion.

1 minute read     [Paper]  [Video]  [Bibtex] 

Face2Face: Real-time Face Capture and Reenactment of RGB Videos

We present a novel approach for real-time facial reenactment of a monocular target video sequence (e.g., a YouTube video). The source sequence is also a monocular video stream, captured live with a commodity webcam. Our goal is to animate the facial expressions of the target video by a source actor and re-render the manipulated output video in a photo-realistic fashion.

1 minute read     [Paper]  [Video]  [Bibtex] 

Real-Time Pixel Luminance Optimization for Dynamic Multi-Projection Mapping

Projection mapping enables us to bring virtual worlds into shared physical spaces. In this paper, we present a novel, adaptable, and real-time projection mapping system that supports multiple projectors and high-quality rendering of dynamic content on surfaces of complex geometric shape. Our system allows for smooth blending across multiple projectors using a new optimization framework that simulates the diffuse direct light transport of the physical world to continuously adapt the color output of each projector pixel.
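
As a toy illustration of the per-pixel luminance optimization idea: overlapping projector contributions are modeled linearly and solved for non-negative intensities. The 2x2 system below is a stand-in for the full light-transport simulation over all projector pixels.

```python
import numpy as np
from scipy.optimize import nnls

# Rows: surface points; columns: projectors. Entries model how strongly each
# projector's pixel contributes to the luminance at each surface point.
A = np.array([[0.8, 0.3],
              [0.2, 0.9]])
b = np.array([1.0, 1.0])   # desired luminance at each surface point

# Non-negative least squares: per-projector pixel intensities that best
# reproduce the target while respecting physical (non-negative) output.
x, residual = nnls(A, b)
print(x, residual)
```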

1 minute read     [Paper]  [Video]  [Bibtex] 

Interactive Model-based Reconstruction of the Human Head using an RGB-D Sensor

We present a novel method for the interactive markerless reconstruction of human heads using a single commodity RGB-D sensor. Our entire reconstruction pipeline is implemented on the graphics processing unit and allows us to obtain high-quality reconstructions of the human head using an interactive and intuitive reconstruction paradigm.

1 minute read     [Paper]  [Video]  [Bibtex]