Neural Capture & Synthesis
Joker: Conditional 3D Head Synthesis with Extreme Facial Expressions
Joker uses one reference image to generate a 3D reconstruction with a novel extreme expression. The target expression is defined through 3DMM parameters and text prompts. The text prompts effectively resolve ambiguities in the 3DMM input and can control emotion-related expression subtleties and tongue articulation.
2024, Nov 25 — [Paper] [Video] [Bibtex]
3DiFACE: Diffusion-based Speech-driven 3D Facial Animation and Editing
We present 3DiFACE, a novel audio-conditioned diffusion model for holistic speech-driven 3D facial animation, which produces diverse plausible lip and head motions for a single audio input, while also allowing editing via keyframing and interpolation.
2024, Nov 02 — [Paper] [Video] [Bibtex]
D3GA: Drivable 3D Gaussian Avatars
We present Drivable 3D Gaussian Avatars (D3GA), the first 3D controllable model for human bodies rendered with Gaussian splats.
2024, Nov 01 — [Paper] [Video] [Bibtex]
Gaussian-Haircut: Human Hair Reconstruction with Strand-Aligned 3D Gaussians
We introduce a new hair modeling method that uses a dual representation of classical hair strands and 3D Gaussians to produce accurate and realistic strand-based reconstructions from multi-view data.
2024, Oct 01 — [Paper] [Video] [Bibtex]
Stable Video Portraits
Stable Video Portraits is a novel hybrid 2D/3D generation method that outputs photorealistic videos of talking faces leveraging a large pre-trained text-to-image prior (2D), controlled via a 3DMM (3D). It is based on a personalized image diffusion prior which allows us to generate new videos of the subject, and also to edit the appearance by blending the personalized image prior with a general text-conditioned model.
2024, Oct 01 — [Paper] [Bibtex]
TeSMo: Generating Human Interaction Motions in Scenes with Text Control
TeSMo is a method for text-controlled scene-aware motion generation based on denoising diffusion models. Specifically, we pre-train a scene-agnostic text-to-motion diffusion model, emphasizing goal-reaching constraints on large-scale motion-capture datasets. Then, we enhance this model with a scene-aware component, fine-tuned using data augmented with detailed scene information, including ground plane and object shapes.
2024, Sep 29 — [Paper] [Video] [Bibtex]
Environment-Specific People
We present ESP, a novel method for context-aware full-body generation, that enables photo-realistic inpainting of people into existing "in-the-wild" photographs.
2024, Sep 01 — [Paper] [Bibtex]
HAAR: Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles
We present HAAR, a new strand-based generative model for 3D human hairstyles. Specifically, based on textual inputs, HAAR produces 3D hairstyles that could be used as production-level assets in modern computer graphics engines.
2024, Feb 01 — [Paper] [Video] [Bibtex]
TECA: Text-Guided Generation and Editing of Compositional 3D Avatars
Given a text description, our method produces a compositional 3D avatar consisting of a mesh-based face and body and NeRF-based hair, clothing and other accessories.
2024, Jan 13 — [Paper] [Video] [Bibtex]
TeCH: Text-guided Reconstruction of Lifelike Clothed Humans
TeCH reconstructs the 3D human by leveraging 1) descriptive text prompts (e.g., garments, colors, hairstyles), which are automatically generated via a garment parsing model and Visual Question Answering (VQA), and 2) a personalized fine-tuned Text-to-Image diffusion model (T2I), which learns the "indescribable" appearance.
2024, Jan 01 — [Paper] [Video] [Bibtex]
FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models
We introduce FaceTalk, a novel generative approach designed for synthesizing high-fidelity 3D motion sequences of talking human heads from an input audio signal. To capture the expressive, detailed nature of human heads, including hair, ears, and finer-scale eye movements, we propose to couple the speech signal with the latent space of neural parametric head models to create high-fidelity, temporally coherent motion sequences.
2024, Jan 01 — [Paper] [Video] [Bibtex]
DPHMs: Diffusion Parametric Head Models for Depth-based Tracking
We introduce Diffusion Parametric Head Models (DPHMs), a generative model that enables robust volumetric head reconstruction and tracking from monocular depth sequences.
2024, Jan 01 — [Paper] [Video] [Bibtex]
360° Volumetric Portrait Avatar
We propose 360° Volumetric Portrait (3VP) Avatar, a novel method for reconstructing 360° photo-realistic portrait avatars of human subjects solely based on monocular video inputs.
2024, Jan 01 — [Paper] [Video] [Bibtex]
SCULPT: Shape-Conditioned Unpaired Learning of Pose-dependent Clothed and Textured Human Meshes
We present SCULPT, a novel 3D generative model for clothed and textured 3D meshes of humans. Specifically, we devise a deep neural network that learns to represent the geometry and appearance distribution of clothed human bodies.
2024, Jan 01 — [Paper] [Bibtex]
TADA: Text to Animatable Digital Avatars
We introduce TADA, a simple-yet-effective approach that takes textual descriptions and produces expressive 3D avatars with high-quality geometry and lifelike textures that can be animated and rendered with traditional graphics pipelines.
2024, Jan 01 — [Paper] [Video] [Bibtex]
GAN-Avatar: Controllable Personalized GAN-based Human Head Avatars
We propose to learn person-specific animatable avatars from images without assuming to have access to precise facial expression tracking. At the core of our method, we leverage a 3D-aware generative model that is trained to reproduce the distribution of facial expressions from the training data.
2023, Dec 31 — [Paper] [Video] [Bibtex]
DiffuScene: Scene Graph Denoising Diffusion Probabilistic Model for Generative Indoor Scene Synthesis
We present DiffuScene, a diffusion-based method for indoor 3D scene synthesis which operates on a 3D scene graph representation.
2023, Dec 31 — [Paper] [Video] [Bibtex]
Imitator: Personalized Speech-driven 3D Facial Animation
We present Imitator, a speech-driven facial expression synthesis method, which learns identity-specific details from a short input video and produces novel facial expressions matching the identity-specific speaking style and facial idiosyncrasies of the target actor. Specifically, we train a style-agnostic transformer on a large facial expression dataset which we use as a prior for audio-driven facial expressions. Based on this prior, we optimize for identity-specific speaking style based on a short reference video.
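The following is a minimal sketch of this two-stage idea, assuming a transformer encoder that maps per-frame audio features to expression parameters plus a learnable style embedding that is the only quantity optimized during person-specific adaptation; module names, dimensions, and the adaptation loop are illustrative and not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AudioToExpression(nn.Module):
    """Prior: maps per-frame audio features to 3D facial expression parameters."""
    def __init__(self, audio_dim=128, expr_dim=64, d_model=256):
        super().__init__()
        self.in_proj = nn.Linear(audio_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.style = nn.Parameter(torch.zeros(d_model))   # speaking-style embedding
        self.out_proj = nn.Linear(d_model, expr_dim)

    def forward(self, audio_feats):                        # (B, T, audio_dim)
        h = self.in_proj(audio_feats) + self.style         # inject style into every frame
        return self.out_proj(self.encoder(h))              # (B, T, expr_dim)

# Stage 1: train the full model on a large multi-speaker corpus (style-agnostic prior).
# Stage 2: freeze the prior and optimize only the style embedding on a short reference video.
model = AudioToExpression()
for p in model.parameters():
    p.requires_grad = False
model.style.requires_grad = True
optimizer = torch.optim.Adam([model.style], lr=1e-3)
```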
2023, Aug 14 — [Paper] [Video] [Bibtex]
CaPhy: Capturing Physical Properties for Animatable Human Avatars
We present CaPhy, a novel method for reconstructing animatable human avatars with realistic dynamic properties for clothing. Specifically, we aim for capturing the geometric and physical properties of the clothing from real observations. This allows us to apply novel poses to the human avatar with physically correct deformations and wrinkles of the clothing.
2023, Aug 14 — [Paper] [Video] [Bibtex]
ClipFace: Text-guided Editing of Textured 3D Morphable Models
ClipFace is a novel self-supervised approach for text-guided editing of textured 3D morphable models of faces. Controllable editing and manipulation are enabled via language prompts that adapt the texture and expression of the 3D morphable model.
2023, Mar 30 — [Paper] [Video] [Bibtex]
High-Res Facial Appearance Capture from Polarized Smartphone Images
We propose a novel method for high-quality facial texture reconstruction from RGB images, using a capture routine based on a single smartphone that we equip with an inexpensive polarization foil.
2023, Mar 30 — [Paper] [Video] [Bibtex]
MIME: Human-Aware 3D Scene Generation
Humans constantly interact with their environment. They walk through a room, touch objects, rest on a chair, or sleep in a bed. All these interactions contain information about the scene layout and object placement which we leverage to generate scenes from human motion.
2023, Mar 29 — [Paper] [Video] [Bibtex]
INSTA: Instant Volumetric Head Avatars
Instead of relying on pre-recorded and, thus, outdated avatars, we aim to instantaneously reconstruct the subject in order to capture their actual appearance during a meeting. To this end, we propose INSTA, which enables the reconstruction of an avatar within a few minutes (~10 min) and can be driven at interactive frame rates.
2023, Mar 28 — [Paper] [Video] [Bibtex]
DINER: Depth-aware Image-based NEural Radiance Fields
Given a sparse set of RGB input views, we predict depth and feature maps to guide the reconstruction of a volumetric scene representation that allows us to render 3D objects under novel views.
2023, Mar 28 — [Paper] [Video] [Bibtex]
Neural Deformation Priors
We present Neural Shape Deformation Priors, a novel method for shape manipulation that predicts mesh deformations of non-rigid objects from user-provided handle movements.
2022, Dec 01 — [Paper] [Video] [Bibtex]
MICA: Towards Metrical Reconstruction of Human Faces
Face reconstruction and tracking is a building block of numerous applications in AR/VR, human-machine interaction, as well as medical applications. Most of these applications rely on a metrically correct prediction of the shape, especially when the reconstructed subject is put into a metrical context. Thus, we present MICA, a novel metrical face reconstruction method that combines face recognition with supervised face shape learning.
2022, Jul 04 — [Paper] [Video] [Bibtex]
Texturify: Generating Textures on 3D Shape Surfaces
Texturify learns to generate geometry-aware textures for untextured collections of 3D objects. Our method trains from only a collection of images and a collection of untextured shapes, which are both often available, without requiring any explicit 3D color supervision or shape-image correspondence. Textures are created directly on the surface of a given 3D shape, enabling generation of high-quality, compelling textured 3D shapes.
2022, Jul 04 — [Paper] [Video] [Bibtex]
Neural Head Avatars from Monocular RGB Videos
We present Neural Head Avatars, a novel neural representation that explicitly models the surface geometry and appearance of an animatable human avatar using a deep neural network. Specifically, we propose a hybrid representation consisting of a morphable model for the coarse shape and expressions of the face, and two feed-forward networks, predicting vertex offsets of the underlying mesh as well as a view- and expression-dependent texture.
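A minimal sketch of the hybrid representation described above, with one feed-forward network predicting per-vertex offsets on top of the morphable-model mesh and a second network predicting view- and expression-dependent color; all layer sizes and conditioning inputs are assumptions.

```python
import torch
import torch.nn as nn

class NeuralHeadAvatar(nn.Module):
    """Coarse morphable-model mesh + vertex-offset network + dynamic texture network."""
    def __init__(self, expr_dim=50):
        super().__init__()
        self.offset_net = nn.Sequential(                   # refines the coarse geometry
            nn.Linear(3 + expr_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 3))
        self.texture_net = nn.Sequential(                  # view/expression-dependent color
            nn.Linear(3 + expr_dim + 3, 256), nn.ReLU(),
            nn.Linear(256, 3), nn.Sigmoid())

    def forward(self, coarse_verts, expr, view_dir):
        # coarse_verts: (V, 3) morphable-model vertices, expr: (expr_dim,), view_dir: (3,)
        e = expr.expand(coarse_verts.shape[0], -1)
        refined = coarse_verts + self.offset_net(torch.cat([coarse_verts, e], dim=-1))
        d = view_dir.expand(coarse_verts.shape[0], -1)
        rgb = self.texture_net(torch.cat([refined, e, d], dim=-1))
        return refined, rgb                                # refined geometry and per-vertex color
```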
2022, Mar 22 — [Paper] [Video] [Bibtex]
Neural RGB-D Surface Reconstruction
We demonstrate how depth measurements can be incorporated into the neural radiance field formulation to produce more detailed and complete reconstruction results than using methods based on either color or depth data alone.
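A minimal sketch of how a depth measurement can enter the optimization next to the usual color term, assuming the expected ray termination depth is computed from the volume-rendering weights; the paper's full formulation differs in detail.

```python
import torch

def rgbd_loss(pred_rgb, weights, z_vals, gt_rgb, gt_depth, lambda_depth=0.1):
    """Per-ray loss: color reconstruction plus supervision of the rendered depth.
    pred_rgb: (R, 3), weights/z_vals: (R, S) volume-rendering weights and sample depths,
    gt_rgb: (R, 3), gt_depth: (R,) sensor depth (0 where invalid)."""
    rendered_depth = (weights * z_vals).sum(dim=-1)        # expected termination depth
    color_loss = ((pred_rgb - gt_rgb) ** 2).mean()
    valid = gt_depth > 0                                   # depth maps typically have holes
    depth_loss = ((rendered_depth - gt_depth)[valid] ** 2).mean()
    return color_loss + lambda_depth * depth_loss
```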
2022, Mar 22 — [Paper] [Video] [Bibtex]
Mover: Human-Aware Object Placement for Visual Environment Reconstruction
We demonstrate that human-scene interactions (HSIs) can be leveraged to improve the 3D reconstruction of a scene from a monocular RGB video. Our key idea is that, as a person moves through a scene and interacts with it, we accumulate HSIs across multiple input images, and optimize the 3D scene to reconstruct a consistent, physically plausible and functional 3D scene layout.
2022, Mar 22 — [Paper] [Video] [Bibtex]
Advances in Neural Rendering
This state-of-the-art report on advances in neural rendering focuses on methods that combine classical rendering principles with learned 3D scene representations, often now referred to as neural scene representations. A key advantage of these methods is that they are 3D-consistent by design, enabling applications such as novel viewpoint synthesis of a captured scene.
2022, Jan 01 — [Paper] [Bibtex]
3DV 2021: Tutorial on the Advances in Neural Rendering
In this tutorial, we will talk about the advances in neural rendering, especially the underlying 2D and 3D representations that allow for novel viewpoint synthesis, controllability and editability. Specifically, we will discuss neural rendering methods based on 2D GANs, techniques using 3D Neural Radiance Fields or learnable sphere proxies. Besides methods that handle static content, we will talk about dynamic content as well.
2021, Nov 29 — [Video]
SIGGRAPH 2021: Course on the Advances in Neural Rendering
This course covers the advances in neural rendering over the years 2020-2021.
2021, Aug 08 — [Video] [Bibtex]
TransformerFusion: Monocular RGB Scene Reconstruction using Transformers
We introduce TransformerFusion, a transformer-based 3D scene reconstruction approach. From an input monocular RGB video, the video frames are processed by a transformer network that fuses the observations into a volumetric feature grid representing the scene; this feature grid is then decoded into an implicit 3D scene representation.
2021, Jul 12 — [Paper] [Video] [Bibtex]
Dynamic Surface Function Networks for Clothed Human Bodies
We present a novel method for temporal coherent reconstruction and tracking of clothed humans using dynamic surface function networks which can be trained with a monocular RGB-D sequence.
2021, Apr 12 — [Paper] [Video] [Bibtex]
Neural Parametric Models for 3D Deformable Shapes
We propose Neural Parametric Models (NPMs), a novel, learned alternative to traditional, parametric 3D models, which does not require hand-crafted, object-specific constraints.
2021, Apr 12 — [Paper] [Video] [Bibtex]
RetrievalFuse: Neural 3D Scene Reconstruction with a Database
In this paper, we introduce a new method that directly leverages scene geometry from the training database. It is able to reconstruct a high-quality scene from point-cloud or low-resolution inputs, using geometry patches from a database and attention-based refinement.
2021, Apr 12 — [Paper] [Video] [Bibtex]
NerFACE: Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction
We present dynamic neural radiance fields for modeling the appearance and dynamics of a human face. To handle the dynamics of the face, we combine our scene representation network with a low-dimensional morphable model which provides explicit control over pose and expressions. We use volumetric rendering to generate images from this hybrid representation and demonstrate that such a dynamic neural scene representation can be learned from monocular input data only, without the need of a specialized capture setup.
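A minimal sketch of such a dynamic radiance field conditioned on morphable-model expression parameters; the layer sizes, positional-encoding dimensions, and exact conditioning scheme are illustrative assumptions rather than the paper's network.

```python
import torch
import torch.nn as nn

class DynamicFaceNeRF(nn.Module):
    """Radiance-field MLP whose density and color are conditioned on expression codes."""
    def __init__(self, pos_dim=63, dir_dim=27, expr_dim=76, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(pos_dim + expr_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.sigma_head = nn.Linear(hidden, 1)               # volume density
        self.rgb_head = nn.Sequential(                        # view-dependent color
            nn.Linear(hidden + dir_dim, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid())

    def forward(self, x_enc, d_enc, expr):
        # x_enc: (N, pos_dim) encoded positions, d_enc: (N, dir_dim) encoded view directions,
        # expr: (N, expr_dim) morphable-model expression coefficients broadcast per sample.
        h = self.trunk(torch.cat([x_enc, expr], dim=-1))
        sigma = torch.relu(self.sigma_head(h))
        rgb = self.rgb_head(torch.cat([h, d_enc], dim=-1))
        return sigma, rgb                                      # composited via volume rendering
```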
2021, Mar 03 — [Paper] [Video] [Bibtex]
Neural Deformation Graphs for Globally-consistent Non-rigid Reconstruction
We introduce Neural Deformation Graphs for globally-consistent deformation tracking and 3D reconstruction of non-rigid objects. Specifically, we implicitly model a deformation graph via a deep neural network. This neural deformation graph does not rely on any object-specific structure and, thus, can be applied to general non-rigid deformation tracking.
2021, Mar 03 — [Paper] [Video] [Bibtex]
SPSG: Self-Supervised Photometric Scene Generation from RGB-D Scans
We present a novel approach to generate high-quality, colored 3D models of scenes from RGB-D scan observations by learning to infer unobserved scene geometry and color in a self-supervised fashion.
2021, Mar 02 — [Paper] [Video] [Bibtex]
Neural Non-Rigid Tracking
We introduce a novel, end-to-end learnable, differentiable non-rigid tracker that enables state-of-the-art non-rigid reconstruction. By enabling gradient back-propagation through a non-rigid as-rigid-as-possible optimization solver, we are able to learn correspondences in an end-to-end manner such that they are optimal for the task of non-rigid tracking.
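A minimal sketch of the as-rigid-as-possible energy that such a differentiable solver minimizes, written directly in PyTorch so gradients can flow through it; the end-to-end correspondence learning of the full method is not shown.

```python
import torch

def arap_energy(verts, verts_def, rotations, edges):
    """As-rigid-as-possible regularizer: deformed edges should match the source edges
    after applying per-node rotations.
    verts, verts_def: (V, 3); rotations: (V, 3, 3); edges: (E, 2) vertex-index pairs."""
    i, j = edges[:, 0], edges[:, 1]
    src_edge = verts[j] - verts[i]                                 # (E, 3)
    def_edge = verts_def[j] - verts_def[i]                         # (E, 3)
    rotated = torch.einsum('eab,eb->ea', rotations[i], src_edge)   # R_i (v_j - v_i)
    return ((def_edge - rotated) ** 2).sum(dim=-1).mean()
```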
2020, Sep 29 — [Paper] [Video] [Bibtex]
Egocentric Videoconferencing
We introduce a method for egocentric videoconferencing that enables hands-free video calls, for instance by people wearing smart glasses or other mixed-reality devices.
2020, Sep 28 — [Paper] [Video] [Bibtex]
Learning Adaptive Sampling and Reconstruction for Volume Visualization
We introduce a novel neural rendering pipeline, which is trained end-to-end to generate a sparse adaptive sampling structure from a given low-resolution input image, and reconstructs a high-resolution image from the sparse set of samples.
2020, Jul 22 — [Paper] [Bibtex]
Intrinsic Autoencoders for Joint Neural Rendering and Intrinsic Image Decomposition
We propose an autoencoder for joint generation of realistic images from synthetic 3D models while simultaneously decomposing real images into their intrinsic shape and appearance properties.
2020, Jun 23 — [Paper] [Bibtex]
CVPR 2020: Tutorial on Neural Rendering
Neural rendering is a new and rapidly emerging field that combines generative machine learning techniques with physical knowledge from computer graphics, e.g., by the integration of differentiable rendering into network training. This tutorial summarizes the recent trends and applications of neural rendering.
2020, Apr 08 — [Paper] [Video] [Bibtex]
State of the Art on Neural Rendering
Neural rendering is a new and rapidly emerging field that combines generative machine learning techniques with physical knowledge from computer graphics, e.g., by the integration of differentiable rendering into network training. This state-of-the-art report summarizes the recent trends and applications of neural rendering.
2020, Apr 08 — [Paper] [Bibtex]
Adversarial Texture Optimization from RGB-D Scans
We present a novel approach for color texture generation using a conditional adversarial loss obtained from weakly-supervised views. Specifically, we propose an approach to produce photorealistic textures for approximate surfaces, even from misaligned images, by learning an objective function that is robust to these errors.
2020, Mar 19 — [Paper] [Video] [Bibtex]
Image-guided Neural Object Rendering
We propose a new learning-based novel view synthesis approach for scanned objects that is trained based on a set of multi-view images, where we directly train a deep neural network to synthesize a view-dependent image of an object.
2020, Jan 15 — [Paper] [Video] [Bibtex]
Neural Voice Puppetry: Audio-driven Facial Reenactment
Given an audio sequence of a source person or digital assistant, we generate a photo-realistic output video of a target person that is in sync with the audio of the source input.
2020, Jan 08 — [Paper] [Video] [Bibtex]
Deferred Neural Rendering: Image Synthesis using Neural Textures
Deferred Neural Rendering is a new paradigm for image synthesis that combines the traditional graphics pipeline with learnable Neural Textures. Both neural textures and deferred neural renderer are trained end-to-end, enabling us to synthesize photo-realistic images even when the original 3D content was imperfect.
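A minimal sketch of the neural-texture idea: a learnable feature texture is sampled with rasterized UV coordinates and translated into an image by a small network; the texture resolution, feature dimension, and the renderer (a stand-in for a U-Net-style network) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeferredNeuralRenderer(nn.Module):
    """Learnable neural texture + convolutional renderer operating on sampled features."""
    def __init__(self, tex_res=512, tex_channels=16):
        super().__init__()
        self.neural_texture = nn.Parameter(torch.randn(1, tex_channels, tex_res, tex_res) * 0.01)
        self.renderer = nn.Sequential(                     # stand-in for a U-Net-style renderer
            nn.Conv2d(tex_channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid())

    def forward(self, uv):
        # uv: (B, H, W, 2) rasterized texture coordinates of the 3D model, in [-1, 1].
        tex = self.neural_texture.expand(uv.shape[0], -1, -1, -1)
        feats = F.grid_sample(tex, uv, align_corners=True)  # (B, C, H, W) sampled features
        return self.renderer(feats)                         # photo-realistic RGB frame
```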
2019, Apr 28 — [Paper] [Video] [Bibtex]
DeepVoxels: Learning Persistent 3D Feature Embeddings
In this work, we address the lack of 3D understanding of generative neural networks by introducing a persistent 3D feature embedding for view synthesis. To this end, we propose DeepVoxels, a learned representation that encodes the view-dependent appearance of a 3D object without having to explicitly model its geometry.
2019, Apr 11 — [Paper] [Video] [Bibtex]
Research Highlight: Face2Face
Research highlight of the Face2Face approach featured on the cover of Communications of the ACM in January 2019. Face2Face is an approach for real-time facial reenactment of a monocular target video. The method had significant impact in the research community and far beyond; it won several awards, e.g., the SIGGRAPH E-Tech Best in Show Award, it was featured in countless media articles, e.g., NYT, WSJ, Spiegel, etc., and it had a massive reach on social media with millions of views.
2019, Jan 01 — [Paper] [Video] [Bibtex]
ECCV 2018: Tutorial on Face Tracking and its Applications
This invited tutorial is about monocular face tracking techniques and also discusses the possible applications. It is based on our Eurographics state-of-the-art report.
2018, Sep 08 — [Paper] [Bibtex]
Deep Video Portraits
Our novel approach enables photo-realistic re-animation of portrait videos using only an input video. The core of our approach is a generative neural network with a novel space-time architecture. The network takes as input synthetic renderings of a parametric face model, based on which it predicts photo-realistic video frames for a given target actor.
2018, May 29 — [Paper] [Video] [Bibtex]
HeadOn: Real-time Reenactment of Human Portrait Videos
HeadOn is the first real-time reenactment approach for complete human portrait videos that enables transfer of torso and head motion, face expression, and eye gaze. Given a short RGB-D video of the target actor, we automatically construct a personalized geometry proxy that embeds a parametric head, eye, and kinematic torso model. A novel reenactment algorithm employs this proxy to map the captured motion from the source to the target actor.
2018, May 29 — [Paper] [Video] [Bibtex]
InverseFaceNet: Deep Monocular Inverse Face Rendering
We introduce InverseFaceNet, a deep convolutional inverse rendering framework for faces that jointly estimates facial pose, shape, expression, reflectance and illumination from a single input image. This enables advanced real-time editing of facial imagery, such as appearance editing and relighting.
2018, May 16 — [Paper] [Video] [Bibtex]
State of the Art on Monocular 3D Face Reconstruction, Tracking, and Applications
This report summarizes recent trends in monocular facial performance capture and discusses its applications, which range from performance-based animation to real-time facial reenactment. We focus on methods where the central task is to recover and track a three-dimensional model of the human face using optimization-based reconstruction algorithms.
2018, Apr 24 — [Paper] [Bibtex]
Eurographics 2018: State of the Art on Monocular 3D Face Reconstruction, Tracking, and Applications
This state-of-the-art report session summarizes recent trends in monocular facial performance capture and discusses its applications, which range from performance-based animation to real-time facial reenactment. We focus on methods where the central task is to recover and track a three-dimensional model of the human face using optimization-based reconstruction algorithms.
2018, Apr 24 — [Paper] [Bibtex]
FaceVR: Real-Time Facial Reenactment and Eye Gaze Control in Virtual Reality
We propose FaceVR, a novel image-based method that enables video teleconferencing in VR based on self-reenactment. The key component of FaceVR is a robust algorithm to perform real-time facial motion capture of an actor who is wearing a head-mounted display (HMD).
2018, Mar 21 — [Paper] [Video] [Bibtex]
Dissertation: Face2Face - Facial Reenactment
This dissertation summarizes the work in the field of markerless motion tracking, face reconstruction, and its applications. In particular, it presents real-time facial reenactment, which enables the transfer of facial expressions from one video to another.
2017, Oct 16 — [Paper] [Bibtex]
FaceForge: Markerless Non-Rigid Face Multi-Projection Mapping
In this paper, we introduce FaceForge, a multi-projection mapping system that is able to alter the appearance of a non-rigidly moving human face in real time.
2017, Oct 10 — [Paper] [Video] [Bibtex]
SIGGRAPH Emerging Technologies: Demo of FaceVR
We demonstrate FaceVR, our image-based method that enables video teleconferencing in VR based on self-reenactment. Its key component is a robust algorithm for real-time facial motion capture of an actor who is wearing a head-mounted display (HMD).
2017, Aug 03 — [Paper] [Video] [Bibtex]
FaceInCar Demo at the National IT Summit 2016
We demonstrate the capabilities of the dense face fitting proposed in Face2Face in the challenging scenario of face tracking in a car, including occlusions and strongly varying lighting conditions.
2016, Nov 17 —
SIGGRAPH Emerging Technologies: Real-time Face Capture and Reenactment of RGB Videos
We show a demo of real-time facial reenactment of a monocular target video sequence (e.g., a YouTube video). Our goal is to animate the facial expressions of a target video by a source actor and re-render the manipulated output video in a photo-realistic fashion.
2016, Jul 28 — [Paper] [Video] [Bibtex]
GPU Technology Conference: Interactive Demo of Face2Face
Nvidia invited us to show a demo of our real-time facial reenactment system (Face2Face). Our goal is to animate the facial expressions of a target video by a source actor and re-render the manipulated output video in a photo-realistic fashion.
2016, Apr 07 — [Paper] [Video] [Bibtex]
Face2Face: Real-time Face Capture and Reenactment of RGB Videos
We present a novel approach for real-time facial reenactment of a monocular target video sequence (e.g., a YouTube video). The source sequence is also a monocular video stream, captured live with a commodity webcam. Our goal is to animate the facial expressions of the target video by a source actor and re-render the manipulated output video in a photo-realistic fashion.
2016, Mar 23 — [Paper] [Video] [Bibtex]
Real-Time Pixel Luminance Optimization for Dynamic Multi-Projection Mapping
Using projection mapping enables us to bring virtual worlds into shared physical spaces. In this paper, we present a novel, adaptable and real-time projection mapping system, which supports multiple projectors and high quality rendering of dynamic content on surfaces of complex geometrical shape. Our system allows for smooth blending across multiple projectors using a new optimization framework that simulates the diffuse direct light transport of the physical world to continuously adapt the color output of each projector pixel.
2015, Sep 14 — [Paper] [Video] [Bibtex]
Real-time Expression Transfer for Facial Reenactment
We present a method for the real-time transfer of facial expressions from an actor in a source video to an actor in a target video, thus enabling the ad-hoc control of the facial expressions of the target actor.
2015, Aug 27 — [Paper] [Video] [Bibtex]
Interactive Model-based Reconstruction of the Human Head using an RGB-D Sensor
We present a novel method for the interactive markerless reconstruction of human heads using a single commodity RGB-D sensor. Our entire reconstruction pipeline is implemented on the graphics processing unit and allows us to obtain high-quality reconstructions of the human head using an interactive and intuitive reconstruction paradigm.
2014, Apr 28 — [Paper] [Video] [Bibtex]