2. ● What is 3D Computer Vision?
● What is Differentiable Rendering (DR)?
● How Does Differentiable Rendering Work?
● Application “Monocular Differentiable Rendering for Self-Supervised Object Detection”
● Other examples
● Open Discussion
Agenda
3. Education:
• Isik University, Istanbul, BSc. Electronics Engineering, 2011
• Yeditepe University, Istanbul, MSc. Electronics Engineering, 2014
Experience:
• Huawei R&D Center, Istanbul, SW Engineer, 2012-2014
• Ayonix Inc, Tokyo, SW Engineer, 2014-2015
• TeraRecon Inc, Tokyo, SW Engineer, 2015-2017
• Preferred Networks, Tokyo, Engineer, 2018 -
• Preferred Networks, Tokyo, Engineer / Engineering Manager, 2019-2021
Freelance:
• Managed and consulted on several projects based on 3D computer vision and cloud gaming
Introduction - Deniz Beker
4. What is 3D Computer Vision?
● Text: sequential, 1-D data
● Image: 2-D data
5. What is 3D Computer Vision?
● 3D Perception / 3D Object Detection
(Multi-View 3D Object Detection Network for Autonomous Driving, Chen et al., 2017)
● Shape Segmentation & Classification
(PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, Qi et al., 2017)
● Shape Completion & Denoising
(Convolutional Occupancy Networks, Peng et al., 2020)
And many other applications...
6. ● What is 3D Computer Vision?
● What is Differentiable Rendering (DR)?
● How Does Differentiable Rendering Work?
● Application “Monocular Differentiable Rendering for Self-Supervised Object Detection”
● Other examples
● Open Discussion
Agenda
7. Supervised Deep Learning Pipeline
Input Data → Forward Pass → Calculate Error (against Labels / Supervision)
● Penalize each neuron w.r.t. its contribution to the error
● Reiterate
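The loop above can be sketched end to end with a toy one-weight model; the data and learning rate here are illustrative, not from the slides.

```python
import numpy as np

# Toy supervised pipeline: fit y = 2x with a single weight.
# Illustrates forward pass, error calculation, and penalizing
# the weight in proportion to its contribution to the error.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x                      # labels / supervision

w = 0.0                          # single "neuron" weight
lr = 0.5
for _ in range(100):             # reiterate
    pred = w * x                 # forward pass
    err = pred - y               # calculate error
    grad = 2 * np.mean(err * x)  # w's contribution to the MSE
    w -= lr * grad               # penalize w.r.t. that contribution

print(round(w, 3))               # w converges toward 2.0
```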
8. Supervision Challenges in 3D Computer Vision
● It is not easy to label 3D data accurately
(where to click and how to label is an open HCI problem)
● Sometimes it is impossible to generate 3D labels
(e.g., light directions in an image in the wild)
● It is costly to collect 3D data
(precise collection typically requires LiDAR-like equipment)
9. Self-Supervised Solution: Differentiable Rendering
● Projective-geometry-aware dimensionality reduction (3D -> 2D)
● Allows using 2D input images as self-supervision for predicting 3D parameters
● Estimates the 3D world parameters from 2D observations alone
10. ● What is 3D Computer Vision?
● What is Differentiable Rendering (DR)?
● How Does Differentiable Rendering Work?
● Application “Monocular Differentiable Rendering for Self-Supervised Object Detection”
● Other examples
● Open Discussion
Agenda
11. 3D Shape Representation Types
● Explicit representations: Point Cloud, Mesh, Voxel
● Implicit representation: Implicit Function
(RGB rendering shown for reference)
Source: Semantic and instance segmentation based on 3D point cloud scenes: RandLA-Net and 3D-BoNet
12. Rendering: Rasterization in the OpenGL Pipeline
The aim is to project 3D geometry & scene information onto a 2D plane.
● Project each triangle to screen space
● Order the triangles by depth and determine which ones are visible
● Mark each pixel that lies inside a visible triangle
● Apply texture to those pixels for color
● Post-process (optional)
○ Light
○ Reflection & refraction
○ Shadow
Source: https://developer.tizen.org/ko/forums/native-application-development/what-rasterisation-opengl-graphic-pipeline
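The first step of the pipeline above, projecting triangle vertices to screen space, can be sketched with a pinhole camera; the intrinsics and vertex positions below are hypothetical.

```python
import numpy as np

# Project camera-space triangle vertices into screen space with a
# pinhole camera model. Depth ordering and pixel coverage would
# be the next rasterization steps.
K = np.array([[500.0,   0.0, 320.0],   # fx, skew, cx
              [  0.0, 500.0, 240.0],   # fy, cy
              [  0.0,   0.0,   1.0]])

triangle = np.array([[0.0, 0.0, 5.0],  # camera-space vertices, z > 0
                     [1.0, 0.0, 5.0],
                     [0.0, 1.0, 4.0]])

proj = (K @ triangle.T).T              # homogeneous image coordinates
screen = proj[:, :2] / proj[:, 2:]     # perspective divide -> pixels
print(screen)
```

The perspective divide is where distant geometry shrinks: the third vertex, being closer (z = 4), lands further from the principal point than a same-offset vertex at z = 5 would.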
13. Rendering: Ray Tracing
The aim is to connect hypothetical line segments from each pixel to a light source.
● For each pixel, shoot a ray and check whether it hits a light or a surface
● If it hits a surface, compute the possible refractions & reflections to generate N more rays
● Follow those rays and repeat the process until a ray hits a light
● If any ray hits a light, compute the color by combining the light intensity, light color, and the material's refraction & reflection parameters
● Combine the contributions of multiple rays that affect the same pixel
Source: https://en.wikipedia.org/wiki/Ray_tracing_(graphics)
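The first step above, shooting a ray and testing whether it hits a surface, reduces to an intersection test; here is a minimal ray-sphere version with an illustrative scene, not from the slides.

```python
import numpy as np

# Ray-sphere intersection: the basic surface-hit test a ray tracer
# performs for each ray it shoots from a pixel.
def hit_sphere(origin, direction, center, radius):
    # Solve |origin + t*direction - center|^2 = radius^2 for t,
    # assuming direction is unit length.
    oc = origin - center
    b = 2.0 * np.dot(oc, direction)
    c = np.dot(oc, oc) - radius ** 2
    disc = b * b - 4.0 * c
    if disc < 0:
        return None                      # ray misses the sphere
    t = (-b - np.sqrt(disc)) / 2.0
    return t if t > 0 else None          # nearest hit in front of the camera

origin = np.zeros(3)
ray = np.array([0.0, 0.0, 1.0])          # ray through a central pixel
t = hit_sphere(origin, ray, center=np.array([0.0, 0.0, 5.0]), radius=1.0)
print(t)                                 # front of the sphere, at depth 4.0
```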
14. Analytical Derivative of Mesh Rasterization
Pipeline: mesh vertices Vp → vertices in screen space → rasterize (digitize) → order & choose visible triangles → assign color to each pixel through texture → post-process for light etc. → pixel color Vc
Every stage is differentiable by default, except rasterization (digitization), which is not differentiable due to its discrete nature.
15. Differentiating Mesh Rasterization
There are two main problems to solve when differentiating a renderer:
● How to create gradients / backpropagate for object (foreground) pixels?
● How to create gradients / backpropagate for non-object (background) pixels?
There are two mainstream approaches to this problem:
● Approximating the gradients (backward pass)
● Approximating the rasterization (forward pass)
16. Approximating Gradients
OpenDR: An Approximate Differentiable Renderer, M. M. Loper and M. J. Black, ECCV 2014.
Neural 3D Mesh Renderer, H. Kato, Y. Ushiku, and T. Harada, CVPR 2018.
● Find the visible triangle for each pixel pi
● Find pi's projection Pi on the 3D triangle
● Using inverse barycentric coordinates, compute the weights of the triangle's vertices Vx at point Pi
● Assign the correspondingly weighted gradient of Pi to each Vx
● Propagate gradients for background pixels in screen space from neighbouring pixels (OpenDR) or the same column / row (NMR)
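The weight-computation step above can be sketched in 2D; the triangle, pixel position, and gradient values are illustrative.

```python
import numpy as np

# Inverse barycentric coordinates: the weights of a triangle's
# vertices at a point p, used to split a pixel's gradient among
# the vertices.
def barycentric_weights(p, a, b, c):
    # Solve p = w0*a + w1*b + w2*c with w0 + w1 + w2 = 1.
    T = np.array([[a[0] - c[0], b[0] - c[0]],
                  [a[1] - c[1], b[1] - c[1]]])
    w01 = np.linalg.solve(T, p - c)
    return np.array([w01[0], w01[1], 1.0 - w01.sum()])

a, b, c = np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])
p = np.array([0.25, 0.25])                 # pixel center inside the triangle
w = barycentric_weights(p, a, b, c)
print(w)                                   # [0.5, 0.25, 0.25], sums to 1

pixel_grad = np.array([1.0, -2.0])         # hypothetical gradient at the pixel
vertex_grads = w[:, None] * pixel_grad     # weighted share for each vertex
```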
17. Approximating Gradients
OpenDR: An Approximate Differentiable Renderer, M. M. Loper and M. J. Black, ECCV 2014.
Neural 3D Mesh Renderer, H. Kato, Y. Ushiku, and T. Harada, CVPR 2018.
Advantages:
● Provides "useful" gradients
● Any production-grade rasterizer can be used for the forward pass
Disadvantages:
● Handcrafting gradients is hard, complex and time-consuming
● The point pi itself does not propagate any gradients
● Fails when accurate gradients are required
18. Approximating Mesh Rasterization
Soft Rasterizer: A Differentiable Renderer for Image-based 3D Reasoning, S. Liu, T. Li, W. Chen, H. Li
Advantages:
● Automatic & accurate gradient calculation
● Every pixel receives gradients
Disadvantages:
● Blurry output
● Computationally expensive
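Soft Rasterizer's core idea can be sketched in one dimension: replace the hard in/out pixel test with a sigmoid of the signed distance to the primitive. The distances and sigma below are illustrative, and using a 1-D distance (rather than a full 2-D point-to-triangle distance) is a simplification.

```python
import numpy as np

# Soft coverage: a smooth, differentiable stand-in for the binary
# "is this pixel inside the triangle?" test.
def soft_coverage(signed_dist, sigma=0.01):
    # signed_dist > 0 inside the primitive, < 0 outside.
    return 1.0 / (1.0 + np.exp(-signed_dist / sigma))

d = np.array([-0.05, -0.005, 0.0, 0.005, 0.05])  # distances to the edge
cov = soft_coverage(d)
print(cov)
# Far outside -> ~0, on the edge -> 0.5, far inside -> ~1.
```

Shrinking sigma sharpens the transition toward a hard rasterizer; this is also the source of the blurry output noted above, since a usable gradient requires a non-zero sigma.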
19. Differentiating Point Cloud Rendering
There is no rasterization => theoretically the whole pipeline is differentiable.
However, there is still a problem due to the discrete nature of images:
What happens if multiple points are projected onto the same pixel?
What will the final color be? How should we backpropagate?
Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction, Lin et al.
20. Differentiating Point Cloud Rendering
Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction, Lin et al.
- Increase the spatial dimensions to ensure a unique projection
- Apply a max / mean pooling operation
End-to-End Learning Local Multi-view Descriptors for 3D Point Clouds, Li et al.
- Add a pseudo-size to each point
- Apply weighted / probabilistic aggregation for the pixel color
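The aggregation idea can be sketched as a depth-based softmax over the points landing on one pixel; the depths, colors, and temperature below are illustrative, not taken from either paper.

```python
import numpy as np

# Several points project onto the same pixel: blend their colors
# with a soft visibility weight instead of a hard z-buffer test,
# keeping the operation differentiable.
depths = np.array([2.0, 2.1, 5.0])        # points hitting one pixel
colors = np.array([[1.0, 0.0, 0.0],       # red   (closest)
                   [0.0, 1.0, 0.0],       # green
                   [0.0, 0.0, 1.0]])      # blue  (farthest)

tau = 0.1                                  # temperature: tau -> 0 gives a hard z-test
w = np.exp(-depths / tau)
w = w / w.sum()                            # soft visibility weights
pixel = w @ colors                         # blended, differentiable pixel color
print(pixel)                               # dominated by the closest point
```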
21. Differentiating Voxel Rendering
Ray marching is differentiable by nature!
Differentiable ray marching can be implemented as a cumulative sum of voxel occupancy values along a ray.
However, it has O(n^3) memory complexity.
Neural Volumes: Learning Dynamic Renderable Volumes From Images, Lombardi et al.
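The accumulation along a ray can be sketched as a running product of transparencies; the occupancy values below are an illustrative single ray, not from the paper.

```python
import numpy as np

# Differentiable ray marching: accumulate voxel occupancies along a
# ray. Every step is a smooth function of the occupancy values, so
# gradients flow back to the voxel grid.
occupancy = np.array([0.0, 0.0, 0.8, 0.5, 0.9])   # samples along one ray

# Transmittance: probability the ray survives to each sample.
transmittance = np.cumprod(np.concatenate(([1.0], 1.0 - occupancy[:-1])))
weights = transmittance * occupancy                # contribution per sample
alpha = weights.sum()                              # how opaque the ray is

# An expected depth along the ray falls out of the same weights.
depth = weights @ np.arange(len(occupancy), dtype=float)
print(weights, alpha, depth)
```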
22. Differentiating Ray Tracing
● The functions involved are differentiable by nature
● Discontinuities exist at object edges and due to occlusions
● Approximation / sampling strategies are used to overcome this problem
23. Neural Rendering
● Rendering implemented as neural network layers
● Requires learning
● Easier to implement than the previous approaches
● Geometric consistency is not guaranteed
Deferred Neural Rendering: Image Synthesis using Neural Textures, Thies et al., 2019
24. Shapes as Implicit Functions
Consider the aim of representing a circle in 2D:
● Mesh / point cloud: sample N points on the circle (N x 2 parameters to represent it)
● Voxel: cubify / discretize into M bins per axis (M x M parameters to represent it)
● Implicit function: define the function and evaluate f(x1, y1), f(x2, y2), ..., f(xN, yN) at the (N) or (M x M) query points (only 3 parameters, a, b, c, to represent it)
25. Neural Implicit Scene Representations
Let's increase the complexity and represent the following 3D object as an implicit function. It is no longer possible to handcraft a function that represents the shape!
Scene Representation Networks, Sitzmann et al. 2019
26. Why Are Implicit Scene Representations Important?
● Infinite spatial resolution: no more OOM errors due to large voxel grids!
● Define the scene as a continuous function (all explicit representations are discrete)
● Allow choosing the level of detail at inference time, with no need to change the network & retrain
● Can embed other meta information along with occupancy
● Possibility to remove the expensive 3D CNNs used to process the volume
27. ● What is 3D Computer Vision?
● What is Differentiable Rendering (DR)?
● How Does Differentiable Rendering Work?
● Application “Monocular Differentiable Rendering for Self-Supervised Object Detection”
● Other examples
● Open Discussion
Agenda
28. Applications:
Self-Supervision for 3D Object Detection
Monocular Differentiable Rendering for Self-Supervised 3D Object Detection,
D. Beker, H. Kato, M. Morariu, T. Ando, T. Matsuoka, W. Kehl, A. Gaidon, ECCV 2020
Purpose:
Given a monocular RGB image (and the camera calibration),
predict the 3D dimensions, texture, rotation and location of the objects in metric scale.
Through self-supervision, generate 3D annotations automatically.
30. ● What is 3D Computer Vision?
● What is Differentiable Rendering (DR)?
● How Does Differentiable Rendering Work?
● Application “Monocular Differentiable Rendering for Self-Supervised Object Detection”
● Other examples
● Open Discussion
Agenda
32. Example - NeRF
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis,
Mildenhall et al. ECCV Oral 2020
These 3D reconstructions are produced using only 2D observations.
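One ingredient of NeRF worth sketching is its positional encoding: input coordinates are mapped to sines and cosines at increasing frequencies so the MLP can represent high-frequency detail. The band count L = 4 below is chosen for brevity (the paper uses more).

```python
import numpy as np

# NeRF-style positional encoding for a scalar coordinate: 2 * L
# features per input dimension, at exponentially growing frequencies.
def positional_encoding(x, L=4):
    freqs = 2.0 ** np.arange(L) * np.pi
    return np.concatenate([np.sin(x * freqs), np.cos(x * freqs)])

enc = positional_encoding(np.array([0.5]))
print(enc.shape)   # (8,) = 2 * L features for one coordinate
```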
33. Example - D-NeRF
D-NeRF: Neural Radiance Fields for Dynamic Scenes
Pumarola et al. 2020
34. Example - DeepSDF
DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation,
Park et al., 2019
Shape as Signed Distance Function
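A signed distance function can be written in closed form for simple shapes (DeepSDF learns such a function with an MLP); the sphere below is an illustrative example, not from the paper.

```python
import numpy as np

# Signed distance to a sphere: the value is the distance to the
# surface, negative inside, and the zero level set is the shape.
def sdf_sphere(p, center=np.zeros(3), radius=1.0):
    return np.linalg.norm(p - center) - radius

print(sdf_sphere(np.array([0.0, 0.0, 0.0])))  # -1.0 : center, inside
print(sdf_sphere(np.array([2.0, 0.0, 0.0])))  #  1.0 : one unit outside
```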
35. Example - Convolutional Occupancy Networks
Convolutional Occupancy Networks, Peng et al., 2020
Occupancy prediction at a 3D point, conditioned on input features
36. Example - SIREN
Implicit Neural Representations with Periodic Activation Functions,
Sitzmann et al., NeurIPS 2020 (Oral)
Replacing the activation layers with periodic functions improves the model's capacity with the same number of parameters.
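A SIREN layer is simply a linear layer followed by a sine activation; the random weights below are illustrative, and w0 = 30 follows the paper's suggested scaling for the first layer.

```python
import numpy as np

# One SIREN layer: sin(w0 * (x @ W + b)) instead of ReLU(x @ W + b).
rng = np.random.default_rng(0)

def siren_layer(x, W, b, w0=30.0):
    return np.sin(w0 * (x @ W + b))

W = rng.uniform(-1, 1, size=(2, 16)) / 2       # sketch of fan-in scaling
b = np.zeros(16)
coords = np.array([[0.1, -0.3]])               # a 2-D input coordinate
h = siren_layer(coords, W, b)
print(h.shape)                                 # (1, 16) hidden features
```

Because sine is smooth and its derivatives are again sinusoids, gradients of any order stay well-behaved, which is what makes SIREN effective for implicit representations.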
37. There are more algorithms and applications
For other algorithms, please check our survey paper
Differentiable Rendering: A Survey, H. Kato, D. Beker, M. Morariu, T. Ando, T. Matsuoka, W. Kehl, A. Gaidon