2. ● What is 3D Computer Vision?
● What is Differentiable Rendering (DR)?
● How Does Differentiable Rendering Work?
● Application “Monocular Differentiable Rendering for Self-Supervised Object Detection”
● Other examples
● Open Discussion
Agenda
3. Education:
• Isik University, Istanbul, BSc. Electronics Engineering, 2011
• Yeditepe University, Istanbul, MSc. Electronics Engineering, 2014
Experience:
• Huawei R&D Center, Istanbul, SW Engineer, 2012-2014
• Ayonix Inc, Tokyo, SW Engineer, 2014-2015
• TeraRecon Inc, Tokyo, SW Engineer, 2015-2017
• Preferred Networks, Tokyo, Engineer, 2018 -
• Preferred Networks, Tokyo, Engineer / Engineering Manager, 2019-2021
Freelance:
• Managed and consulted on several projects based on 3D computer vision and cloud gaming
Introduction - Deniz Beker
4. What is 3D Computer Vision?
● Text: sequential, 1-D data
● Image: 2-D data
5. What is 3D Computer Vision?
● 3D Perception / 3D Object Detection
(Multi-View 3D Object Detection Network for Autonomous Driving, Chen et al., 2017)
● Shape Segmentation & Classification
(PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, Qi et al., 2017)
● Shape Completion & Denoising
(Convolutional Occupancy Networks, Peng et al., 2020)
And many other applications...
6. ● What is 3D Computer Vision?
● What is Differentiable Rendering (DR)?
● How Does Differentiable Rendering Work?
● Application “Monocular Differentiable Rendering for Self-Supervised Object Detection”
● Other examples
● Open Discussion
Agenda
7. Supervised Deep Learning Pipeline
Input Data → Forward Pass → Calculate Error (against Labels / Supervision)
● Penalize each neuron w.r.t. its contribution to the error
● Reiterate
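The loop above can be sketched end to end with a toy one-weight model; the data and learning rate here are illustrative, not from the slides.

```python
import numpy as np

# Toy supervised pipeline: fit y = 2x with a single weight.
# Illustrates forward pass, error calculation, and penalizing
# the weight in proportion to its contribution to the error.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x                      # labels / supervision

w = 0.0                          # single "neuron" weight
lr = 0.5
for _ in range(100):             # reiterate
    pred = w * x                 # forward pass
    err = pred - y               # calculate error
    grad = 2 * np.mean(err * x)  # w's contribution to the MSE
    w -= lr * grad               # penalize w.r.t. that contribution

print(round(w, 3))               # w converges toward 2.0
```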
8. Supervision Challenges in 3D Computer Vision
● It is not easy to label 3D data accurately
(where to click and how to label is an open HCI problem)
● Sometimes it is impossible to generate 3D labels
(e.g., light directions in an image in the wild)
● It is costly to collect 3D data
(precise collection typically requires LiDAR-like equipment)
9. Self-Supervised Solution: Differentiable Rendering
● Projective-geometry-aware dimensionality reduction (3D -> 2D)
● Allows using 2D input images as self-supervision for predicting 3D parameters
● Estimates the 3D world parameters from 2D observations alone
10. ● What is 3D Computer Vision?
● What is Differentiable Rendering (DR)?
● How Does Differentiable Rendering Work?
● Application “Monocular Differentiable Rendering for Self-Supervised Object Detection”
● Other examples
● Open Discussion
Agenda
11. 3D Shape Representation Types
● Explicit representations: Point Cloud, Mesh, Voxel
● Implicit representation: Implicit Function
(RGB rendering shown for reference)
Source: Semantic and instance segmentation based on 3D point cloud scenes: RandLA-Net and 3D-BoNet
12. Rendering: Rasterization in the OpenGL Pipeline
The aim is to project 3D geometry & scene information onto a 2D plane.
● Project each triangle to screen space
● Order the triangles by depth and determine which ones are visible
● Mark each pixel that lies inside a visible triangle
● Apply texture to those pixels for color
● Post-process (optional)
○ Light
○ Reflection & refraction
○ Shadow
Source: https://developer.tizen.org/ko/forums/native-application-development/what-rasterisation-opengl-graphic-pipeline
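The first step of the pipeline above, projecting triangle vertices to screen space, can be sketched with a pinhole camera; the intrinsics and vertex positions below are hypothetical.

```python
import numpy as np

# Project camera-space triangle vertices into screen space with a
# pinhole camera model. Depth ordering and pixel coverage would
# be the next rasterization steps.
K = np.array([[500.0,   0.0, 320.0],   # fx, skew, cx
              [  0.0, 500.0, 240.0],   # fy, cy
              [  0.0,   0.0,   1.0]])

triangle = np.array([[0.0, 0.0, 5.0],  # camera-space vertices, z > 0
                     [1.0, 0.0, 5.0],
                     [0.0, 1.0, 4.0]])

proj = (K @ triangle.T).T              # homogeneous image coordinates
screen = proj[:, :2] / proj[:, 2:]     # perspective divide -> pixels
print(screen)
```

The perspective divide is where distant geometry shrinks: the third vertex, being closer (z = 4), lands further from the principal point than a same-offset vertex at z = 5 would.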
13. Rendering: Ray Tracing
The aim is to connect hypothetical line segments from each pixel to a light source.
● For each pixel, shoot a ray and check whether it hits a light or a surface
● If it hits a surface, compute the possible refractions & reflections to generate N more rays
● Follow those rays and repeat the process until a ray hits a light
● If any ray hits a light, compute the color by combining the light intensity, light color, and the material's refraction & reflection parameters
● Combine the contributions of multiple rays that affect the same pixel
Source: https://en.wikipedia.org/wiki/Ray_tracing_(graphics)
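The first step above, shooting a ray and testing whether it hits a surface, reduces to an intersection test; here is a minimal ray-sphere version with an illustrative scene, not from the slides.

```python
import numpy as np

# Ray-sphere intersection: the basic surface-hit test a ray tracer
# performs for each ray it shoots from a pixel.
def hit_sphere(origin, direction, center, radius):
    # Solve |origin + t*direction - center|^2 = radius^2 for t,
    # assuming direction is unit length.
    oc = origin - center
    b = 2.0 * np.dot(oc, direction)
    c = np.dot(oc, oc) - radius ** 2
    disc = b * b - 4.0 * c
    if disc < 0:
        return None                      # ray misses the sphere
    t = (-b - np.sqrt(disc)) / 2.0
    return t if t > 0 else None          # nearest hit in front of the camera

origin = np.zeros(3)
ray = np.array([0.0, 0.0, 1.0])          # ray through a central pixel
t = hit_sphere(origin, ray, center=np.array([0.0, 0.0, 5.0]), radius=1.0)
print(t)                                 # front of the sphere, at depth 4.0
```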
14. Analytical Derivative of Mesh Rasterization
Pipeline: mesh vertices Vp → vertices in screen space → rasterize (digitize) → order & choose visible triangles → assign color to each pixel through texture → post-process for light etc. → pixel color Vc
Every stage is differentiable by default, except rasterization (digitization), which is not differentiable due to its discrete nature.
15. Differentiating Mesh Rasterization
There are two main problems to solve when differentiating a renderer:
● How to create gradients / backpropagate for object (foreground) pixels?
● How to create gradients / backpropagate for non-object (background) pixels?
There are two mainstream approaches to this problem:
● Approximating the gradients (backward pass)
● Approximating the rasterization (forward pass)
16. Approximating Gradients
OpenDR: An Approximate Differentiable Renderer, M. M. Loper and M. J. Black, ECCV 2014.
Neural 3D Mesh Renderer, H. Kato, Y. Ushiku, and T. Harada, CVPR 2018.
● Find the visible triangle for each pixel pi
● Find pi's projection Pi on the 3D triangle
● Using inverse barycentric coordinates, compute the weights of the triangle's vertices Vx at point Pi
● Assign the correspondingly weighted gradient of Pi to each Vx
● Propagate gradients for background pixels in screen space from neighbouring pixels (OpenDR) or the same column / row (NMR)
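The weight-computation step above can be sketched in 2D; the triangle, pixel position, and gradient values are illustrative.

```python
import numpy as np

# Inverse barycentric coordinates: the weights of a triangle's
# vertices at a point p, used to split a pixel's gradient among
# the vertices.
def barycentric_weights(p, a, b, c):
    # Solve p = w0*a + w1*b + w2*c with w0 + w1 + w2 = 1.
    T = np.array([[a[0] - c[0], b[0] - c[0]],
                  [a[1] - c[1], b[1] - c[1]]])
    w01 = np.linalg.solve(T, p - c)
    return np.array([w01[0], w01[1], 1.0 - w01.sum()])

a, b, c = np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])
p = np.array([0.25, 0.25])                 # pixel center inside the triangle
w = barycentric_weights(p, a, b, c)
print(w)                                   # [0.5, 0.25, 0.25], sums to 1

pixel_grad = np.array([1.0, -2.0])         # hypothetical gradient at the pixel
vertex_grads = w[:, None] * pixel_grad     # weighted share for each vertex
```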
17. Approximating Gradients
OpenDR: An Approximate Differentiable Renderer, M. M. Loper and M. J. Black, ECCV 2014.
Neural 3D Mesh Renderer, H. Kato, Y. Ushiku, and T. Harada, CVPR 2018.
Advantages:
● Provides "useful" gradients
● Any production-grade rasterizer can be used for the forward pass
Disadvantages:
● Handcrafting gradients is hard, complex and time-consuming
● The point pi itself does not propagate any gradients
● Fails when accurate gradients are required
18. Approximating Mesh Rasterization
Soft Rasterizer: A Differentiable Renderer for Image-based 3D Reasoning, S. Liu, T. Li, W. Chen, H. Li
Advantages:
● Automatic & accurate gradient calculation
● Every pixel receives gradients
Disadvantages:
● Blurry output
● Computationally expensive
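Soft Rasterizer's core idea can be sketched in one dimension: replace the hard in/out pixel test with a sigmoid of the signed distance to the primitive. The distances and sigma below are illustrative, and using a 1-D distance (rather than a full 2-D point-to-triangle distance) is a simplification.

```python
import numpy as np

# Soft coverage: a smooth, differentiable stand-in for the binary
# "is this pixel inside the triangle?" test.
def soft_coverage(signed_dist, sigma=0.01):
    # signed_dist > 0 inside the primitive, < 0 outside.
    return 1.0 / (1.0 + np.exp(-signed_dist / sigma))

d = np.array([-0.05, -0.005, 0.0, 0.005, 0.05])  # distances to the edge
cov = soft_coverage(d)
print(cov)
# Far outside -> ~0, on the edge -> 0.5, far inside -> ~1.
```

Shrinking sigma sharpens the transition toward a hard rasterizer; this is also the source of the blurry output noted above, since a usable gradient requires a non-zero sigma.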
19. Differentiating Point Cloud Rendering
There is no rasterization => theoretically the whole pipeline is differentiable.
However, there is still a problem due to the discrete nature of images:
What happens if multiple points are projected onto the same pixel?
What will the final color be? How should we backpropagate?
Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction, Lin et al.
20. Differentiating Point Cloud Rendering
Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction, Lin et al.
- Increase the spatial dimensions to ensure a unique projection
- Apply a max / mean pooling operation
End-to-End Learning Local Multi-view Descriptors for 3D Point Clouds, Li et al.
- Add a pseudo-size to each point
- Apply weighted / probabilistic aggregation for the pixel color
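The aggregation idea can be sketched as a depth-based softmax over the points landing on one pixel; the depths, colors, and temperature below are illustrative, not taken from either paper.

```python
import numpy as np

# Several points project onto the same pixel: blend their colors
# with a soft visibility weight instead of a hard z-buffer test,
# keeping the operation differentiable.
depths = np.array([2.0, 2.1, 5.0])        # points hitting one pixel
colors = np.array([[1.0, 0.0, 0.0],       # red   (closest)
                   [0.0, 1.0, 0.0],       # green
                   [0.0, 0.0, 1.0]])      # blue  (farthest)

tau = 0.1                                  # temperature: tau -> 0 gives a hard z-test
w = np.exp(-depths / tau)
w = w / w.sum()                            # soft visibility weights
pixel = w @ colors                         # blended, differentiable pixel color
print(pixel)                               # dominated by the closest point
```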
21. Differentiating Voxel Rendering
Ray marching is differentiable by nature!
Differentiable ray marching can be implemented as a cumulative sum of voxel occupancy values along a ray.
However, it has O(n^3) memory complexity.
Neural Volumes: Learning Dynamic Renderable Volumes From Images, Lombardi et al.
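The accumulation along a ray can be sketched as a running product of transparencies; the occupancy values below are an illustrative single ray, not from the paper.

```python
import numpy as np

# Differentiable ray marching: accumulate voxel occupancies along a
# ray. Every step is a smooth function of the occupancy values, so
# gradients flow back to the voxel grid.
occupancy = np.array([0.0, 0.0, 0.8, 0.5, 0.9])   # samples along one ray

# Transmittance: probability the ray survives to each sample.
transmittance = np.cumprod(np.concatenate(([1.0], 1.0 - occupancy[:-1])))
weights = transmittance * occupancy                # contribution per sample
alpha = weights.sum()                              # how opaque the ray is

# An expected depth along the ray falls out of the same weights.
depth = weights @ np.arange(len(occupancy), dtype=float)
print(weights, alpha, depth)
```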
22. Differentiating Ray Tracing
● The functions involved are differentiable by nature
● Discontinuities exist at object edges and due to occlusions
● Approximation / sampling strategies are used to overcome this problem
23. Neural Rendering
● Rendering implemented as neural network layers
● Requires learning
● Easier to implement than the previous approaches
● Geometric consistency is not guaranteed
Deferred Neural Rendering: Image Synthesis using Neural Textures, Thies et al., 2019
24. Shapes as Implicit Functions
Consider the aim of representing a circle in 2D:
● Mesh / point cloud: sample N points on the circle (N x 2 parameters to represent it)
● Voxel: cubify / discretize into M bins per axis (M x M parameters to represent it)
● Implicit function: define the function and evaluate f(x1, y1), f(x2, y2), ..., f(xN, yN) at the (N) or (M x M) query points (only 3 parameters, a, b, c, to represent it)
25. Neural Implicit Scene Representations
Let's increase the complexity and represent the following 3D object as an implicit function. It is no longer possible to handcraft a function that represents the shape!
Scene Representation Networks, Sitzmann et al. 2019
26. Why Are Implicit Scene Representations Important?
● Infinite spatial resolution: no more OOM errors due to large voxel grids!
● Define the scene as a continuous function (all explicit representations are discrete)
● Allow choosing the level of detail at inference time, with no need to change the network & retrain
● Can embed other meta information along with occupancy
● Possibility to remove the expensive 3D CNNs used to process the volume
27. ● What is 3D Computer Vision?
● What is Differentiable Rendering (DR)?
● How Does Differentiable Rendering Work?
● Application “Monocular Differentiable Rendering for Self-Supervised Object Detection”
● Other examples
● Open Discussion
Agenda
28. Applications:
Self-Supervision for 3D Object Detection
Monocular Differentiable Rendering for Self-Supervised 3D Object Detection,
D. Beker, H. Kato, M. Morariu, T. Ando, T. Matsuoka, W. Kehl, A. Gaidon, ECCV 2020
Purpose:
Given a monocular RGB image (and the camera calibration),
predict the 3D dimensions, texture, rotation and location of the objects in metric scale.
Through self-supervision, generate 3D annotations automatically.
30. ● What is 3D Computer Vision?
● What is Differentiable Rendering (DR)?
● How Does Differentiable Rendering Work?
● Application “Monocular Differentiable Rendering for Self-Supervised Object Detection”
● Other examples
● Open Discussion
Agenda
32. Example - NeRF
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis,
Mildenhall et al. ECCV Oral 2020
These 3D reconstructions are produced using only 2D observations.
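One ingredient of NeRF worth sketching is its positional encoding: input coordinates are mapped to sines and cosines at increasing frequencies so the MLP can represent high-frequency detail. The band count L = 4 below is chosen for brevity (the paper uses more).

```python
import numpy as np

# NeRF-style positional encoding for a scalar coordinate: 2 * L
# features per input dimension, at exponentially growing frequencies.
def positional_encoding(x, L=4):
    freqs = 2.0 ** np.arange(L) * np.pi
    return np.concatenate([np.sin(x * freqs), np.cos(x * freqs)])

enc = positional_encoding(np.array([0.5]))
print(enc.shape)   # (8,) = 2 * L features for one coordinate
```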
33. Example - D-NeRF
D-NeRF: Neural Radiance Fields for Dynamic Scenes
Pumarola et al. 2020
34. Example - DeepSDF
DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation,
Park et al., 2019
Shape as Signed Distance Function
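A signed distance function can be written in closed form for simple shapes (DeepSDF learns such a function with an MLP); the sphere below is an illustrative example, not from the paper.

```python
import numpy as np

# Signed distance to a sphere: the value is the distance to the
# surface, negative inside, and the zero level set is the shape.
def sdf_sphere(p, center=np.zeros(3), radius=1.0):
    return np.linalg.norm(p - center) - radius

print(sdf_sphere(np.array([0.0, 0.0, 0.0])))  # -1.0 : center, inside
print(sdf_sphere(np.array([2.0, 0.0, 0.0])))  #  1.0 : one unit outside
```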
35. Example - Convolutional Occupancy Networks
Convolutional Occupancy Networks, Peng et al., 2020
Occupancy prediction at a 3D point, conditioned on input features
36. Example - SIREN
Implicit Neural Representations with Periodic Activation Functions,
Sitzmann et al., NeurIPS 2020 (Oral)
Replacing the activation layers with periodic functions improves the model's capacity with the same number of parameters.
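A SIREN layer is simply a linear layer followed by a sine activation; the random weights below are illustrative, and w0 = 30 follows the paper's suggested scaling for the first layer.

```python
import numpy as np

# One SIREN layer: sin(w0 * (x @ W + b)) instead of ReLU(x @ W + b).
rng = np.random.default_rng(0)

def siren_layer(x, W, b, w0=30.0):
    return np.sin(w0 * (x @ W + b))

W = rng.uniform(-1, 1, size=(2, 16)) / 2       # sketch of fan-in scaling
b = np.zeros(16)
coords = np.array([[0.1, -0.3]])               # a 2-D input coordinate
h = siren_layer(coords, W, b)
print(h.shape)                                 # (1, 16) hidden features
```

Because sine is smooth and its derivatives are again sinusoids, gradients of any order stay well-behaved, which is what makes SIREN effective for implicit representations.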
37. There are more algorithms and applications
For other algorithms, please check our survey paper
Differentiable Rendering: A Survey, H. Kato, D. Beker, M. Morariu, T. Ando, T. Matsuoka, W. Kehl, A. Gaidon