About my work at DeNA Co., Ltd. We built HD maps for self-driving cars from images taken by dashcams. Deep learning is used extensively to detect objects on the road, and SfM is used to reconstruct 3D points from 2D images. These slides were presented at TechCon 2019 https://techcon.dena.com/2019/
2. #denatechcon
Agenda
• Who I am
• Our Goal
• Intro to DL and SfM
• 3D Point Reconstruction
• Recognizing Objects
• Putting It All Together
3. #denatechcon
Who I am
• Profile
• Kosuke Kuzuoka
• 22 years old
• Experience
• June 2018 - Present
AI Research Engineer at DeNA Co., Ltd.
• March 2017 - June 2018
R&D manager at CONCORE’S, inc.
• Interests
• Self Driving Cars
• Computer Vision
Facebook: https://www.facebook.com/kousuke.kuzuoka.9
LinkedIn: https://www.linkedin.com/in/kousuke-kuzuoka-4101ba160/
4. #denatechcon
What I have done before
Detecting objects from construction
plans using deep learning algorithms
Patent-pending algorithm that I
developed for detecting pillars across
multiple tiled images
5. #denatechcon
Our Goal
● To create high definition maps at
a lower price
● 3D point reconstruction and
object detection in dashcam
images
● No use of expensive equipment,
such as LiDAR
https://medium.com/@surmenok/hd-maps-for-self-driving-cars-c41bc01e0d40
6. #denatechcon
Isn’t it like Google Maps?
● Google Maps is a map designed for humans
● It has useful information for
humans
● An HD map is a map designed for machines
● It has useful information for cars,
such as where traffic signs are located
7. #denatechcon
Is it for self-driving cars?
● HD maps are used extensively in self-driving cars,
for example for localization and path planning
● Therefore, the location accuracy of HD
maps needs to be within a few centimeters
● A self-driving car needs to know which
direction the lane is leading, where the
traffic signs are, etc.
https://www.youtube.com/watch?time_continue=207&v=EUq5DlPQdhg
8. #denatechcon
Introduction to Deep Learning
● The idea behind deep learning dates back to the late 1950s, when it was invented by Frank Rosenblatt.
● It was originally called the Perceptron, and it could solve linearly separable problems.
● Later, it turned out that a simple Perceptron cannot solve problems that are not linearly
separable.
https://becominghuman.ai/deep-learning-made-easy-with-deep-cognition-403fbe445351
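The limitation described above can be reproduced with a minimal sketch. It assumes a single-layer perceptron with a step activation and the classic update rule; the function names, learning rate, and epoch count are illustrative choices, not taken from the slides. Trained on AND (linearly separable) it classifies every sample correctly, while on XOR (not linearly separable) it cannot.

```python
def predict(w, b, x1, x2):
    # Step activation: output 1 when the weighted sum is positive.
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    # Classic perceptron rule: nudge the weights by the prediction error.
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            err = target - predict(w, b, x1, x2)
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b


def accuracy(samples, w, b):
    correct = sum(predict(w, b, x1, x2) == t for (x1, x2), t in samples)
    return correct / len(samples)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_acc = accuracy(AND, *train_perceptron(AND))
xor_acc = accuracy(XOR, *train_perceptron(XOR))
print("AND accuracy:", and_acc)
print("XOR accuracy:", xor_acc)
```

No single straight line separates the XOR classes, so at least one of the four samples is always misclassified no matter how long the training runs.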
9. #denatechcon
Why is deep learning popular nowadays?
● Large-scale datasets such as ImageNet have been made public for research purposes
● High-performance computing resources such as GPUs are more accessible than ever before
https://en.wikipedia.org/wiki/Nvidia
http://www.image-net.org/
10. #denatechcon
Okay, but what can you do with DL?
● Using deep learning, we can
solve object detection and
instance segmentation
problems
● Object detection detects
multiple objects in the image,
while instance segmentation
segments object boundaries
● Using deep learning, we can
solve image classification and
image localization problems
● Image classification identifies
what is in the image, while
image localization identifies both
what is in the image and where it is
https://medium.com/comet-app/review-of-deep-learning-algorithms-for-object-detection-c1f3d437b852
11. #denatechcon
Okay, let’s sum that up
• Deep learning is not new
• Data is important for deep learning
• Powerful computing resources are necessary
• You can do so many things with deep learning
12. #denatechcon
Introduction to SfM
SfM stands for Structure from
Motion. It is an algorithm that
reconstructs 3D points (the
structure) from images taken
from different angles or positions
(the motion). Large-scale
applications include, for example,
reconstructing all of Rome using
only images found on the web.
https://grail.cs.washington.edu/rome/rome_paper.pdf
13. #denatechcon
How does SfM work?
https://www.mathworks.com/help/vision/ug/structure-from-motion.html
● Extracts features from images, e.g.
corners or edges
● Matches the features in images taken
from different positions
● Calculates the corresponding points
in 3D coordinates using triangulation
● Calculates camera position and
optimizes reconstructed 3D points
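The triangulation step listed above can be illustrated with a minimal sketch. It assumes the two camera centers and the viewing rays toward a matched feature are already known, and it uses midpoint triangulation (the point halfway between the closest points of the two rays), which is a simple stand-in rather than the method of any particular SfM library. A full pipeline would also estimate the camera poses and refine everything jointly. All names here are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2):
    """Return the 3D point closest to two viewing rays.

    c1, c2: camera centers; d1, d2: ray directions toward the matched feature.
    """
    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the point cannot be triangulated")
    s = (b * e - c * d) / denom  # distance parameter along ray 1
    t = (a * e - b * d) / denom  # distance parameter along ray 2
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return [(u + v) / 2 for u, v in zip(p1, p2)]


# Two cameras one unit apart, both observing the same (hypothetical) point.
target = (1.0, 2.0, 10.0)
cam1, cam2 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
ray1 = [t - c for t, c in zip(target, cam1)]
ray2 = [t - c for t, c in zip(target, cam2)]
point = triangulate_midpoint(cam1, ray1, cam2, ray2)
print(point)
```

With noisy feature matches the two rays no longer intersect exactly, which is why the midpoint is used here and why real pipelines follow triangulation with an optimization step.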
14. #denatechcon
What can you do with SfM?
https://grail.cs.washington.edu/rome/rome_paper.pdf
The system built a 3D representation of Rome within a day from images found on the web. It used
150k images, and the processing took around 21 hours on 496 CPU cores.
15. #denatechcon
Let’s sum that up
• SfM can reconstruct 3D shapes from 2D images
• 3D representation of Rome can be built in a day
using images from the web
16. #denatechcon
So we have tools. What now?
● Dashcam images are used for reconstructing 3D points by SfM
● The same images are used for detecting objects in 2D space
● Both results are integrated to get 3D representations of each object
17. #denatechcon
3D Point Reconstruction
● Images are taken by driving in the
highlighted region in Minatomirai
● Dashcam images are used for SfM
and object detection
18. #denatechcon
Overall shape looks good
● 3D modeling of a relatively small
region in Minatomirai
● Reconstructed shape matches the
highlighted region in the map
19. #denatechcon
Slightly larger region, still good
● Red arrows indicate the direction
the car was driving
● The reconstructed shape matches
the highlighted region in the map
20. #denatechcon
Hooray, view from top is good
● SfM was applied in a larger region
in the Minatomirai area
● Overall shape still matches the map
21. #denatechcon
What about the closer view?
Details of road markings and speed
limit signs can be seen, though some of the
reconstructed information is unnecessary
Lanes are reconstructed well on the left
side, but the center lane markings on
the right are missing. This is caused by
the divider
22. #denatechcon
Some findings with SfM are:
• Reconstructed 3D points contain small details
• A GPU can reduce the processing time significantly
• The more images, the better the result
23. #denatechcon
Recognizing Objects
● We chose Faster R-CNN for detecting
traffic signs
● Faster R-CNN was a state-of-the-art
detector in 2016
● Faster R-CNN is more accurate than
real-time detectors, but it is
also slower
https://arxiv.org/abs/1506.01497
24. #denatechcon
Objects are detected correctly
● Most traffic signs are detected correctly, though
one small traffic sign was missed by the detector
● The network predicts the category for each box,
and there are more than 100 categories to choose
from
26. #denatechcon
What now for lane detection?
https://arxiv.org/abs/1802.05591
● We chose LaneNet, published in 2018, as the lane detector
● LaneNet transforms the original image into a bird’s eye image using learned parameters
● It can detect multiple lane instances in real time with high accuracy
27. #denatechcon
Deep learning can detect lanes!
● Different colors indicate different instances
● You can see that the lanes are detected correctly
● It can detect curved lanes as well, though none
appear in this image
29. #denatechcon
What about road markings?
Bird’s eye transformation on original image
Faster R-CNN on bird’s eye image
Inverse transformation on bird’s eye image
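The bird’s eye transformation and its inverse can be sketched as a planar homography. This is a minimal illustration: the 3x3 matrix H below uses made-up values, and the slides do not state how the actual transformation was obtained. The function names are hypothetical as well. The sketch maps a point into the bird’s eye view, then maps the corners of a box detected there back to the original image with the inverse matrix.

```python
def apply_homography(h, point):
    # Map a 2D point through a 3x3 homography (with perspective division).
    x, y = point
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)


def invert_3x3(m):
    # Inverse via the adjugate; enough for a single 3x3 matrix.
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def box_to_original(h, box):
    # Map the four corners of a bird's eye box (x1, y1, x2, y2) back to the
    # original image. The result is a quadrilateral, not an axis-aligned box.
    x1, y1, x2, y2 = box
    h_inv = invert_3x3(h)
    corners = [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]
    return [apply_homography(h_inv, c) for c in corners]


# Hypothetical homography, for illustration only.
H = [[1.0, 0.5, 0.0],
     [0.0, 2.0, 0.0],
     [0.0, 0.001, 1.0]]

original_point = (100.0, 200.0)
bird_point = apply_homography(H, original_point)
round_trip = apply_homography(invert_3x3(H), bird_point)
quad = box_to_original(H, (150.0, 300.0, 250.0, 380.0))
print(bird_point, round_trip, quad)
```

Because the inverse mapping turns a rectangle into a general quadrilateral, detections mapped back this way fit the objects only approximately.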
30. #denatechcon
Deep learning works for road markings!
● Road markings are detected correctly
● It distinguishes the lane from the stop sign
● The detected boxes fit the objects, though not perfectly
34. #denatechcon
Let’s sum that up
• Traffic sign recognition with more than 100
categories can be solved with deep learning
• Deep learning works well on complicated tasks
such as lane and road marking detection
• The more data, the better the results
35. #denatechcon
Putting It All Together
● Green points indicate the region used for 3D
reconstruction
● The detection has to be done in frames where
the objects are highlighted in green
36. #denatechcon
Results are now integrated
We can get a 3D representation of
detected objects by integrating both
results. The final result will look like
the image above.
37. #denatechcon
Now, objects are represented in 3D
● Detected traffic signs and road markings are
converted to 3D
● Each object has a 3D representation after
integrating both SfM and object detection results
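One possible way to integrate the two results is sketched below. It is an assumption for illustration, not the method confirmed by the slides: the reconstructed points are projected into a frame with a pinhole camera model, the points that land inside a detected 2D box are kept, and their per-axis median is taken as the 3D position of the object. The intrinsics (fx, fy, cx, cy), the sample points, and all function names are hypothetical, and the points are assumed to be expressed in the camera's coordinate frame.

```python
from statistics import median


def project(point, fx, fy, cx, cy):
    # Pinhole projection of a 3D point given in camera coordinates.
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


def locate_object(points, box, fx, fy, cx, cy):
    """Return a 3D position for a detection box, or None if no point falls in it."""
    x1, y1, x2, y2 = box
    inside = []
    for p in points:
        if p[2] <= 0:  # skip points behind the camera
            continue
        u, v = project(p, fx, fy, cx, cy)
        if x1 <= u <= x2 and y1 <= v <= y2:
            inside.append(p)
    if not inside:
        return None
    # The median is less sensitive to stray background points than the mean.
    return tuple(median(p[i] for p in inside) for i in range(3))


# Hypothetical reconstructed points (camera frame, meters) and a detection box (pixels).
sfm_points = [(0.0, 0.0, 10.0), (0.1, 0.0, 10.0), (-0.1, 0.0, 10.0), (5.0, 0.0, 10.0)]
sign_box = (600.0, 300.0, 700.0, 400.0)
position = locate_object(sfm_points, sign_box, fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)
print(position)
```

In this example the fourth point projects far outside the box and is ignored, so only the three points near the optical axis contribute to the estimated position.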
38. #denatechcon
We are done!
● Reconstructed 3D view seen from the top
● You can see the detected lanes and road
markings now have a 3D representation
39. #denatechcon
Using this technique, we could:
• Automate the map creation process
• Create HD maps for other services
• Detect changes automatically