Big Visual Data - Min Sun

Big Visual Data
National Tsing Hua University
Prof. Min Sun
VSLab

Papal Inauguration Ceremony
[Photos: Pope Benedict XVI and Pope Francis]

Internet

How to …

•  Organize
•  Browse
•  Search
•  and more
Help!

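Computer Vision
Q: What kind of scene is this?
Outdoors, by a lake.

Q: Where is it?
Sun Moon Lake, Shuishe Pier.

Q: What objects are there?
Boats, a pier, buildings.

Q: How far away is the building?
About 50 meters.

…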

Computer Vision
Learn from big visual data!

Outline

•  Images
– Recognition
– Reconstruction

•  Videos
– Surveillance
– Summarization

•  Virtual Data
Ø  3D CAD models
Ø  3D Environment

Images

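ImageNet
•  Started in 2007 @ Princeton
•  Debuted in 2009 @ CVPR
•  Image collection stopped in 2010
Ø Total categories: 21,841
Ø Total images: 14 million
•  ILSVRC (ImageNet Large Scale Visual Recognition Challenge): 2010 to present
Jia Deng, Fei-Fei Li

Info from http://www.image-net.org/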

1K Image Classification

Figure from Olga Russakovsky, ECCV'14 workshop

Deep Learning

Place-Net

•  2014 @ MIT and Princeton
Ø Total categories: 400
Ø Total images: 7 million
Bolei Zhou, Prof. Torralba

[Figure: # of images; labels: 21,841; 14 million; 22% classification error]

Info from http://places.csail.mit.edu/

Deep Learning: Convolutional Neural Network (CNN)

[Figure: handwritten character "A"; labels: filter, response, #filters]

Image filtering

$$h[m,n] = \sum_{k,l} g[k,l]\, f[m+k,\, n+l]$$

Input f[.,.]:
0 0 0  0  0  0  0  0 0 0
0 0 0  0  0  0  0  0 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90  0 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0  0  0  0  0  0 0 0
0 0 90 0  0  0  0  0 0 0
0 0 0  0  0  0  0  0 0 0

Filter g[.,.] (3×3 box filter; the outputs below correspond to a weight of 1/9 per entry):
1 1 1
1 1 1
1 1 1

Output h[.,.], filled in one position at a time as the filter slides across the image:
 0 10 20 30 30 30 20 10
 0 20 40 60 60 60 40 20
 0 30 60 90 90 90 60 30
 0 30 50 80 80 90 60 30
 0 30 50 80 80 90 60 30
 0 20 30 50 50 60 40 20
10 20 30 30 30 30 20 10
10 10 10  0  0  0  0  0

Credit: S. Seitz
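The sliding-window sum on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the slide author's code: the function name `filter_valid` is hypothetical, the kernel is assumed to be a 3×3 box filter with weights of 1/9, and only positions where the kernel fits entirely inside the image are computed (so a 10×10 input gives an 8×8 output). The kernel is indexed from its top-left corner rather than its center, which only shifts the output indices.

```python
def filter_valid(f, g):
    """Compute h[m][n] = sum over k, l of g[k][l] * f[m + k][n + l].

    Only positions where the kernel g lies fully inside the image f
    are evaluated (no padding).
    """
    kh, kw = len(g), len(g[0])
    out_h = len(f) - kh + 1
    out_w = len(f[0]) - kw + 1
    h = [[0.0] * out_w for _ in range(out_h)]
    for m in range(out_h):
        for n in range(out_w):
            h[m][n] = sum(
                g[k][l] * f[m + k][n + l]
                for k in range(kh)
                for l in range(kw)
            )
    return h


# The 10x10 image from the slide.
f = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 90, 90, 90, 90, 90, 0, 0],
    [0, 0, 0, 90, 90, 90, 90, 90, 0, 0],
    [0, 0, 0, 90, 90, 90, 90, 90, 0, 0],
    [0, 0, 0, 90, 0, 90, 90, 90, 0, 0],
    [0, 0, 0, 90, 90, 90, 90, 90, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 90, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]

# Assumed kernel: 3x3 box filter, each weight 1/9 (a local average).
g = [[1 / 9] * 3 for _ in range(3)]

# Round to integers to remove floating-point noise before printing.
h = [[round(v) for v in row] for row in filter_valid(f, g)]
for row in h:
    print(" ".join(f"{v:2d}" for v in row))
```

The averaging kernel blurs the image: the sharp 0/90 boundary becomes a ramp, and the single zero pixel inside the bright square lowers its neighbours to 80.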

Image filtering: Sobel

-1  0  1
-2  0  2
-1  0  1

Vertical Edge (absolute value)
http://en.wikipedia.org/wiki/Sobel_operator

Image filtering: Sobel

-1 -2 -1
 0  0  0
 1  2  1

Horizontal Edge (absolute value)
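To make the difference between the two Sobel kernels concrete, here is a small sketch that applies both to a synthetic image containing a single vertical edge. The test image and the helper name `filter_abs` are assumptions made for illustration; the kernels are the ones shown on the slides, and the absolute value of the response is taken as the slides indicate.

```python
def filter_abs(f, g):
    """Absolute filter response |sum of g[k][l] * f[m + k][n + l]| (no padding)."""
    kh, kw = len(g), len(g[0])
    return [
        [
            abs(sum(g[k][l] * f[m + k][n + l]
                    for k in range(kh) for l in range(kw)))
            for n in range(len(f[0]) - kw + 1)
        ]
        for m in range(len(f) - kh + 1)
    ]


# Sobel kernel that responds to vertical edges (left-to-right intensity change).
SOBEL_VERTICAL = [[-1, 0, 1],
                  [-2, 0, 2],
                  [-1, 0, 1]]

# Sobel kernel that responds to horizontal edges (top-to-bottom intensity change).
SOBEL_HORIZONTAL = [[-1, -2, -1],
                    [0, 0, 0],
                    [1, 2, 1]]

# Hypothetical 5x6 test image: dark on the left, bright on the right,
# so it contains exactly one vertical edge.
image = [[0, 0, 0, 90, 90, 90] for _ in range(5)]

vertical = filter_abs(image, SOBEL_VERTICAL)
horizontal = filter_abs(image, SOBEL_HORIZONTAL)

print("vertical-edge response:  ", vertical[0])
print("horizontal-edge response:", horizontal[0])
```

Only the vertical-edge kernel fires on this image, and only at the columns that straddle the intensity step; the horizontal-edge kernel sees identical rows above and below and returns zero everywhere.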

u  How to learn filters
from data?

Visualize Filters

Slides from Yann LeCun

Horizontal Edge Filter

-1 -2 -1
 0  0  0
 1  2  1

AlexNet, 2012, Krizhevsky et al.

[Figure labels: 2 Convolution layers; 5 Convolution layers]

u  Many Pre-trained Networks
o  Caffe http://caffe.berkeleyvision.org/
o  Torch http://torch.ch/

Product: Google Photos

https://photos.google.com/

Oops

http://www.bbc.com/news/technology-33347866

Brand Recognition in Photos

http://ditto.us.com/

Large-scale Static Scene

Static scene: Notre-Dame de Paris

Structure from Motion (SfM)

http://homes.cs.washington.edu/~shanqi/

3D Browsing
Large-scale Static Scene

Photos
Dense Reconstruction
Visual Turing Test
http://phototour.cs.washington.edu/

http://grail.cs.washington.edu/projects/timelapse/

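Structure from Motion

[Figure: three cameras (Image 1, Image 2, Image 3) with poses $(R_1, t_1)$, $(R_2, t_2)$, $(R_3, t_3)$ observing 3D points $X_1, \dots, X_7$; image points $x_1^1$, $x_2^1$, $x_3^1$ are marked in Image 1]

Slides from Jianxiong Xiao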

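Structure from Motion

Here $x_j^i$ denotes the 2D position of point $j$ in image $i$.

Image 1: $x_1^1 = K\,[R_1\ t_1]\,X_1$ (Point 1), $x_2^1 = K\,[R_1\ t_1]\,X_2$ (Point 2)
Image 2: $x_1^2 = K\,[R_2\ t_2]\,X_1$ (Point 1), $x_2^2 = K\,[R_2\ t_2]\,X_2$ (Point 2), $x_3^2 = K\,[R_2\ t_2]\,X_3$ (Point 3)
Image 3: $x_1^3 = K\,[R_3\ t_3]\,X_1$ (Point 1), $x_3^3 = K\,[R_3\ t_3]\,X_3$ (Point 3)

Slides from Jianxiong Xiao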
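The projection equation $x = K\,[R\ t]\,X$ can be tried out numerically. The sketch below is only an illustration under stated assumptions: the intrinsic matrix `K`, the two camera poses, and the 3D point are made-up values, and the helper names `matvec` and `project` are hypothetical. It maps one 3D point into two images and divides by the third homogeneous coordinate to obtain pixel positions.

```python
def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def project(K, R, t, X):
    """Project 3D point X to pixel coordinates using x ~ K [R t] X."""
    # [R t] applied to the homogeneous point (X, 1) is R X + t.
    X_cam = [r + ti for r, ti in zip(matvec(R, X), t)]
    # K maps camera coordinates to homogeneous image coordinates.
    u, v, w = matvec(K, X_cam)
    # Divide by the third coordinate to get pixel coordinates.
    return (u / w, v / w)


# Assumed intrinsics: focal length 500 px, principal point (320, 240).
K = [[500.0, 0.0, 320.0],
     [0.0, 500.0, 240.0],
     [0.0, 0.0, 1.0]]

IDENTITY = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

# Assumed poses: camera 1 at the origin, camera 2 shifted one unit along +x
# (so its translation t = -R C is (-1, 0, 0)).
R1, t1 = IDENTITY, [0.0, 0.0, 0.0]
R2, t2 = IDENTITY, [-1.0, 0.0, 0.0]

X1 = [0.5, 0.25, 5.0]  # an assumed 3D point, 5 units in front of camera 1

x_img1 = project(K, R1, t1, X1)
x_img2 = project(K, R2, t2, X1)
print("point 1 in image 1:", x_img1)
print("point 1 in image 2:", x_img2)
```

The same 3D point lands at different pixel positions in the two images; Structure from Motion uses that difference to recover both the poses and the point.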

Structure from Motion

•  Input: Observed 2D image positions (point $j$ in image $i$ is $x_j^i$)
$x_1^1,\ x_1^2,\ x_1^3,\ x_2^1,\ x_2^2,\ x_3^2,\ x_3^3$

•  Output:
Unknown Camera Parameters (with some guess): $[R_1\ t_1],\ [R_2\ t_2],\ [R_3\ t_3]$
Unknown Point 3D coordinates (with some guess): $X_1, X_2, X_3, \dots$

Slides from Jianxiong Xiao

Bundle Adjustment

A valid solution $[R_1\ t_1],\ [R_2\ t_2],\ [R_3\ t_3]$ and $X_1, X_2, X_3, \dots$ must let

Observation = Re-projection

Image 1: $x_1^1 = K\,[R_1\ t_1]\,X_1$, $x_2^1 = K\,[R_1\ t_1]\,X_2$
Image 2: $x_1^2 = K\,[R_2\ t_2]\,X_1$, $x_2^2 = K\,[R_2\ t_2]\,X_2$, $x_3^2 = K\,[R_2\ t_2]\,X_3$
Image 3: $x_1^3 = K\,[R_3\ t_3]\,X_1$, $x_3^3 = K\,[R_3\ t_3]\,X_3$

•  Large-scale Nonlinear Least Squares
Solved using Ceres Solver
http://ceres-solver.org/

Slides from Jianxiong Xiao
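The "Observation = Re-projection" condition is usually turned into a least-squares objective: the sum of squared pixel distances between each observed point and its re-projection. The sketch below only evaluates that objective; it does not perform the optimization that a solver such as Ceres would. All numbers (intrinsics, poses, points) and the names `project` and `reprojection_cost` are assumptions for illustration. The visibility pattern mirrors the slide: image 1 sees points 1 and 2, image 2 sees points 1 to 3, image 3 sees points 1 and 3.

```python
def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def project(K, R, t, X):
    """Re-project 3D point X into pixel coordinates via x ~ K [R t] X."""
    X_cam = [r + ti for r, ti in zip(matvec(R, X), t)]
    u, v, w = matvec(K, X_cam)
    return (u / w, v / w)


def reprojection_cost(K, cameras, points, observations):
    """Sum of squared pixel errors between observations and re-projections."""
    cost = 0.0
    for (i, j), (u_obs, v_obs) in observations.items():
        R, t = cameras[i]
        u, v = project(K, R, t, points[j])
        cost += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return cost


# Assumed intrinsics, camera poses, and 3D points (0-based indices).
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cameras = [(I3, [0.0, 0.0, 0.0]), (I3, [-1.0, 0.0, 0.0]), (I3, [-2.0, 0.0, 0.0])]
points = [[0.5, 0.25, 5.0], [1.0, -0.5, 4.0], [2.0, 0.5, 8.0]]

# (image, point) pairs that are observed, following the slide's pattern.
visible = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2), (2, 0), (2, 2)]

# Synthetic observations generated from the true cameras and points.
observations = {(i, j): project(K, *cameras[i], points[j]) for i, j in visible}

# With the true parameters every re-projection matches its observation.
exact_cost = reprojection_cost(K, cameras, points, observations)

# A wrong guess for point 1 (x shifted by 0.1) no longer matches.
guess = [p[:] for p in points]
guess[0][0] += 0.1
guess_cost = reprojection_cost(K, cameras, guess, observations)

print("cost at true parameters:", exact_cost)
print("cost at perturbed guess:", guess_cost)
```

Bundle adjustment starts from such an imperfect guess and adjusts all camera poses and 3D points together until this cost is as small as possible, which is why it becomes a large-scale nonlinear least-squares problem.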

Product: Google Photo Tour

Product: Indoor 3D Walkthrough

http://matterport.com/

Videos

•  Surveillance

•  Summarization

Surveillance Cameras Are Everywhere

http://www.dailymail.co.uk/news/article-2488468/Stretch-road-SIXTY-CCTV-cameras.html

Old-fashioned

Storage device
Camera
Data never sees the light of day, and installation is hard

New: Nest Cam

Watch your pets at home
Live streaming online
Super easy to install: zero config
•  Improve motion/sound detection
•  Auto-highlight summary

https://nest.com/camera/meet-nest-cam/

Behavior Analytics

http://www.visiosafe.com/

Dashcams at Massive Scale
Taiwan's new-car market in 2014: 423,000 vehicles

Extremely rare!

•  For the police
•  For reporters
•  …

Explosion of Personal Videos

[Plot: H-factor, axis from −4 to 5; labels: Highlights, Our prediction]

Sun et al., ECCV'14

https://www.youtube.com/watch?v=ad8uj3rE3yc

Ranking H-factors

[Plot: H-factor, axis from −4 to 5; labels: Highlights, Our prediction]

Sun et al., ECCV'14

https://www.youtube.com/watch?v=BVbC3QaGdgg

Hyperlapse

http://research.microsoft.com/en-us/um/redmond/projects/hyperlapseapps/

Core Techniques

•  Multiple object tracking
•  Optical flow, motion features

http://lear.inrialpes.fr/people/wang/dense_trajectories

http://motchallenge.net/

Virtual Data

•  Objects

•  Environment

Virtual 3D Data

https://3dwarehouse.sketchup.com

•  A huge number of models
•  Uneven quality
•  Not aligned

Stanford ShapeNet

http://shapenet.cs.stanford.edu/

•  Aligned
Ø Total categories: 3,135
Ø Total models: 220,000

Scene-Specific Pedestrian Detectors

https://www.youtube.com/watch?v=2Jf7faozHUs

Learning to Drive

https://www.youtube.com/watch?v=5hFvoXV9gII

Summary

•  Types of big visual data
– Images
– Videos
– Virtual data

•  Applications
– Search and organize: Google Photos
– Browse and visualize: Matterport, Photo Tour
– Visual commerce: Ditto.com, visiosafe
– Video summary, security, self-driving car, etc.

Vision Science Lab @ NTHU, Taiwan

PI: Min Sun
Web: aliensunmin.github.io
Office: Delta 962
Lab: EECS Bldg 712
Tel: +886-3-5731058
Email: sunmin@ee.nthu.edu.tw

Goal: making great impact in computer vision, robot vision, mobile vision, etc. We aim to build game-changing applications that improve our daily life.

Research Topics

[Figure panels: Analyzing Street Views; Understanding Personal Videos; 3D; Robot Vision; Human Sensing; Wearable Camera Applications; Make3D]
