Deep Learning for Articulated
Human Pose Estimation
Thesis Proposal Defence
Wei Yang
wyang@ee.cuhk.edu.hk
Supervisors: Prof. Xiaogang Wang and Prof. Wanli Ouyang
May 25, 2016
Outline
• Introduction
• Methodology
• Experiments
• Future Work
2
Articulated human pose
estimation localizes human
body parts in images or videos.
3
Applications
• Activity Recognition
• Game and Animation
• Clothing Parsing
4
Challenges
• Articulation
• Foreshortening
• Clothing
• Occlusion
• …
5
Traditional Methods
• Pictorial structures (Fischler & Elschlager 1973; Felzenszwalb & Huttenlocher 2005)
  • Unary templates
  • Pairwise springs
  • Unable to handle large variations (e.g., foreshortening)
• Mixture of mini-parts (Yang & Ramanan 2011)
  • Mixtures of each part
  • Unary template for each mixture type
  • Pairwise springs between mixture types of two parts
6
Deformable Mixture of Parts
Clustering on (∆𝑥, ∆𝑦)
7
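A minimal sketch of how mixture types could be obtained by clustering relative offsets (∆𝑥, ∆𝑦). The slide only says "clustering"; the use of k-means, the first-k initialization, the function name `kmeans_offsets`, and the toy offsets are all assumptions made for illustration.

```python
def kmeans_offsets(offsets, k, iters=20):
    """Cluster (dx, dy) offsets into k groups; returns (centers, labels)."""
    # Assumption: initialize with the first k offsets to keep the sketch deterministic.
    centers = [tuple(o) for o in offsets[:k]]
    labels = [0] * len(offsets)
    for _ in range(iters):
        # Assignment step: nearest center in squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: (dx - centers[c][0]) ** 2 + (dy - centers[c][1]) ** 2)
            for dx, dy in offsets
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [o for o, lab in zip(offsets, labels) if lab == c]
            if members:
                centers[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centers, labels

# Hypothetical offsets of a part relative to its neighbour: two visible groups.
offsets = [(10.0, 1.0), (11.0, -1.0), (9.0, 0.0), (0.0, 12.0), (1.0, 11.0), (-1.0, 10.0)]
centers, mixture_types = kmeans_offsets(offsets, k=2)
```

Each cluster index then plays the role of a mixture type 𝑡𝑖 for that part.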
Deep Learning Based Methods
Heatmap Regression [Tompson et al. CVPR’15]
• Learning better representations
• Geometric constraints among body parts are missing in training the DCNN
(𝑥, 𝑦) Coordinate Regression [Toshev & Szegedy CVPR’14]
• Holistic view
• Mapping from images to coordinates is too difficult to learn
• Inaccurate in the high-precision region
8
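To make the difference between the two output representations concrete, here is a small sketch of a heatmap target and its decoding back to an (𝑥, 𝑦) coordinate. The Gaussian shape, `sigma`, and the function names are assumptions; the slides do not specify how heatmaps are built.

```python
import math

def make_heatmap(width, height, cx, cy, sigma=1.5):
    """Gaussian heatmap target peaked at the joint location (cx, cy)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2)) for x in range(width)]
        for y in range(height)
    ]

def decode_heatmap(heatmap):
    """Recover (x, y) as the location of the maximum response."""
    best = max(
        ((value, x, y) for y, row in enumerate(heatmap) for x, value in enumerate(row)),
        key=lambda item: item[0],
    )
    return best[1], best[2]

# Heatmap regression predicts a map like `target`;
# coordinate regression would predict the pair (5, 2) directly.
target = make_heatmap(width=8, height=6, cx=5, cy=2)
joint_xy = decode_heatmap(target)
```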
Motivation: Global Pose Consistency Helps in Learning Better Representation
• Local evidence is weak
• Global consistency helps training
[Figure: CNN followed by spatial constraints, with forward and backward passes]
9
Difficulties in Modeling Spatial Constraints
[Figure: face ∗ (s | f, face to shoulder) and shoulder ∗ (s | s, shoulder to shoulder), combined by ⨂ into the shoulder estimate]
Weakly spatial histogram over body part locations
• Less effective for large variations
Learned by convolutional kernels
• Parameter space is too large, hence difficult to learn
Tompson, Jonathan J., et al. "Joint training of a convolutional network and a graphical model for human pose estimation." NIPS. 2014.
10
Graph Models
𝐺 = (𝑉, 𝐸)
Vertices
• Locations and mixture types of
body parts
• Modeled by a front-end CNN
Edges
• Pairwise spatial relationships
between body parts
• Modeled by message passing
layers
11
message passing
12
Framework
[Figure: DCNN, softmax, and logarithm produce per-part score maps (head, neck, l.shou, r.shou, r.knee, r.ankle, …), which pass through message passing layers built from max operations]
13
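$F(\boldsymbol{l}, \boldsymbol{t} \mid I; \theta, \boldsymbol{\omega}) = \sum_{i \in V} \phi(\boldsymbol{l}_i, t_i \mid I; \theta) + \sum_{(i,j) \in E} \psi(\boldsymbol{l}_i, \boldsymbol{l}_j, t_i, t_j \mid I; \boldsymbol{\omega}_{i,j}^{t_i,t_j})$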
14
$\boldsymbol{l} = \{\boldsymbol{l}_i\}$ with $\boldsymbol{l}_i = (x_i, y_i)$: location of part $i$
$\boldsymbol{t} = \{t_i\}$: mixture type of part $i$
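The score of a full pose is the sum of one appearance term per part and one spatial term per edge. The sketch below evaluates that sum for a given configuration; `pose_score`, `toy_unary`, and `toy_pairwise` are hypothetical stand-ins, not the terms used in the thesis.

```python
def pose_score(locations, types, unary, pairwise, edges):
    """F(l, t) = sum_i phi(l_i, t_i) + sum_(i,j) psi(l_i, l_j, t_i, t_j)."""
    appearance = sum(unary(i, locations[i], types[i]) for i in locations)
    spatial = sum(
        pairwise(i, j, locations[i], locations[j], types[i], types[j])
        for i, j in edges
    )
    return appearance + spatial

# Hypothetical toy terms, for illustration only.
def toy_unary(part, loc, t):
    return -0.5  # constant log-confidence for every part

def toy_pairwise(i, j, li, lj, ti, tj):
    dx, dy = li[0] - lj[0], li[1] - lj[1]
    return -0.01 * (dx * dx + dy * dy)  # penalize stretched edges

locations = {"head": (10, 5), "neck": (10, 9)}
types = {"head": 0, "neck": 1}
score = pose_score(locations, types, toy_unary, toy_pairwise, edges=[("head", "neck")])
```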
• Front-End CNN: Part Appearance Terms ($\phi$)
• Message Passing Layers: Spatial Relationship Terms ($\psi$)
Front-end CNN
[Figure: front-end CNN (DCNN) layer stack: conv + norm + pool; conv + norm + pool; conv; conv; conv; 1x1 conv + dropout; 1x1 conv + dropout; 1x1 conv. Labels in the figure: 3 (six times), 4096, 4096, PxK]
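Local confidence of the appearance of part $i$ with mixture type $t_i$:
$\phi(\boldsymbol{l}_i, t_i \mid I; \theta) = \log p(\boldsymbol{l}_i, t_i \mid I; \theta) = \log \sigma(f(\boldsymbol{l}_i, t_i \mid I; \theta))$
• $f(\boldsymbol{l}_i, t_i \mid I; \theta)$: output of the front-end CNN
• $\sigma(\cdot)$: softmax function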
20
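Spatial Relationship Terms
• Quadratic deformation constraints
$\psi(\boldsymbol{l}_i, \boldsymbol{l}_j, t_i, t_j \mid I; \boldsymbol{\omega}_{i,j}^{t_i,t_j}) = \langle \boldsymbol{\omega}_{i,j}^{t_i,t_j}, d(\boldsymbol{l}_i, \boldsymbol{l}_j) \rangle$
• $d(\boldsymbol{l}_i, \boldsymbol{l}_j) = [\Delta x, \Delta x^2, \Delta y, \Delta y^2]^\top$
• $\Delta x = x_i - x_j$, $\Delta y = y_i - y_j$
• $\boldsymbol{\omega}$ encodes both the rest position and rigidity of the edge
Yang Y, Ramanan D. Articulated pose estimation with flexible mixtures-of-parts. CVPR, 2011.
24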
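A short sketch of the deformation feature and the inner product that defines the spatial term. The weight vector `w` is hypothetical; it is chosen so that the quadratic in ∆𝑦 peaks at ∆𝑦 = −4, which illustrates how one weight vector can encode both a rest position and a rigidity.

```python
def deformation(li, lj):
    """d(l_i, l_j) = [dx, dx^2, dy, dy^2] with dx = x_i - x_j, dy = y_i - y_j."""
    dx, dy = li[0] - lj[0], li[1] - lj[1]
    return [dx, dx * dx, dy, dy * dy]

def psi(w, li, lj):
    """Spatial term: inner product <w, d(l_i, l_j)>."""
    return sum(wk * dk for wk, dk in zip(w, deformation(li, lj)))

# Hypothetical weights for one pair of mixture types.
# w[2] * dy + w[3] * dy^2 is maximal at dy = -w[2] / (2 * w[3]) = -4 (rest position);
# the size of the negative quadratic weights sets the rigidity.
w = [0.0, -0.1, -0.8, -0.1]

at_rest = psi(w, (10, 5), (10, 9))    # dx = 0, dy = -4
stretched = psi(w, (10, 3), (10, 9))  # dx = 0, dy = -6
shifted = psi(w, (12, 5), (10, 9))    # dx = 2, dy = -4
```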
Message Passing: Max-Sum Algorithm
The belief of part $i$:
$u_i(\boldsymbol{l}_i, t_i) \leftarrow \alpha_u \left( \phi(\boldsymbol{l}_i, t_i) + \sum_{k \in N(i)} m_{ki}(\boldsymbol{l}_i, t_i) \right)$
The message passed from part $i$ to $j$:
$m_{ij}(\boldsymbol{l}_j, t_j) \leftarrow \alpha_m \max_{\boldsymbol{l}_i, t_i} \left( u_i(\boldsymbol{l}_i, t_i) + \psi(\boldsymbol{l}_i, \boldsymbol{l}_j, t_i, t_j) \right)$
25
Message Passing: Max Score Assignment
$(\boldsymbol{l}_i^*, t_i^*) = \arg\max_{\boldsymbol{l}_i, t_i} u_i^*(\boldsymbol{l}_i, t_i)$
27
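The updates above can be sketched on a toy graph as follows. This is an illustration under stated assumptions, not the thesis implementation: each (location, type) pair is flattened into one state index, 𝛼𝑢 and 𝛼𝑚 are taken as 1, each loop iteration stands for one message passing layer, and the tables `phi` and `psi` are made up.

```python
def max_sum(phi, psi, edges, num_layers=3):
    """phi[i][s]: unary score of state s of part i.
    psi[(i, j)][si][sj]: pairwise score; edges lists directed pairs (i, j)."""
    beliefs = {i: list(scores) for i, scores in phi.items()}
    for _ in range(num_layers):
        # Message i -> j: best belief of i combined with the pairwise term.
        messages = {}
        for i, j in edges:
            table = psi[(i, j)]
            messages[(i, j)] = [
                max(beliefs[i][si] + table[si][sj] for si in range(len(phi[i])))
                for sj in range(len(phi[j]))
            ]
        # Belief of i: unary score plus all incoming messages.
        beliefs = {
            i: [
                phi[i][s] + sum(m[s] for (k, j), m in messages.items() if j == i)
                for s in range(len(phi[i]))
            ]
            for i in phi
        }
    # Max score assignment: pick the best state of every part.
    return {i: max(range(len(b)), key=b.__getitem__) for i, b in beliefs.items()}

# Two parts on a 3-cell line; the pairwise term rewards b lying two cells after a.
phi = {"a": [0.0, -3.0, -3.0], "b": [-1.0, -1.0, -1.1]}
psi = {
    ("a", "b"): [[0.0 if sb - sa == 2 else -2.0 for sb in range(3)] for sa in range(3)],
    ("b", "a"): [[0.0 if sb - sa == 2 else -2.0 for sa in range(3)] for sb in range(3)],
}
best = max_sum(phi, psi, edges=[("a", "b"), ("b", "a")])
```

In this toy case the unary scores of part b alone do not favour state 2; the message from part a is what moves it there.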
The Message Passing Layers
[Figures: one slide each for the 1st, 2nd, and 3rd message passing layers]
28–30
Datasets
• LSP: sports; 1000 training, 1000 testing
• FLIC: films; 3987 training, 1016 testing
• Image Parse: activities; 205 testing (cross-dataset validation)
31
Evaluation Metrics
Percentage of Correct Parts (PCP)
• Measures correctly localized body parts
• A candidate body part is treated as correct if its segment endpoints lie within 50% of the ground-truth segment length from their annotated locations.
• Penalizes short limbs
Percentage of Detected Joints (PDJ)
• Measures correctly localized joints, invariant to scale
• Curve computed by varying the localization precision threshold, which is normalized by the scale defined as the distance between the left shoulder and right hip
32
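Both metrics reduce to simple distance tests. The sketch below checks one limb for PCP and one joint for PDJ; requiring both endpoints to pass is the strict variant named on the results slides, and the function names and toy coordinates are assumptions.

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def pcp_correct(pred_seg, gt_seg, thresh=0.5):
    """Strict PCP test for one limb: both predicted endpoints must lie within
    thresh * (ground-truth limb length) of the matching ground-truth endpoints."""
    limit = thresh * dist(gt_seg[0], gt_seg[1])
    return dist(pred_seg[0], gt_seg[0]) <= limit and dist(pred_seg[1], gt_seg[1]) <= limit

def pdj_detected(pred_joint, gt_joint, left_shoulder, right_hip, frac):
    """PDJ test for one joint: error within frac * torso scale."""
    torso = dist(left_shoulder, right_hip)
    return dist(pred_joint, gt_joint) <= frac * torso

long_limb = ((0, 0), (0, 10))
short_limb = ((0, 0), (0, 4))
# The same 3-pixel endpoint error passes on the long limb and fails on the short one.
ok_long = pcp_correct(((0, 0), (0, 13)), long_limb)
ok_short = pcp_correct(((0, 0), (0, 7)), short_limb)
# One point of a PDJ curve per threshold value.
loose = pdj_detected((3, 4), (0, 0), (0, 0), (30, 40), frac=0.2)
tight = pdj_detected((3, 4), (0, 0), (0, 0), (30, 40), frac=0.05)
```

The long and short limb cases show why PCP penalizes short limbs: the tolerance shrinks with limb length.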
Results on the LSP Dataset
Strict PCP (%) on the LSP dataset (– = value not shown)

Method                      Torso  Head  U.Arms  L.Arms  U.Legs  L.Legs  Mean
Yang&Ramanan, CVPR'11        84.1  77.1    52.5    35.9    69.5    65.6  60.8
Pishchulin et al., CVPR'13   87.4  77.4    54.4    33.7    75.7      68  62.8
Eichner&Ferrari, ACCV'13        –  80.1    56.5    37.4    74.3    69.3  64.3
Kiefel&Gehler, ECCV'14       84.3  78.3    54.1       –    74.5    67.6  61.2
Pose Machines, ECCV'14       88.1  80.4    62.8    39.5      79    73.6  67.8
Ouyang et al., CVPR'14       88.6  84.3    61.9    45.4    77.8    71.9  68.7
Pishchulin et al., ICCV'13   88.7  85.1    61.8      45    78.9    73.2  69.2
DeepPose, CVPR'14               –     –      56      38      77      71     –
Chen&Yuille, NIPS'14         92.7  87.8    69.2    55.4    82.9      77    75
Ours                         96.5  83.1    78.8    66.7    88.7    81.7  81.1
34
PDJ Curve: the LSP dataset
[Figure: PDJ curves on the LSP dataset]
35
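The MEAN values reported for LSP appear consistent with an average over ten parts, counting each limb type twice (left and right). The slides do not state this weighting; it is inferred from the numbers, and `mean_pcp` is a hypothetical helper for checking it.

```python
def mean_pcp(torso, head, u_arms, l_arms, u_legs, l_legs):
    """Average over 10 parts: torso, head, and left/right for the four limb types."""
    return (torso + head + 2 * (u_arms + l_arms + u_legs + l_legs)) / 10

# Per-part strict PCP values as reported for LSP.
ours = mean_pcp(96.5, 83.1, 78.8, 66.7, 88.7, 81.7)
yang_ramanan = mean_pcp(84.1, 77.1, 52.5, 35.9, 69.5, 65.6)
```

Both results agree with the reported means (81.1 and 60.8) to one decimal place.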
Comparison on the FLIC dataset
36
• Our approach (1st row)
• Chen and Yuille, NIPS 2014 (2nd row)
• Tompson et al., NIPS 2014
PDJ Curve
The FLIC dataset
37
38
Generalization
[Figure: bar chart over TORSO, HEAD, UPPER ARMS, LOWER ARMS, UPPER LEGS, LOWER LEGS, and MEAN (axis 30 to 100). Methods compared: Ours; Ouyang et al., CVPR'14; Yang&Ramanan, TPAMI'13; Pishchulin et al., ICCV'13; Pishchulin et al., CVPR'13; Pishchulin et al., CVPR'12; Johnson&Everingham, CVPR'11; Yang&Ramanan, CVPR'11]
Failure Cases
39
Component Analysis
Unary Term vs. Full Model
Tree Model vs. Loopy Model
40
Unary Term vs. Full Model
Strict PCP (%) on the LSP dataset (VGG-LG)

Model       Torso  Head  U.Arms  L.Arms  U.Legs  L.Legs  Mean
Unary        83.4    69    53.5    34.9    72.2    63.5  60.1
Full Model   96.5  83.1    78.8    66.7    88.7    81.7  81.1
41
Tree-Structured Model vs. Loopy Model
42
[Figure: bar chart, legend: Loopy Model, Tree Model. Values of the two series in extraction order:]

          Torso  Head  Upper Arms  Lower Arms  Upper Legs  Lower Legs  Mean
Series 1   96.2  83.4        78.7        65.8        87.9        81.1  80.7
Series 2   96.5  83.1        78.8        66.7        88.7        81.7  81.1
Future work
Deep Residual Learning for Human Pose Estimation
Image Dependent Graph Structure Learning
43
The deeper, the better?
• Simply stacking layers leads to higher training error
[Figure: training error and testing error vs. iter. (1e4) for 20-layer and 56-layer networks]
44
Residual Learning: Intuition
• A deeper model should not have higher training error than its shallower counterpart.
• One solution: identity mapping
[Figure: identity mapping]
45
Plain Network
• 𝐻(𝐱) is the underlying mapping
• Expect two stacked layers to approximate 𝐻(𝐱)
[Figure: 𝐱, weight layer, ReLU, weight layer, ReLU, 𝐻(𝐱)]
46
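Residual Learning
• Explicitly fit a residual mapping $F(\mathbf{x}) = H(\mathbf{x}) - \mathbf{x}$
• The block output is then $H(\mathbf{x}) = F(\mathbf{x}) + \mathbf{x}$
Insight: finding the optimum around zero is easier!
[Figure: residual block: weight layer, ReLU, weight layer compute $F(\mathbf{x})$; a shortcut adds $\mathbf{x}$ to give $H(\mathbf{x})$]
47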
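A minimal sketch of a residual block. Dense layers stand in for the weight layers and the final ReLU is placed after the addition; both choices are assumptions made to keep the example small, and `residual_block` is a hypothetical name.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def linear(weights, v):
    """Dense weight layer: matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]

def residual_block(x, w1, w2):
    """Output relu(F(x) + x), where F is weight layer -> ReLU -> weight layer."""
    f = linear(w2, relu(linear(w1, x)))
    return relu([fi + xi for fi, xi in zip(f, x)])

x = [1.0, 2.0]
zero = [[0.0, 0.0], [0.0, 0.0]]
# With all weights at zero, F(x) = 0 and the block reduces to the identity mapping.
identity_out = residual_block(x, zero, zero)
```

This is the sense in which an optimum near zero is easy to reach: zero weights already reproduce the shallower network's output for non-negative inputs.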
ResNet vs. VGG
PCP (%) on the LSP dataset

Model    Torso  Head  U.Arm  L.Arm  U.Leg  L.Leg  Mean
VGG-LG    95.6  83.9   72.2   61.8   78.5   71.8  74.8
ResNet    94.8  90.6   73.9   63.3   81.9   71.8  76.7

[Figure: network diagram: image, 7x7 conv, 2x pooling, …, with repeated branches of 1x1 conv, 1x1 conv, scoremap]
48
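Image Dependent Graph Structure Learning
49
$\boldsymbol{y}_j^{(l+1)} = \sum_i G_{ji}\, \boldsymbol{w}_{ji}^{(l+1)} * \boldsymbol{x}_i^{(l)} + b_j^{(l+1)}$
[Figure: inputs $\boldsymbol{x}_i^{(l)}$ connected to outputs $\boldsymbol{y}_j^{(l+1)}$ through kernels $\boldsymbol{w}_{ji}^{(l+1)}$, gated by $G_{ji}$]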
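A small sketch of the gated layer equation. For brevity it uses 1-D signals and cross-correlation in place of 2-D convolution, and the gates are fixed by hand; how 𝐺𝑗𝑖 is obtained from the image is not described on the slide, so every name and value here is an assumption.

```python
def conv1d_valid(x, w):
    """'Valid' 1-D cross-correlation, the usual CNN layer operation."""
    n = len(x) - len(w) + 1
    return [sum(w[k] * x[t + k] for k in range(len(w))) for t in range(n)]

def gated_layer(xs, kernels, gates, biases):
    """y_j = sum_i G[j][i] * (w[j][i] * x_i) + b_j for every output channel j."""
    outputs = []
    for j, bias in enumerate(biases):
        n = len(xs[0]) - len(kernels[j][0]) + 1
        y = [bias] * n
        for i, x in enumerate(xs):
            if gates[j][i] == 0:
                continue  # connection i -> j is switched off
            response = conv1d_valid(x, kernels[j][i])
            y = [a + gates[j][i] * r for a, r in zip(y, response)]
        outputs.append(y)
    return outputs

xs = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]  # two input channels
kernels = [[[1.0, 1.0], [1.0, -1.0]]]               # kernels[j][i]
biases = [0.5]
only_first = gated_layer(xs, kernels, [[1, 0]], biases)
both = gated_layer(xs, kernels, [[1, 1]], biases)
```

Setting a gate to zero removes the corresponding edge from the graph for that input, which is one way the structure could vary per image.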
Thank you.
Committee
Prof. Xiaogang Wang (EE)
Prof. Wai-kuen Cham (EE)
Prof. Dahua Lin (IE)
Appendix: Number of Message Passing Layers

Part         1st Layer  2nd Layer  3rd Layer
Upper Arms        78.4       78.2       78.8
Lower Arms        66.3       66.3       66.7
Upper Legs        87.9       88.3       88.7
Lower Legs        80.7       81.2       81.7
Mean              80.7       80.9       81.1
52
Appendix: Independent Training vs. Joint Training

Training     Torso  Head  Upper Arms  Lower Arms  Upper Legs  Lower Legs  Mean
Independent     93  82.1        70.6        55.4        82.1        75.3  74.2
Joint           95  83.5          75        61.9        86.9        79.8  78.6
53
Appendix: Context Cues are Essential
54

 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubaikojalkojal131
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxFarihaAbdulRasheed
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationColumbia Weather Systems
 

Recently uploaded (20)

Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.ppt
 
Good agricultural practices 3rd year bpharm. herbal drug technology .pptx
Good agricultural practices 3rd year bpharm. herbal drug technology .pptxGood agricultural practices 3rd year bpharm. herbal drug technology .pptx
Good agricultural practices 3rd year bpharm. herbal drug technology .pptx
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather Station
 

Editor's Notes

  1. Good afternoon. Welcome to my thesis proposal defence. I’m Wei Yang from the IVP group. The title of this talk is deep learning for articulated human pose estimation.
  2. So the first question is: what is articulated human pose estimation? Given an image or a video, the goal of articulated pose estimation is to recover the joint positions of the articulated limbs of the human body, as shown in this image.
  3. The applications of articulated human pose estimation are very broad. From recognizing activities to interactive game systems, and from creating movies to clothing recognition, human pose is very useful information that helps solve these problems or makes them easier.
  4. However, pose estimation itself is not a trivial task. Human limbs are highly articulated and flexible, hence a person can appear in a variety of poses and body shapes. Meanwhile, different viewpoints lead to different body shapes or foreshortening. Varied clothing also leads to varied appearance of the human body. All these factors make the problem more difficult.
  5. To solve the problem, earlier methods adopted part-based models, which divide the human body into a set of body parts, such as the head, torso, arms, and legs. In 3D space, these parts can be modeled as cylinders. Later works, such as pictorial structures, use two-dimensional part templates and encode the spatial relationships among body parts using springs (the edges). However, capturing the whole range of appearances using pictorial structures is still quite difficult. Take this picture as an example: even the projection of a simple cylinder into 2D yields many different appearances. So one usually has to explicitly evaluate many possible in-plane orientations and foreshortenings in order to find a good match for a part template. To better handle these large variations, the mixture of mini-parts model was proposed. Each part is clustered into several mixtures according to its appearance, and each mixture has its own unary template for detection. For example, in this image the mini-parts are tuned to represent near-vertical and near-horizontal limbs, to approximate the transformations.
  6. In the implementation, the mixtures of parts are obtained by clustering the relative locations of two neighboring body parts. We can see that samples from the same cluster share a similar visual appearance.
  7. Recently, state-of-the-art performance on pose estimation has been achieved by deep learning methods. DeepPose [26] estimates the (x, y) locations of the body parts with a regressor in a holistic manner. The regressor is based on deep convolutional neural networks, and its expressive power is strong. However, the mapping from raw images to (x, y) coordinates is too difficult to learn, hence this method suffers from inaccuracy in the high-precision region. CNN-based heatmap regression models have shown the potential of learning better representations. However, geometric constraints between body parts are usually missing in the training stage. As a consequence, these kinds of methods may produce imperfect heat maps during training.
  8. For example, these methods may produce many high-response regions on the heads of unannotated people, and the resulting error will be backpropagated to update the model parameters. However, this is inappropriate. Since the local evidence is weak, we should consider the global consistency of the whole human body. This can be done by considering the geometric relationships between body parts during the training stage.
  9. A natural way to model spatial constraints is to use convolutions. Once the spatial kernels have been learned, one can use these kernels to enforce global pose consistency. These kernels can be calculated by creating a histogram of joint 𝑎 locations over the training set, given that the adjacent joint b is located at the kernel center. These kernels can also be learned using the standard backpropagation algorithm. However, this method has two limitations. First, these kernels have difficulty handling large variations, especially for highly articulated parts such as arms and legs. Second, the kernels should be large enough to cover sufficient context. Hence the parameter space is very large and the parameters are difficult to learn.
  10. In this proposal, we propose to combine the CNN and the expressive mixture-of-parts model in an end-to-end framework. This enables us to predict the body part locations while taking global pose configurations into account during the training stage. We formulate the human pose estimation problem using a graph model G = (V, E). V denotes the vertices, which specify the positions and the mixture types of body parts. The vertices are modeled by a front-end CNN in our framework. The edges E model the pairwise spatial relationships between body parts. A node sends a message to each of its neighbors and receives messages from each neighbor (indicated by arrows).
  11. Here is an illustration of the proposed framework. It can be viewed as two components. The first is a front-end DCNN for learning feature representations of body parts, which is followed by a softmax layer and a logarithm layer. The second component is the message passing layers, which conduct inference and learning on the mixture of parts with deformation constraints between parts. Specifically, each message passing layer performs one iteration of the message passing algorithm in a forward pass. Finally, the final score map of each body part is computed by taking the maximum value over mixture types.
  12. Given an image I, the full score of a pose configuration is given by this equation.
  13. l is the (x, y) location of each part.
  14. t is the mixture type of each part i.
  15. The full score consists of the unary term and the pairwise term. The unary term models the part appearance and is denoted by phi. Its parameters theta are learned by the front-end CNN, which is followed by a softmax layer and a logarithm layer.
  16. The pairwise terms model the spatial relationships between body parts. We use standard quadratic deformation constraints to model this term, which will be discussed later.
  17. We will first discuss the front-end CNN of our framework. It is a fully convolutional network. Given an input image, the outputs of the network are score maps for the mixture types. Note that the front-end CNN does not take global pose consistency into consideration, hence the unary term may contain lots of false positives. The mathematical formulation of the unary term is written as this equation. F denotes the raw score of each mixture type predicted by the front-end CNN. The following softmax layer then computes the normalized score of each mixture type, and the logarithm layer transforms the normalized score into log space.
  18. To make training easier and faster, we first pretrain the front-end CNN with image patches. Suppose we have P parts, and each part is clustered into K mixture types. Then an arbitrary image patch is either background or belongs to one of the P×K classes. Given a training image patch, the network therefore predicts a label out of P×K + 1 classes. As mentioned before, the mixtures are obtained by clustering the relative locations of neighboring body parts.
  19. The second term consists of a deformation model that evaluates the relative locations of pairs of parts. We write psi for the squared offset between two part locations, and we write beta for the parameters of a spring that favors certain offsets over others. Beta encodes both the rest position and rigidity of the spring. In a Gaussian model, this would be the mean and covariance.
  20. We employ the max-sum algorithm to infer the best configuration in the graphical model. Although the max-sum algorithm is only an approximation and convergence cannot be guaranteed on loopy structures, it still provides excellent experimental results. At each iteration, a vertex sends a message to its neighbors and receives messages from its neighbors. We denote m_ij(l_j, t_j) as the message sent from part i to part j, and u_i(l_i, t_i) as the belief of part i. The max-sum algorithm then updates the messages and beliefs by these two equations.
  21. This process iterates several times until convergence. We are then able to obtain the max-sum assignment by computing the argmax of u_i.
  22. This process iterates several times until convergence. We are then able to obtain the max-sum assignment by computing the argmax of u_i.
  23. Here are two examples that demonstrate the results produced with different numbers of message passing layers. We can see that the results get better as we increase the number of message passing layers. It is not difficult to understand this phenomenon. Intuitively, a part can receive messages from more distant parts as the number of message passing layers increases, which may result in better pose estimates.
  24. We demonstrate the effectiveness of the proposed method on three widely used public datasets. The first one is the LSP dataset, namely the Leeds Sports Pose dataset. It consists of 1000 training images and 1000 testing images from sports activities with challenging articulations. The second dataset is the Frames Labeled in Cinema dataset, namely the FLIC dataset. This dataset is collected from popular Hollywood movies with diverse appearances and poses. Each person is annotated with 10 upper-body joints. It consists of about 4000 training and 1016 testing images. The third dataset is the Image Parse dataset, which contains diverse activities. We did not train on this dataset. It is only used for cross-dataset validation to evaluate the generalization ability of the proposed method.
  25. We adopt two widely used evaluation metrics. The first one is the Percentage of Correct Parts (PCP). It measures the rate of correctly detected limbs: a limb is considered correctly detected if the distances between the detected limb endpoints and the ground-truth limb endpoints are within half of the limb length. However, this metric penalizes very short limbs. Hence we adopt the Percentage of Detected Joints (PDJ) as a complementary evaluation metric. This metric measures the rate of correctly localized joints, and it is invariant to scale. It computes a curve by varying the localization precision threshold.
  26. Some results on the LSP dataset are visualized in this slide. The proposed method is robust to highly articulated poses with varying orientation, foreshortening, cluttered background, occlusion, and overlapping people.
  27. We report the PCP results on the LSP dataset for six limbs: torso, head, upper arms, lower arms, upper legs and lower legs. The cyan bars denote our method. We can see that our method achieves the highest PCP value on average and on most of the limbs compared with previous methods. We can also see that the most difficult body parts are the lower arms, because they are the body parts with the largest articulation.
  28. We also show the PDJ curves on the LSP dataset for four body joints, namely the elbows, wrists, knees, and ankles. The red curve denotes our method. Comparing the PDJ values at the threshold 0.2, our method outperforms the previous methods by a large margin on all body parts except the ankles.
  29. In this slide, we show some sample results on the FLIC dataset. Compared with previous methods, our method is robust to large appearance variation and overlapping people. For example, existing methods have difficulty accurately locating the body parts of the man in the costume, whereas our method is able to handle this case.
  30. From the PDJ curves, we can also see that our method improves on previous methods.
  31. To demonstrate the generalization ability, we directly used the full-body model trained on the LSP dataset to predict the poses on the test images of the Image Parse dataset. The visualized results are quite satisfactory. The PCP results are also reported. The proposed method achieves results better than or comparable to the state-of-the-art methods. Note that most of the previous methods used training data from the Image Parse dataset to train the model.
  32. Some failure cases are shown. Our method may produce wrong estimates due to significant occlusions, ambiguous background, or heavily overlapping persons.
  33. To evaluate the improvement brought by spatial constraints and joint learning, we compare the unary term with the full model. We found that the spatial constraints and the joint learning boost the performance by about 20 percent.
  34. Our framework is flexible enough for both tree-structured and loopy graph models. Following previous work, we add symmetry constraints between the left and right knees. We find that this constraint is very helpful for reducing the double counting problem in legs.
  35. In future work, we plan to extend the proposed framework in two directions. First, we could use a deeper and more powerful network architecture to boost the performance. Second, the graph structure is currently handcrafted and may not be the optimal structure for every image, so we want to learn the graph structure.
  36. The depth of networks has grown rapidly in recent years. Generally, we find that the deeper the network, the better the performance. But is there a limit? Through experiments, people have found that a deeper network may produce higher training error than its shallower counterpart. There are several reasons. First is the notorious vanishing or exploding gradient problem. Moreover, current solvers such as stochastic gradient descent have difficulty finding the optimal mappings in a very deep network.
  37. However, a deeper model should not have higher training error than its shallower counterpart. For example, if the stacked layers are identity mappings, then the training error will not increase no matter how many layers are stacked. This is the basic idea of residual learning.
  38. Let’s call the conventional network the plain network, and let H(x) be the underlying mapping. We hope to approximate the underlying mapping H(x) by stacking two layers, and we know this is difficult.
  39. But how about learning the residual H(x) − x instead? Finding an optimum around zero is much easier. Hence we can fit a residual mapping explicitly. One building block looks like this.
  40. We stack many building blocks to build a very deep network for pose estimation. We call it the ResNet. It achieves better results than the VGG network. We will investigate more variants of ResNet to better fit the pose estimation problem.
  41. In the literature, the graph structure for modeling the relationships among body parts is usually designed manually [60, 5]. However, no theoretical analysis shows how to build the connections among body parts, or which graph structure is optimal. Some efforts have been made to learn graph structures from data [55]. However, the graph structure is fixed once it has been learned and lacks the flexibility to handle large variations. As mentioned before, previous work uses convolutional kernels to learn the geometric relationships between parts. This process can be formulated by this equation. It approximates message passing from one score map to another score map using a convolution layer, as illustrated in the figure.
  42. In previous work, this kind of convolution layer is either fully connected or connected by handcrafted graph structures, and lacks the flexibility to handle large variations. We propose to adjust the graph structure according to the image by incorporating gates to control the message passing.
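
A minimal sketch of the unary term and the max-sum inference described in notes 17 to 22, in plain Python. It is not the implementation used in this work. It assumes one-dimensional locations and a simple chain of parts (where max-sum is exact), and the names `log_softmax`, `unary_terms`, `deformation`, and `max_sum_chain`, as well as the spring weights `(w_lin, w_quad)`, are hypothetical choices made for illustration only.

```python
import math


def log_softmax(scores):
    """Softmax layer followed by logarithm layer, computed stably."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def unary_terms(raw, num_parts, num_types):
    """Build unary[i][(l, t)] from raw class scores.

    raw[l] holds num_parts * num_types + 1 raw scores at location l;
    the last entry is the background class (assumed layout).
    """
    log_p = [log_softmax(row) for row in raw]
    return [
        {(l, t): log_p[l][i * num_types + t]
         for l in range(len(raw)) for t in range(num_types)}
        for i in range(num_parts)
    ]


def deformation(l_i, l_j, w):
    """Quadratic spring score on the offset between two 1-D locations."""
    w_lin, w_quad = w  # w_quad < 0 penalizes offsets far from the rest position
    d = l_j - l_i
    return w_lin * d + w_quad * d * d


def max_sum_chain(unary, springs):
    """Max-sum on a chain: part 0 -> part 1 -> ... -> part n-1.

    springs[i][(t_i, t_next)] gives the weights between part i and part i + 1.
    Returns the best total score and the best (location, type) per part.
    """
    states = list(unary[0])
    belief = dict(unary[0])
    back_pointers = []
    for i in range(1, len(unary)):
        message, argmax = {}, {}
        for s_j in states:
            best_val, best_state = -math.inf, None
            for s_i in states:
                w = springs[i - 1][(s_i[1], s_j[1])]
                val = belief[s_i] + deformation(s_i[0], s_j[0], w)
                if val > best_val:
                    best_val, best_state = val, s_i
            message[s_j], argmax[s_j] = best_val, best_state
        # Belief of part i: its unary score plus the incoming message.
        belief = {s: unary[i][s] + message[s] for s in states}
        back_pointers.append(argmax)
    # Read off the max-sum assignment by backtracking from the last part.
    last = max(states, key=lambda s: belief[s])
    best_score = belief[last]
    path = [last]
    for argmax in reversed(back_pointers):
        path.append(argmax[path[-1]])
    path.reverse()
    return best_score, path


if __name__ == "__main__":
    # Two parts, two mixture types each, four locations, made-up raw scores.
    raw = [[0.1, 2.0, -1.0, 0.3, 0.0], [1.5, 0.2, 0.4, -0.7, 0.0],
           [0.0, 0.3, 2.2, 1.9, 0.0], [-0.5, 0.1, 0.6, 2.5, 0.0]]
    unary = unary_terms(raw, num_parts=2, num_types=2)
    springs = [{(a, b): (1.0, -0.5) for a in range(2) for b in range(2)}]
    print(max_sum_chain(unary, springs))
```

On a loopy graph, such as one with the knee symmetry constraint from note 34, the same message update would instead be repeated for a fixed number of iterations, and the result is only approximate, as note 20 points out.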