Bayesian Nonparametric Motor-skill Representations
for Efficient Learning of Robotic Clothing Assistance
Workshop on Practical Bayesian Nonparametrics, NIPS 2016
Nishanth Koganti (1,2), Tomoya Tamei (1), Kazushi Ikeda (1), Tomohiro Shibata (2)
(1) Nara Institute of Science and Technology, Ikoma, Japan
(2) Kyushu Institute of Technology, Kitakyushu, Japan
February 11, 2017
Robotic Clothing Assistance
Aging causes loss of the motor functions needed to perform dexterous tasks.
Goal: Develop a learning framework for humanoid robots to perform clothing assistance.
Challenge: Close interaction of the robot with the clothes and the human.
[Figure: left, non-rigid clothing material (Ramisa et al., 2011); right, varying posture of the human (Dan MacLeod Posture Study)]
Reinforcement Learning for Clothing Assistance
Markov Decision Process (MDP) formulated with low-dimensional state and policy representations [1].
[1] Tamei, T. et al., “Reinforcement learning of clothing assistance”, in IEEE-RAS Humanoids 2011
Clothing Assistance Framework [1]: Outline
[1] Tamei, T. et al., “Reinforcement learning of clothing assistance”, in IEEE-RAS Humanoids 2011
Clothing Assistance Framework [1]: Policy
Control policy parametrized by via-points [2] of the trajectory.
A finite-difference policy gradient method is used for the policy update:
$$\frac{\partial \eta(\theta)}{\partial \theta} \approx \frac{r(\theta_i + \Delta\theta) - r(\theta_i - \Delta\theta)}{2\Delta\theta}$$
$$\theta \leftarrow \theta + \alpha \frac{\partial \eta(\theta)}{\partial \theta}$$
[1] Tamei, T. et al., “Reinforcement learning of clothing assistance”, in IEEE-RAS Humanoids 2011
[2] Wada, Y. et al., “Theory for handwriting on minimization principle”, in Biological Cybernetics, 1995
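The finite-difference update on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the quadratic `toy_reward`, the step sizes `alpha` and `delta`, and the function names are all assumptions made for the example, and each parameter is perturbed one at a time.

```python
import numpy as np

def finite_difference_gradient(reward, theta, delta=1e-2):
    """Central-difference estimate of the reward gradient, one parameter at a time."""
    grad = np.zeros_like(theta, dtype=float)
    for i in range(theta.size):
        step = np.zeros_like(theta, dtype=float)
        step[i] = delta
        grad[i] = (reward(theta + step) - reward(theta - step)) / (2.0 * delta)
    return grad

def policy_update(reward, theta, alpha=0.1, delta=1e-2, iterations=100):
    """Gradient ascent: theta <- theta + alpha * (estimated gradient)."""
    theta = np.asarray(theta, dtype=float).copy()
    for _ in range(iterations):
        theta += alpha * finite_difference_gradient(reward, theta, delta)
    return theta

# Hypothetical reward: negative squared distance to a target parameter vector.
target = np.array([0.5, -1.0, 2.0])

def toy_reward(theta):
    return -np.sum((theta - target) ** 2)

theta_final = policy_update(toy_reward, np.zeros(3))
```

On a real robot each call to `reward` is one rollout, so a single update costs two rollouts per policy parameter. That cost is one reason a low-dimensional policy representation matters.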
Problem: Adaptive Learning of Clothing Skills
Design of a robust motor-skills learning framework is crucial for real-world implementation on low-cost robots.
Tight coupling with the cloth and close proximity to the human.
Optimal policy varies with initial conditions.
[Figure: left, non-rigid clothing material (Ramisa et al., 2011); right, varying posture of the human (Dan MacLeod Posture Study)]
Reinforcement Learning in Latent Space
Combining motor-skills learning with dimensionality reduction:
Tractable search space, reducing learning time.
Latent space can be modeled to capture task space constraints.
Existing methods rely on linear models or a MAP estimate of the latent space.
[Figure: left, Bitzer et al., 2010 [1]; right, Luck et al., 2014 [2]]
[1] Bitzer, S. et al., “Using dimensionality reduction in reinforcement learning” in IEEE/RSJ IROS, 2010
[2] Luck, K. S. et al., “Latent space policy search for robotics” in IEEE/RSJ IROS, 2014
Motor-skill Learning in Latent Spaces
Use Bayesian nonparametric nonlinear dimensionality reduction for efficient learning of clothing skills [1].
[1] Koganti, N. et al., “Bayesian Nonparametric Motor-skill Representations for Efficient Learning of Clothing Assistance” in Workshop on Practical Bayesian Nonparametrics, NIPS, 2016
Bayesian Gaussian Process Latent Variable Model
Latent variable model (Titsias et al., 2010 [1]):
$$y = f(x) + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2 I)$$
$y \in \mathbb{R}^D$: Observed variable
$x \in \mathbb{R}^Q$ ($Q \ll D$): Unknown latent variable
$f: x \to y$: Mapping given by a Gaussian process
$$p(Y|X) = \prod_{d=1}^{D} \mathcal{N}(y_d \mid 0, K_{NN} + \beta^{-1} I_N)$$
[Figure: graphical model with nodes x, f, (w, θ), y]
[1] Titsias, M. K. et al., “Bayesian Gaussian Process Latent Variable Model”, in AISTATS 2010
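The likelihood p(Y|X) on this slide factorizes over the D output dimensions, which all share one N×N kernel matrix. The sketch below evaluates its logarithm with NumPy. It is illustrative only: the plain RBF kernel, the default `beta`, and the random data are assumptions, and it is not the variational BGPLVM objective itself.

```python
import numpy as np

def rbf_kernel(X, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel matrix K_NN for latent points X of shape (N, Q)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return variance * np.exp(-0.5 * np.maximum(d2, 0.0) / lengthscale ** 2)

def gplvm_log_likelihood(Y, X, beta=100.0):
    """log p(Y|X) = sum_d log N(y_d | 0, K_NN + I / beta) for Y of shape (N, D)."""
    N, D = Y.shape
    K = rbf_kernel(X) + np.eye(N) / beta
    L = np.linalg.cholesky(K)
    K_inv_Y = np.linalg.solve(L.T, np.linalg.solve(L, Y))
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (D * N * np.log(2.0 * np.pi) + D * log_det + np.sum(Y * K_inv_Y))

# Hypothetical data: 20 samples, 2 latent dimensions, 14 observed dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
Y = rng.normal(size=(20, 14))
log_lik = gplvm_log_likelihood(Y, X)
```

A standard GPLVM maximizes this quantity over X to obtain a MAP estimate of the latent points. The Bayesian variant instead places a distribution over X.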
BGPLVM: Manifold Learning
Bayesian inference: posterior distribution on the latent space [1].
$$p(Y) = \int p(Y|X)\, p(X)\, dX$$
Marginalization made tractable using variational inference:
$$q(X) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid \mu_n, S_n)$$
$$\log p(Y) \geq \int q(X) \log p(Y|X)\, dX - \int q(X) \log \frac{q(X)}{p(X)}\, dX$$
Automatic dimensionality reduction is possible using an ARD kernel:
$$k(x, x') = \sigma_f^2 \exp\left(-\frac{1}{2} \sum_{q=1}^{Q} w_q (x_q - x'_q)^2\right)$$
[1] Titsias, M. K. et al., “Bayesian Gaussian Process Latent Variable Model”, in AISTATS 2010
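The ARD kernel prunes a latent dimension when its weight w_q is driven towards zero, because that dimension then no longer affects the kernel value. The sketch below shows this effect. The helper `active_dimensions` and its relative threshold are assumptions for illustration, not part of the original method.

```python
import numpy as np

def ard_kernel(X1, X2, sigma_f=1.0, w=None):
    """ARD squared-exponential kernel between rows of X1 (N, Q) and X2 (M, Q)."""
    X1 = np.atleast_2d(np.asarray(X1, dtype=float))
    X2 = np.atleast_2d(np.asarray(X2, dtype=float))
    w = np.ones(X1.shape[1]) if w is None else np.asarray(w, dtype=float)
    diff = X1[:, None, :] - X2[None, :, :]
    return sigma_f ** 2 * np.exp(-0.5 * np.sum(w * diff ** 2, axis=2))

def active_dimensions(w, threshold=1e-2):
    """Indices of latent dimensions whose weight is non-negligible relative to the largest."""
    w = np.asarray(w, dtype=float)
    return np.flatnonzero(w / w.max() > threshold)

# With w = [1, 0], moving along the second latent dimension leaves the kernel unchanged.
k_same = ard_kernel([[0.0, 0.0]], [[0.0, 5.0]], sigma_f=2.0, w=[1.0, 0.0])
kept = active_dimensions([2.0, 1e-5, 0.5])
```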
Motor-skills Transfer through Latent Space
BGPLVM model trained on robot joint angles $\in \mathbb{R}^{14}$ from kinesthetic demonstration of clothing assistance [1].
[1] Koganti, N. et al., “Motor-skill Learning in Latent Spaces for Robotic Clothing Assistance” in RSJ Annual Conference, 2016
Reinforcement Learning in BGPLVM Space
Apply the Cross Entropy Method to perform policy improvement:
$$\theta^* \sim \mathcal{N}(\theta \mid \mu^*, \Sigma^*)$$
$$\mu^* := \mathrm{mean}(\operatorname{argmax} \theta_{old}), \quad \Sigma^* := \mathrm{var}(\operatorname{argmax} \theta_{old})$$
Represent the policy using a Dynamic Movement Primitive (DMP):
$$\tau \ddot{x} = K(g - x) - D\dot{x} + (g - x_0) f$$
$$f(s) = \frac{\sum_i w_i \psi_i(s)\, s}{\sum_i \psi_i(s)}, \quad \text{where } \tau \dot{s} = -\alpha s$$
[1] Koganti, N. et al., “Bayesian Nonparametric Motor-skill Representations for Efficient Learning of Clothing Assistance” in Workshop on Practical Bayesian Nonparametrics, NIPS, 2016
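To make the DMP equations concrete, here is a one-dimensional rollout integrated with a simple Euler scheme. It follows the transformation and phase equations as written on the slide. The gains `K` and `D`, the decay `alpha`, the time step, and the placement and widths of the Gaussian basis functions are assumptions chosen for illustration.

```python
import numpy as np

def dmp_rollout(w, x0, g, tau=1.0, K=100.0, D=20.0, alpha=4.0, dt=0.001, duration=2.0):
    """Integrate tau*x'' = K(g - x) - D*x' + (g - x0)*f(s) with phase tau*s' = -alpha*s."""
    w = np.asarray(w, dtype=float)
    # Basis centers along the decaying phase; widths from center spacing (heuristic assumption).
    centers = np.exp(-alpha * np.linspace(0.0, 1.0, w.size))
    widths = 1.0 / np.gradient(centers) ** 2
    x, v, s = float(x0), 0.0, 1.0
    trajectory = [x]
    for _ in range(int(round(duration / dt))):
        psi = np.exp(-widths * (s - centers) ** 2)
        f = s * (psi @ w) / (psi.sum() + 1e-12)   # forcing term, vanishes as s -> 0
        acc = (K * (g - x) - D * v + (g - x0) * f) / tau
        v += acc * dt
        x += v * dt
        s += (-alpha * s / tau) * dt
        trajectory.append(x)
    return np.array(trajectory)

# With zero weights the forcing term is zero and the system is a damped spring towards g.
trajectory = dmp_rollout(w=np.zeros(10), x0=0.0, g=1.0)
```

The weights `w` are the policy parameters that a search method adjusts. Because the forcing term is scaled by the phase `s`, the motion still settles at the goal `g` whatever the weights are.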
Reinforcement Learning in BGPLVM Space
Represent the reward function by the distance of the current policy from the desired via-points:
$$R(\pi(\theta)) = \sum_{i=1}^{n_{dims}} \sum_{j=1}^{n_{via}} \lVert V_{i,j} - \pi_i(\theta, t_{i,j}) \rVert^2$$
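The via-point term can be evaluated by sampling the policy trajectory at each via-point time. The sketch below assumes the trajectory is stored on a time grid and uses linear interpolation between samples. The function name and the toy trajectory are hypothetical. The slide does not state a sign convention, so the code returns the summed squared distance itself, and a reward-maximizing optimizer would use its negative.

```python
import numpy as np

def via_point_distance(times, trajectory, via_times, via_values):
    """Sum over dimensions i and via-points j of (V[i][j] - pi_i(t[i][j]))**2."""
    trajectory = np.asarray(trajectory, dtype=float)
    total = 0.0
    for i in range(trajectory.shape[1]):
        # Policy output of dimension i at the via-point times (linear interpolation).
        policy_at_via = np.interp(via_times[i], times, trajectory[:, i])
        total += np.sum((np.asarray(via_values[i], dtype=float) - policy_at_via) ** 2)
    return float(total)

# Toy two-dimensional trajectory: x_0(t) = t and x_1(t) = 2t on t in [0, 1].
times = np.linspace(0.0, 1.0, 101)
trajectory = np.stack([times, 2.0 * times], axis=1)
via_times = [np.array([0.25, 0.75]), np.array([0.5])]
on_path = [np.array([0.25, 0.75]), np.array([1.0])]
off_path = [np.array([0.25, 0.75]), np.array([1.5])]

distance_on = via_point_distance(times, trajectory, via_times, on_path)
distance_off = via_point_distance(times, trajectory, via_times, off_path)
```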
Latent Space Controller for Clothing Tasks [1]
[1] Koganti, N. et al., “Motor-skill Learning in Latent Spaces for Robotic Clothing Assistance” in RSJ Annual Conference, 2016
Generalization in Latent Space
Evaluation: Reconstruction error of the latent space, measured by RMS error [1].
Dataset: Clothing trajectories for 4 postures: shoulder angle ∈ {65°, 70°, 75°, 80°}.
[Figure: panels for PCA, GPLVM, and BGPLVM]
[1] Koganti, N. et al., “Motor-skill Learning in Latent Spaces for Robotic Clothing Assistance” in RSJ Annual Conference, 2016
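As a rough illustration of the evaluation metric, the sketch below computes the RMS reconstruction error for the PCA baseline. Test data is projected onto the first Q principal directions and mapped back. The synthetic 14-dimensional data stands in for joint-angle trajectories and is purely an assumption. The GPLVM and BGPLVM variants would replace the linear projection with their own mappings.

```python
import numpy as np

def pca_reconstruction_rmse(Y_train, Y_test, Q):
    """RMS error after projecting Y_test onto the top-Q principal directions of Y_train."""
    mean = Y_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y_train - mean, full_matrices=False)
    basis = Vt[:Q]                               # (Q, D) principal directions
    latent = (Y_test - mean) @ basis.T           # encode
    reconstruction = latent @ basis + mean       # decode
    return float(np.sqrt(np.mean((Y_test - reconstruction) ** 2)))

# Synthetic data lying on a 2-dimensional linear subspace of a 14-dimensional space.
rng = np.random.default_rng(0)
A = rng.normal(size=(2, 14))
Y_train = rng.normal(size=(100, 2)) @ A
Y_test = rng.normal(size=(30, 2)) @ A

rmse_q1 = pca_reconstruction_rmse(Y_train, Y_test, Q=1)
rmse_q2 = pca_reconstruction_rmse(Y_train, Y_test, Q=2)
```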
Reinforcement Learning in Latent Space
Apply reinforcement learning in different action spaces with the same formulation and reward function [1].
Parameters: 50 × n_dims basis functions.
CEM: 50 rollouts per iteration.
Policy update: 5 best rollouts per iteration.
[1] Koganti, N. et al., “Bayesian Nonparametric Motor-skill Representations for Efficient Learning of Clothing Assistance” in Workshop on Practical Bayesian Nonparametrics, NIPS, 2016
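The settings on this slide (50 rollouts per iteration, with the 5 best used for the update) map directly onto a Cross Entropy Method loop. The following is a minimal sketch rather than the authors' code. It assumes a diagonal Gaussian search distribution, a hypothetical quadratic reward in place of robot rollouts, and an arbitrary iteration count and seed.

```python
import numpy as np

def cross_entropy_method(reward, mu, sigma, n_rollouts=50, n_elite=5, iterations=30, seed=0):
    """Refit a diagonal Gaussian over policy parameters to the best rollouts each iteration."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float).copy()
    sigma = np.asarray(sigma, dtype=float).copy()
    for _ in range(iterations):
        samples = rng.normal(mu, sigma, size=(n_rollouts, mu.size))
        rewards = np.array([reward(theta) for theta in samples])
        elite = samples[np.argsort(rewards)[-n_elite:]]   # the n_elite highest-reward samples
        mu = elite.mean(axis=0)
        sigma = elite.std(axis=0) + 1e-6                  # small floor keeps sampling well-defined
    return mu, sigma

# Hypothetical reward: negative squared distance to a target parameter vector.
target = np.array([1.0, -1.0, 0.5])

def toy_reward(theta):
    return -np.sum((theta - target) ** 2)

mu_final, sigma_final = cross_entropy_method(toy_reward, mu=np.zeros(3), sigma=np.ones(3))
```

Each iteration costs 50 rollouts whatever the number of parameters. The benefit of searching in a low-dimensional latent space comes from needing fewer iterations and from the samples being less sparse.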
Moving forward
Immediate Goal: Latent spaces for Robotics applications:
Auto-regressive prior on latent space to capture task dynamics.
Explicit model of human-robot interaction as constraint.
Ambitious Goal: Combine policy search RL and BGPLVM:
Non-linear dimensionality reduction.
Bayesian and data-efficient learning.
[Figure: panels titled “Data-efficient” and “Bayesian Inference” [1]]
[1] Deisenroth, M. P. et al., “Gaussian processes for data-efficient learning in robotics and control” in IEEE Transactions PAMI, 2015
Appendix
Topology Coordinates
To approximate a Markov Decision Process, the relationship between the cloth and the subject needs to be observed as fully as possible.
Low-dimensional representations are needed for a fast learning time.
Topology coordinates were introduced to address both requirements.
Concept proposed by Ho and Komura (2009) [1].
Given two line segments, the amount of twist (writhe) between them is given by the Gaussian Linking Integral (GLI):
$$w = \mathrm{GLI}(\gamma_1, \gamma_2) = \frac{1}{4\pi} \int_{\gamma_1} \int_{\gamma_2} \frac{d\gamma_1 \times d\gamma_2 \cdot (\gamma_1 - \gamma_2)}{\lVert \gamma_1 - \gamma_2 \rVert^3} \quad (1)$$
[1] Ho, E. S. L. and Komura, T., “Character motion synthesis by topology coordinates”, in Computer Graphics Forum (Eurographics), 2009
Topology Space
The relationship between line segments is defined by the writhe matrix ($T_{n \times m}$).
Given line segments $S_1$, $S_2$ with n, m links, $T_{n \times m}$ is given by:
$$T_{ij} = \mathrm{GLI}(S_1^i, S_2^j)$$
The parameters writhe, center, and density are defined from the writhe matrix and together form the topology space [1].
[1] Ho, E. S. L. and Komura, T., “Character motion synthesis by topology coordinates”, in Computer Graphics Forum (Eurographics), 2009
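The Gaussian Linking Integral between two straight links, and the writhe matrix built from it, can be approximated numerically. The sketch below uses a midpoint rule on each segment. The sample count `n` and the function names are assumptions, the links must not intersect, and a closed-form segment-pair expression would be preferable in practice.

```python
import numpy as np

def gli(a0, a1, b0, b1, n=200):
    """Midpoint-rule approximation of the GLI between straight links a0->a1 and b0->b1."""
    a0, a1, b0, b1 = (np.asarray(p, dtype=float) for p in (a0, a1, b0, b1))
    t = (np.arange(n) + 0.5) / n
    pa = a0 + t[:, None] * (a1 - a0)              # sample points on the first link
    pb = b0 + t[:, None] * (b1 - b0)              # sample points on the second link
    cross = np.cross(a1 - a0, b1 - b0)            # direction part of d(gamma1) x d(gamma2)
    diff = pa[:, None, :] - pb[None, :, :]        # gamma1 - gamma2 for every pair of samples
    dist3 = np.linalg.norm(diff, axis=2) ** 3
    return float(np.sum((diff @ cross) / dist3) / (n * n) / (4.0 * np.pi))

def writhe_matrix(chain1, chain2, n=50):
    """T[i, j] = GLI(link i of chain1, link j of chain2) for chains given as point arrays."""
    chain1 = np.asarray(chain1, dtype=float)
    chain2 = np.asarray(chain2, dtype=float)
    T = np.zeros((len(chain1) - 1, len(chain2) - 1))
    for i in range(T.shape[0]):
        for j in range(T.shape[1]):
            T[i, j] = gli(chain1[i], chain1[i + 1], chain2[j], chain2[j + 1], n=n)
    return T

# Two skew links that cross over each other one unit apart.
value = gli([-1, 0, 0], [1, 0, 0], [0, -1, 1], [0, 1, 1])
T = writhe_matrix([[-1, 0, 0], [0, 0, 0], [1, 0, 0]],
                  [[0, -1, 1], [0, 0, 1], [0, 1, 1], [0, 2, 1]])
```

Summing the entries of `T` gives the total writhe between the two chains.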
Clothing Assistance Framework [1]: State and Reward
Low-dimensional representation using topology coordinates [2].
Reward given by the distance between the final state and the target state:
$$r_i = -\lVert s_i^{target} - s_i \rVert \quad (i = 1, 2, 3), \qquad r(s) = \sum_{i=1}^{3} \frac{r_i - \mu_i}{\sigma_i}$$
[1] Tamei, T. et al., “Reinforcement learning of clothing assistance”, in IEEE-RAS Humanoids 2011
[2] Ho, E. S., et al., “Character synthesis by topology coordinates”, in Computer Graphics Forum 2009
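The reward on this slide combines three distance terms after shifting and scaling each one. The sketch below treats each topology coordinate as a scalar. The slide does not define μ_i and σ_i, so the code takes them as caller-supplied normalization constants, which is an assumption.

```python
import numpy as np

def normalized_reward(final_state, target_state, mu, sigma):
    """r(s) = sum_i (r_i - mu_i) / sigma_i with r_i = -|s_target_i - s_i|."""
    r = -np.abs(np.asarray(target_state, dtype=float) - np.asarray(final_state, dtype=float))
    return float(np.sum((r - np.asarray(mu, dtype=float)) / np.asarray(sigma, dtype=float)))

# Hypothetical three-dimensional topology-coordinate states.
reward_at_target = normalized_reward([0.1, 0.2, 0.3], [0.1, 0.2, 0.3], mu=[0, 0, 0], sigma=[1, 1, 1])
reward_far = normalized_reward([0.0, 0.0, 0.0], [1.0, 2.0, 3.0], mu=[0, 0, 0], sigma=[1, 2, 3])
```

Dividing by σ_i puts the three coordinates on a comparable scale so that no single one dominates the sum.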
Combining DR and RL
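Policy representation:
$$a = W(Z^T \Phi) + M\Phi + E\Phi$$
Expectation step: posterior distribution over the latent variables
$$p_{\theta_{old}}(Z^T \Phi \mid a) = \mathcal{N}\left(C W^T (a - M\Phi),\; C \sigma^2 \operatorname{tr}(\Phi\Phi^T)\right), \quad C = (\sigma^2 I + W^T W)^{-1}$$
Maximization step: compute gradients with respect to the policy parameters
$$\frac{\partial \ln p(a)\, Q^{\pi}_t}{\partial M}, \quad \frac{\partial \ln p(a)\, Q^{\pi}_t}{\partial W}, \quad \frac{\partial \ln p(a)\, Q^{\pi}_t}{\partial \sigma^2}$$
[1] Luck, K. S. et al., “Latent space policy search for robotics” in IEEE/RSJ IROS, 2014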
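The expectation step has the same form as the latent posterior in probabilistic PCA. The sketch below computes the posterior mean for a simplified model a = W z + m + noise, in which the feature vector Φ is dropped, so it illustrates the structure rather than the full algorithm. The simplification, the dimensions, and the use of the inverse form C = (σ²I + WᵀW)⁻¹ are assumptions.

```python
import numpy as np

def latent_posterior_mean(a, W, m, sigma2):
    """Posterior mean C W^T (a - m) with C = (sigma2 * I + W^T W)^{-1}."""
    Q = W.shape[1]
    C_inv = sigma2 * np.eye(Q) + W.T @ W
    return np.linalg.solve(C_inv, W.T @ (a - m))

# Hypothetical setup: 14-dimensional actions generated from a 2-dimensional latent vector.
rng = np.random.default_rng(0)
W = rng.normal(size=(14, 2))
m = rng.normal(size=14)
z_true = np.array([0.7, -1.2])
a = W @ z_true + m

z_mean = latent_posterior_mean(a, W, m, sigma2=1e-10)
```

With negligible noise the posterior mean recovers the latent vector that generated the action. Larger values of `sigma2` shrink the estimate towards zero.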
DR as Preprocessing for RL
Bitzer et al. (2010) [1]: GPLVM-based latent space encoding task space constraints.
Non-linear dimensionality reduction.
Data-efficient learning with GP mapping.
Value-function reinforcement learning (TD(0)) applied to the tractable search space.
[1] Bitzer, S. et al., “Using dimensionality reduction in reinforcement learning” in IEEE/RSJ IROS, 2010
Combining DR and RL
Luck et al. (2014) [1]: Joint learning of the latent space and the optimal policy.
$$a = W(Z^T \Phi) + M\Phi + E\Phi \quad (2)$$
PePPER: Expectation-Maximization formulation based on a KL-divergence lower bound.
Probabilistic PCA used as the model for learning the latent space.
[1] Luck, K. S. et al., “Latent space policy search for robotics” in IEEE/RSJ IROS, 2014
Combining DR and RL
Inverse kinematics: planning in the joint angle space of a highly redundant robot (20 DOF) [1].
Standing on one leg: applied to a full humanoid robot, with the policy learned from scratch.
[1] Luck, K. S. et al., “Latent space policy search for robotics” in IEEE/RSJ IROS, 2014
Discussion
Robotic clothing assistance involves several problems: non-rigid clothing material, varying posture of the human, and close interaction with the human.
We propose the use of DR with RL for efficient motor-skills learning.
Future Work
Implement the latent space RL framework for clothing assistance.
Combine real-time state estimation with the motor-skills learning framework.
References
Tamei, Tomoya, et al. “Reinforcement learning of clothing assistance with a
dual-arm robot.” Humanoid Robots (Humanoids), 2011 11th IEEE-RAS
International Conference on. IEEE, 2011.
Ho, Edmond SL, and Taku Komura. “Character motion synthesis by topology
coordinates.” Computer Graphics Forum. Vol. 28. No. 2. Blackwell Publishing
Ltd, 2009.
Pohl, William F. “The self-linking number of a closed space curve(Gauss integral
formula treated for disjoint closed space curves linking number).” Journal of
Mathematics and Mechanics 17 (1968): 975-985.
Miyamoto, Hiroyuki, et al. “A kendama learning robot based on bi-directional
theory.” Neural networks 9.8 (1996): 1281-1302.
Koganti, Nishanth, et al. “Cloth dynamics modeling in latent spaces and its
application to robotic clothing assistance.” Intelligent Robots and Systems
(IROS), 2015 IEEE/RSJ International Conference on. IEEE, 2015.
Deisenroth, Marc Peter, Dieter Fox, and Carl Edward Rasmussen. “Gaussian
processes for data-efficient learning in robotics and control.” Pattern Analysis
and Machine Intelligence, IEEE Transactions on 37.2 (2015): 408-423.
Levine, Sergey, et al. “End-to-end training of deep visuomotor policies.” arXiv
preprint arXiv:1504.00702 (2015).
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 

Bayesian Nonparametric Motor-skill Representations for Efficient Learning of Robotic Clothing Assistance

  • 1. Bayesian Nonparametric Motor-skill Representations for Efficient Learning of Robotic Clothing Assistance. Workshop on Practical Bayesian Nonparametrics, NIPS 2016. Nishanth Koganti (1,2), Tomoya Tamei (1), Kazushi Ikeda (1), Tomohiro Shibata (2). (1) Nara Institute of Science and Technology, Ikoma, Japan; (2) Kyushu Institute of Technology, Kitakyushu, Japan. February 11, 2017.
  • 2. Robotic Clothing Assistance. Aging causes loss of the motor functions needed to perform dexterous tasks. Goal: Develop a learning framework for humanoid robots to perform clothing assistance. Challenge: Close interaction of the robot with the clothes and the human, that is, non-rigid clothing material and varying posture of the human. (Figures. Left: Ramisa et al., 2011. Right: Dan MacLeod Posture Study.)
  • 3. Reinforcement Learning for Clothing Assistance. A Markov Decision Process (MDP) is formulated with low-dimensional state and policy representations (Tamei, T. et al., “Reinforcement learning of clothing assistance”, in IEEE-RAS Humanoids 2011).
  • 4. Clothing Assistance Framework: Outline (Tamei, T. et al., “Reinforcement learning of clothing assistance”, in IEEE-RAS Humanoids 2011).
  • 5. Clothing Assistance Framework: Policy (Tamei, T. et al., “Reinforcement learning of clothing assistance”, in IEEE-RAS Humanoids 2011). The control policy is parametrized by via-points of the trajectory (Wada, Y. et al., “Theory for handwriting on minimization principle”, in Biological Cybernetics, 1995). A finite-difference policy gradient method is used for the policy update: $\frac{\partial \eta(\theta)}{\partial \theta} \approx \frac{r(\theta_i + \Delta\theta) - r(\theta_i - \Delta\theta)}{2\Delta\theta}$, followed by $\theta \leftarrow \theta + \alpha \frac{\partial \eta(\theta)}{\partial \theta}$.
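The finite-difference update on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the step sizes `delta` and `alpha`, and the quadratic toy reward are all assumptions made for the example, and the central difference is applied one parameter at a time.

```python
import numpy as np

def finite_difference_gradient(reward_fn, theta, delta=1e-2):
    """Central-difference estimate of d r / d theta, one coordinate at a time."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = delta
        # (r(theta_i + delta) - r(theta_i - delta)) / (2 * delta)
        grad[i] = (reward_fn(theta + step) - reward_fn(theta - step)) / (2.0 * delta)
    return grad

def policy_gradient_ascent(reward_fn, theta0, alpha=0.1, n_iters=100, delta=1e-2):
    """Repeat theta <- theta + alpha * gradient estimate."""
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(n_iters):
        theta = theta + alpha * finite_difference_gradient(reward_fn, theta, delta)
    return theta

# Hypothetical toy reward: negative squared distance to a target parameter vector.
target = np.array([0.5, -1.0, 2.0])

def toy_reward(theta):
    return -np.sum((theta - target) ** 2)

theta_star = policy_gradient_ascent(toy_reward, np.zeros(3))
```

On a real robot each call to `reward_fn` would be one rollout, so a single gradient estimate costs two rollouts per policy parameter; this cost is one motivation for the low-dimensional representations discussed later in the deck.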
  • 6. Problem: Adaptive Learning of Clothing Skills. The design of a robust motor-skills learning framework is crucial for real-world implementation on low-cost robots. The task involves tight coupling with the cloth and close proximity to the human. The optimal policy varies with the initial conditions. (Figures. Left: non-rigid clothing material, Ramisa et al., 2011. Right: varying posture of human, Dan MacLeod Posture Study.)
  • 7. Reinforcement Learning in Latent Space. Combining motor-skills learning with dimensionality reduction: a tractable search space reduces learning time, and the latent space can be modeled to capture task-space constraints. Existing methods rely on linear models or a MAP estimate of the latent space: Bitzer et al., 2010 (Bitzer, S. et al., “Using dimensionality reduction in reinforcement learning”, in IEEE/RSJ IROS, 2010) and Luck et al., 2014 (Luck, K. S. et al., “Latent space policy search for robotics”, in IEEE/RSJ IROS, 2014).
  • 8. Motor-skill Learning in Latent Spaces. Use Bayesian nonparametric nonlinear dimensionality reduction for efficient learning of clothing skills (Koganti, N. et al., “Bayesian Nonparametric Motor-skill Representations for Efficient Learning of Clothing Assistance”, in Workshop on Practical Bayesian Nonparametrics, NIPS, 2016).
  • 9. Bayesian Gaussian Process Latent Variable Model. Latent variable model (Titsias et al., 2010): $y = f(x) + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$. $y \in \mathbb{R}^D$: observed variable. $x \in \mathbb{R}^Q$ ($Q \ll D$): unknown latent variable. $f: x \to y$: mapping given by a Gaussian Process. $p(Y|X) = \prod_{d=1}^{D} \mathcal{N}(y_d \mid 0, K_{NN} + \beta^{-1} I_N)$. (Figure: graphical model with nodes x, f, y and parameters w, θ.) (Titsias, M. K. et al., “Bayesian Gaussian Process Latent Variable Model”, in AISTATS 2010.)
  • 10. BGPLVM: Manifold Learning. Bayesian inference: posterior distribution on the latent space, $p(Y) = \int p(Y|X)\, p(X)\, dX$. Marginalization is made tractable using variational inference: $q(X) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid \mu_n, S_n)$, $\log p(Y) \geq \int q(X) \log p(Y|X)\, dX - \int q(X) \log \frac{q(X)}{p(X)}\, dX$. Automatic dimensionality reduction is possible using the ARD kernel: $k(x, x') = \sigma_f^2 \exp\left(-\frac{1}{2} \sum_{q=1}^{Q} w_q (x_q - x'_q)^2\right)$. (Titsias, M. K. et al., “Bayesian Gaussian Process Latent Variable Model”, in AISTATS 2010.)
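To make the role of the ARD weights concrete, the kernel on this slide can be written directly in NumPy. This is only a sketch of the kernel function, not of BGPLVM training; the function name `ard_rbf_kernel` and the example data are assumptions. Setting a weight $w_q$ to zero makes the kernel ignore latent dimension $q$, which is the mechanism behind automatic dimensionality reduction.

```python
import numpy as np

def ard_rbf_kernel(X1, X2, sigma_f=1.0, w=None):
    """k(x, x') = sigma_f^2 * exp(-0.5 * sum_q w_q * (x_q - x'_q)^2)."""
    X1 = np.atleast_2d(np.asarray(X1, dtype=float))
    X2 = np.atleast_2d(np.asarray(X2, dtype=float))
    if w is None:
        w = np.ones(X1.shape[1])
    w = np.asarray(w, dtype=float)
    # Pairwise differences, shape (N1, N2, Q).
    diff = X1[:, None, :] - X2[None, :, :]
    weighted_sq_dist = np.sum(w * diff ** 2, axis=-1)
    return sigma_f ** 2 * np.exp(-0.5 * weighted_sq_dist)

# Hypothetical example: two latent dimensions, the second switched off (w_2 = 0).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))
K = ard_rbf_kernel(X, X, sigma_f=1.5, w=[1.0, 0.0])

# Moving points along the switched-off dimension leaves the kernel unchanged.
X_shifted = X.copy()
X_shifted[:, 1] += 10.0
K_shifted = ard_rbf_kernel(X_shifted, X, sigma_f=1.5, w=[1.0, 0.0])
```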
  • 11. Motor-skills Transfer through Latent Space. BGPLVM model trained on robot joint angles $\in \mathbb{R}^{14}$ for kinesthetic demonstration of clothing assistance (Koganti, N. et al., “Motor-skill Learning in Latent Spaces for Robotic Clothing Assistance”, in RSJ Annual Conference, 2016).
  • 12. Reinforcement Learning in BGPLVM Space. Apply the Cross Entropy Method to perform policy improvement: $\theta^* \sim \mathcal{N}(\theta \mid \mu^*, \Sigma^*)$, $\mu^* := \operatorname{mean}(\operatorname{argmax}\, \theta_{old})$, $\Sigma^* := \operatorname{var}(\operatorname{argmax}\, \theta_{old})$. Represent the policy using a Dynamic Movement Primitive (DMP): $\tau \ddot{x} = K(g - x) - D\dot{x} + (g - x_0) f$, $f(s) = \frac{\sum_i w_i \psi_i(s)\, s}{\sum_i \psi_i(s)}$, where $\tau \dot{s} = -\alpha s$. (Koganti, N. et al., “Bayesian Nonparametric Motor-skill Representations for Efficient Learning of Clothing Assistance”, in Workshop on Practical Bayesian Nonparametrics, NIPS, 2016.)
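The DMP equations on this slide can be integrated numerically to see how the weights $w_i$ shape a trajectory. The sketch below is a one-dimensional Euler integration under several assumptions that the slide does not state: Gaussian basis functions $\psi_i(s) = \exp(-h (s - c_i)^2)$ with centres spaced along the phase variable, a shared width $h$, the gains `K` and `D`, and the step size `dt`. The function name `dmp_rollout` is hypothetical.

```python
import numpy as np

def dmp_rollout(weights, x0, g, tau=1.0, K=100.0, D=20.0, alpha=4.0,
                dt=0.001, duration=2.0):
    """Integrate tau*x'' = K(g - x) - D x' + (g - x0) f(s), with tau*s' = -alpha*s."""
    weights = np.asarray(weights, dtype=float)
    n_basis = weights.size
    # Assumed basis layout: centres follow the exponential decay of the phase s.
    centers = np.exp(-alpha * np.linspace(0.0, 1.0, n_basis))
    widths = np.full(n_basis, float(n_basis) ** 2)

    n_steps = int(round(duration / dt))
    traj = np.empty(n_steps + 1)
    x, v, s = float(x0), 0.0, 1.0
    traj[0] = x
    for k in range(n_steps):
        psi = np.exp(-widths * (s - centers) ** 2)
        f = np.dot(weights, psi) * s / (np.sum(psi) + 1e-10)  # forcing term f(s)
        acc = (K * (g - x) - D * v + (g - x0) * f) / tau
        v += acc * dt
        x += v * dt
        s += (-alpha * s / tau) * dt  # canonical system
        traj[k + 1] = x
    return traj

# With zero weights the DMP is a damped spring that settles at the goal g.
traj_plain = dmp_rollout(np.zeros(10), x0=0.0, g=1.0)
# Non-zero weights change the path taken, while the forcing term fades as s -> 0.
traj_shaped = dmp_rollout(50.0 * np.ones(10), x0=0.0, g=1.0)
```

Because the forcing term is multiplied by the decaying phase $s$, the weights alter the shape of the motion while the goal $g$ remains the attractor, which is why the weights are a convenient parameter vector for policy search.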
  • 13. Reinforcement Learning in BGPLVM Space. Represent the reward function by the distance of the current policy from the desired via-points: $R(\pi(\theta)) = \sum_{i=1}^{n_{dims}} \sum_{j=1}^{n_{via}} \lVert V_{i,j} - \pi_i(\theta, t_{i,j}) \rVert^2$.
  • 14. Latent Space Controller for Clothing Tasks (Koganti, N. et al., “Motor-skill Learning in Latent Spaces for Robotic Clothing Assistance”, in RSJ Annual Conference, 2016).
  • 15. Generalization in Latent Space. Evaluation: reconstruction error of the latent space, measured as RMS error (Koganti, N. et al., “Motor-skill Learning in Latent Spaces for Robotic Clothing Assistance”, in RSJ Annual Conference, 2016). Dataset: clothing trajectories for 4 postures, with shoulder angle ∈ {65°, 70°, 75°, 80°}. (Figure panels: PCA, GPLVM, BGPLVM.)
  • 16. Reinforcement Learning in Latent Space. Apply reinforcement learning in different action spaces with the same formulation and reward function. Parameters: 50 × ndims basis functions. CEM: 50 rollouts per iteration. Policy update: 5 best rollouts per iteration. (Koganti, N. et al., “Bayesian Nonparametric Motor-skill Representations for Efficient Learning of Clothing Assistance”, in Workshop on Practical Bayesian Nonparametrics, NIPS, 2016.)
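A compact sketch of the Cross Entropy Method with the settings quoted on this slide (50 rollouts per iteration, update from the 5 best) is given below. It is an illustration under assumptions rather than the authors' code: the sampling distribution is taken to be a diagonal Gaussian, a small floor `min_std` is added to keep the variance positive, and the toy reward stands in for a real rollout. All names are hypothetical.

```python
import numpy as np

def cross_entropy_method(reward_fn, mu0, sigma0, n_rollouts=50, n_elite=5,
                         n_iters=30, min_std=1e-3, seed=0):
    """Iteratively refit a diagonal Gaussian over policy parameters to the best rollouts."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu0, dtype=float).copy()
    std = np.asarray(sigma0, dtype=float).copy()
    for _ in range(n_iters):
        # Sample candidate parameter vectors theta ~ N(mu, diag(std^2)).
        thetas = mu + std * rng.standard_normal((n_rollouts, mu.size))
        rewards = np.array([reward_fn(theta) for theta in thetas])
        # Keep the n_elite highest-reward samples and refit mean and spread.
        elite = thetas[np.argsort(rewards)[-n_elite:]]
        mu = elite.mean(axis=0)
        std = elite.std(axis=0) + min_std
    return mu, std

# Hypothetical toy problem: reward is the negative squared distance to a target.
target = np.array([0.5, -1.0, 2.0])

def toy_reward(theta):
    return -np.sum((theta - target) ** 2)

mu_init = np.zeros(3)
mu_final, std_final = cross_entropy_method(toy_reward, mu_init, 2.0 * np.ones(3))
```

With 50 × ndims basis-function weights the parameter vector grows linearly with the dimension of the action space, so running the same loop in a low-dimensional latent space reduces the number of parameters the sampler has to cover.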
  • 17. Moving forward. Immediate goal: latent spaces for robotics applications, with an auto-regressive prior on the latent space to capture task dynamics and an explicit model of human-robot interaction as a constraint. Ambitious goal: combine policy search RL and BGPLVM for non-linear dimensionality reduction and for Bayesian, data-efficient learning. (Figure labels: Data-efficient, Bayesian Inference; Deisenroth, M. P. et al., “Gaussian processes for data-efficient learning in robotics and control”, in IEEE Transactions PAMI, 2015.)
  • 19. Topology Coordinates. To approximate a Markov Decision Process, the relationship between the cloth and the subject needs to be observed as fully as possible, and low-dimensional representations are needed for a fast learning time. Topology coordinates were introduced to address both requirements. The concept was proposed by Ho and Komura (2009). Given 2 line segments, the amount of twist (writhe) between them is given by the Gaussian Linking Integral (GLI): $w = GLI(\gamma_1, \gamma_2) = \frac{1}{4\pi} \int_{\gamma_1} \int_{\gamma_2} \frac{d\gamma_1 \times d\gamma_2 \cdot (\gamma_1 - \gamma_2)}{\lVert \gamma_1 - \gamma_2 \rVert^3}$ (1). (Ho, E. S. L. and Komura, T., “Character motion synthesis by topology coordinates”, Computer Graphics Forum (Eurographics), 2009.)
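Equation (1) can be approximated numerically by splitting each curve into short straight pieces and summing the integrand at the piece midpoints. The sketch below assumes curves given as arrays of 3D points; the function name and the midpoint-rule discretization are choices made for this example and are not taken from the cited paper. As a sanity check it uses two linked circles (a Hopf link), for which the integral over closed curves equals the linking number, ±1 depending on orientation.

```python
import numpy as np

def gauss_linking_integral(curve1, curve2):
    """Midpoint-rule approximation of the GLI between two polylines of 3D points."""
    c1 = np.asarray(curve1, dtype=float)
    c2 = np.asarray(curve2, dtype=float)
    d1 = np.diff(c1, axis=0)              # segment vectors d(gamma_1)
    d2 = np.diff(c2, axis=0)              # segment vectors d(gamma_2)
    m1 = 0.5 * (c1[:-1] + c1[1:])         # segment midpoints on curve 1
    m2 = 0.5 * (c2[:-1] + c2[1:])         # segment midpoints on curve 2
    cross = np.cross(d1[:, None, :], d2[None, :, :])   # shape (n, m, 3)
    r = m1[:, None, :] - m2[None, :, :]                # gamma_1 - gamma_2
    dist_cubed = np.linalg.norm(r, axis=-1) ** 3
    integrand = np.einsum("ijk,ijk->ij", cross, r) / dist_cubed
    return np.sum(integrand) / (4.0 * np.pi)

# Two linked unit circles: one in the xy-plane, one in the xz-plane through its centre.
t = np.linspace(0.0, 2.0 * np.pi, 201)
circle_a = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
circle_b = np.stack([1.0 + np.cos(t), np.zeros_like(t), np.sin(t)], axis=1)
w_linked = gauss_linking_integral(circle_a, circle_b)

# Moving the second circle far away unlinks the pair, so the value drops to about 0.
w_unlinked = gauss_linking_integral(circle_a, circle_b + np.array([10.0, 0.0, 0.0]))
```

For open line segments, as in the slide, the same sum gives a non-integer writhe value, which is what makes it usable as a continuous state variable.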
  • 20. Topology Space. The relationship between line segments is defined by the writhe matrix ($T_{n \times m}$). Given line segments $S_1$, $S_2$ with $n$, $m$ links, $T_{n \times m}$ is given by $T_{ij} = GLI(S_1^i, S_2^j)$. The parameters writhe, center and density are defined from the writhe matrix, and together they form the Topology Space. (Ho, E. S. L. and Komura, T., “Character motion synthesis by topology coordinates”, Computer Graphics Forum (Eurographics), 2009.)
  • 21. Clothing Assistance Framework: State and Reward (Tamei, T. et al., “Reinforcement learning of clothing assistance”, in IEEE-RAS Humanoids 2011). Low-dimensional state representation using topology coordinates (Ho, E. S. L. et al., “Character motion synthesis by topology coordinates”, in Computer Graphics Forum 2009). The reward is given by the distance between the final state and the target state: $r_i = -\lVert s_i^{target} - s_i \rVert$ ($i = 1, 2, 3$), $r(s) = \sum_{i=1}^{3} \frac{r_i - \mu_i}{\sigma_i}$.
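  • 22. Combining DR and RL. Policy representation: $a = W(Z^T \Phi) + M\Phi + E\Phi$. Expectation step: posterior distribution over the latent variables, $p_{\theta_{old}}(Z^T \Phi \mid a) = \mathcal{N}(C W^T (a - M\Phi),\, C \sigma^2 \operatorname{tr}(\Phi\Phi^T))$, with $C = (\sigma^2 I + W^T W)^{-1}$. Maximization step: compute gradients with respect to the policy parameters, $\frac{\partial \ln p(a)\, Q_t^{\pi}}{\partial M}$, $\frac{\partial \ln p(a)\, Q_t^{\pi}}{\partial W}$, $\frac{\partial \ln p(a)\, Q_t^{\pi}}{\partial \sigma^2}$. (Luck, K. S. et al., “Latent space policy search for robotics”, in IEEE/RSJ IROS, 2014.)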
  • 23. DR as Preprocessing for RL. Bitzer et al. (2010): GPLVM-based latent space encoding task-space constraints. Non-linear dimensionality reduction. Data-efficient learning with GP mapping. Value-function reinforcement learning (TD(0)) applied to the tractable search space. (Bitzer, S. et al., “Using dimensionality reduction in reinforcement learning”, in IEEE/RSJ IROS, 2010.)
  • 24. Combining DR and RL. Luck et al. (2014): joint learning of the latent space and the optimal policy, $a = W(Z^T \Phi) + M\Phi + E\Phi$ (2). PePPER: an Expectation-Maximization formulation based on a KL-divergence lower bound. Probabilistic PCA is used as the model for learning the latent space. (Luck, K. S. et al., “Latent space policy search for robotics”, in IEEE/RSJ IROS, 2014.)
  • 25. Combining DR and RL. Inverse kinematics: planning in the joint angle space of a highly redundant robot (20 DOF). Standing on one leg: applied to a full humanoid robot, with the policy learned from scratch. (Luck, K. S. et al., “Latent space policy search for robotics”, in IEEE/RSJ IROS, 2014.)
  • 26. Discussion. Robotic clothing assistance involves several problems. We propose the use of DR with RL for efficient motor-skills learning. Future work: implement the latent space RL framework for clothing assistance, and combine real-time state estimation with the motor-skills learning framework.
  • 27. References
      Tamei, Tomoya, et al. “Reinforcement learning of clothing assistance with a dual-arm robot.” Humanoid Robots (Humanoids), 2011 11th IEEE-RAS International Conference on. IEEE, 2011.
      Ho, Edmond SL, and Taku Komura. “Character motion synthesis by topology coordinates.” Computer Graphics Forum. Vol. 28. No. 2. Blackwell Publishing Ltd, 2009.
      Pohl, William F. “The self-linking number of a closed space curve (Gauss integral formula treated for disjoint closed space curves linking number).” Journal of Mathematics and Mechanics 17 (1968): 975-985.
      Miyamoto, Hiroyuki, et al. “A kendama learning robot based on bi-directional theory.” Neural Networks 9.8 (1996): 1281-1302.
      Koganti, Nishanth, et al. “Cloth dynamics modeling in latent spaces and its application to robotic clothing assistance.” Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on. IEEE, 2015.
      Deisenroth, Marc Peter, Dieter Fox, and Carl Edward Rasmussen. “Gaussian processes for data-efficient learning in robotics and control.” Pattern Analysis and Machine Intelligence, IEEE Transactions on 37.2 (2015): 408-423.
      Levine, Sergey, et al. “End-to-end training of deep visuomotor policies.” arXiv preprint arXiv:1504.00702 (2015).