SlideShare a Scribd company logo
1 of 52
Download to read offline
Beating the Perils of Non-Convexity:
Machine Learning using Tensor Methods
Anima Anandkumar
U.C. Irvine
Learning with Big Data
Learning is finding needle in a haystack
Learning with Big Data
Learning is finding needle in a haystack
High dimensional regime: as data grows, more variables!
Useful information: low-dimensional structures.
Learning with big data: ill-posed problem.
Learning with Big Data
Learning is finding needle in a haystack
High dimensional regime: as data grows, more variables!
Useful information: low-dimensional structures.
Learning with big data: ill-posed problem.
Learning with big data: statistically and computationally challenging!
Optimization for Learning
Most learning problems can be cast as optimization.
Unsupervised Learning
Clustering
k-means, hierarchical . . .
Maximum Likelihood Estimator
Probabilistic latent variable models
Supervised Learning
Optimizing a neural network with
respect to a loss function Input
Neuron
Output
Convex vs. Non-convex Optimization
Progress is only tip of the iceberg..
Images taken from https://www.facebook.com/nonconvex
Convex vs. Non-convex Optimization
Progress is only tip of the iceberg.. Real world is mostly non-convex!
Images taken from https://www.facebook.com/nonconvex
Convex vs. Nonconvex Optimization
Unique optimum: global/local. Multiple local optima
Convex vs. Nonconvex Optimization
Unique optimum: global/local. Multiple local optima
In high dimensions possibly
exponential local optima
Convex vs. Nonconvex Optimization
Unique optimum: global/local. Multiple local optima
In high dimensions possibly
exponential local optima
How to deal with non-convexity?
Outline
1 Introduction
2 Spectral Methods: Matrices and Tensors
3 Applications of Tensor Methods
4 Conclusion
Matrices and Tensors as Data Structures
Modern data: Multi-modal,
multi-relational data.
Matrices: pairwise relations. Tensors: higher order relations.
Multi-modal data figure from Lise Getoor slides.
Tensor Representation of Higher Order Moments
Matrix: Pairwise Moments
E[x ⊗ x] ∈ Rd×d is a second order tensor.
E[x ⊗ x]i1,i2 = E[xi1 xi2 ].
For matrices: E[x ⊗ x] = E[xx⊤].
Tensor: Higher order Moments
E[x ⊗ x ⊗ x] ∈ Rd×d×d is a third order tensor.
E[x ⊗ x ⊗ x]i1,i2,i3 = E[xi1 xi2 xi3 ].
Spectral Decomposition of Tensors
M2 =
i
λiui ⊗ vi
= + ....
Matrix M2 λ1u1 ⊗ v1 λ2u2 ⊗ v2
Spectral Decomposition of Tensors
M2 =
i
λiui ⊗ vi
= + ....
Matrix M2 λ1u1 ⊗ v1 λ2u2 ⊗ v2
M3 =
i
λiui ⊗ vi ⊗ wi
= + ....
Tensor M3 λ1u1 ⊗ v1 ⊗ w1 λ2u2 ⊗ v2 ⊗ w2
We have developed efficient methods to solve tensor decomposition.
Tensor Decomposition Methods
SVD on Pairwise Moments to
obtain W
Fast SVD via Random
Projection
+ ++ +
Dimensionality Reduction of
third order moments using W
+ ++ +
Power method on reduced tensor T: v →
T(I, v, v)
T(I, v, v)
.
Guaranteed to converge to correct solution!
Summary on Tensor Methods
= + ....
Tensor decomposition: iterative algorithm with tensor contractions.
Tensor contraction: T(W1, W2, W3), where Wi are matrices/vectors.
Strengths of tensor method
Guaranteed convergence to globally optimal solution!
Fast and accurate, orders of magnitude faster than previous methods.
Embarrassingly parallel and suited for distributed systems, e.g.Spark.
Exploit optimized BLAS operations on CPUs and GPUs.
Preliminary Results on Spark
Alternating Least Squares for Tensor Decomposition.
min
w,A,B,C
T −
k
i=1
λiA(:, i) ⊗ B(:, i) ⊗ C(:, i)
2
F
Update Rows Independently
Tensor slices
B C
worker 1
worker 2
worker i
worker k
Results on NYtimes corpus
3 ∗ 105 documents, 108 words
Spark Map-Reduce
26mins 4 hrs
https://github.com/FurongHuang/SpectralLDA-TensorSpark
Outline
1 Introduction
2 Spectral Methods: Matrices and Tensors
3 Applications of Tensor Methods
4 Conclusion
Tensor Methods for Learning LVMs
GMM HMM
h1 h2 h3
x1 x2 x3
ICA
h1 h2 hk
x1 x2 xd
Multiview and Topic Models
Overview of Our Approach
Tensor-based Model Estimation
= + +
Unlabeled Data
Overview of Our Approach
Tensor-based Model Estimation
= + +
Unlabeled Data → Probabilistic admixture models
Overview of Our Approach
Tensor-based Model Estimation
= + +
Unlabeled Data → Probabilistic admixture models → Tensor Decomposition
Overview of Our Approach
Tensor-based Model Estimation
= + +
Unlabeled Data → Probabilistic admixture models → Tensor Decomposition → Inference
Overview of Our Approach
Tensor-based Model Estimation
= + +
Unlabeled Data → Probabilistic admixture models → Tensor Decomposition → Inference
tensor decomposition → correct model
Overview of Our Approach
Tensor-based Model Estimation
= + +
Unlabeled Data → Probabilistic admixture models → Tensor Decomposition → Inference
tensor decomposition → correct model
Highly parallelized, CPU/GPU/Cloud implementation.
Guaranteed global convergence for Tensor Decomposition.
Community Detection on Yelp and DBLP datasets
Facebook
n ∼ 20, 000
Yelp
n ∼ 40, 000
DBLP
n ∼ 1 million
Tensor vs. Variational
FB YP DBLP−sub DBLP
10
0
10
1
10
2
10
3
10
4
10
5
Datasets
Runningtime(seconds)
Variational
Tensor
Tensor Methods for Training Neural Networks
Tremendous practical impact
with deep learning.
Algorithm: backpropagation.
Highly non-convex optimization
Toy Example: Failure of Backpropagation
x1
x2
y=1y=−1
Labeled input samples
Goal: binary classification
σ(·) σ(·)
y
x1 x2x
w1 w2
Our method: guaranteed risk bounds for training neural networks
Toy Example: Failure of Backpropagation
x1
x2
y=1y=−1
Labeled input samples
Goal: binary classification
σ(·) σ(·)
y
x1 x2x
w1 w2
Our method: guaranteed risk bounds for training neural networks
Toy Example: Failure of Backpropagation
x1
x2
y=1y=−1
Labeled input samples
Goal: binary classification
σ(·) σ(·)
y
x1 x2x
w1 w2
Our method: guaranteed risk bounds for training neural networks
Backpropagation vs. Our Method
Weights w2 randomly drawn and fixed
Backprop (quadratic) loss surface
−4
−3
−2
−1
0
1
2
3
4 −4
−3
−2
−1
0
1
2
3
4
200
250
300
350
400
450
500
550
600
650
w1(1) w1(2)
Backpropagation vs. Our Method
Weights w2 randomly drawn and fixed
Backprop (quadratic) loss surface
−4
−3
−2
−1
0
1
2
3
4 −4
−3
−2
−1
0
1
2
3
4
200
250
300
350
400
450
500
550
600
650
w1(1) w1(2)
Loss surface for our method
−4
−3
−2
−1
0
1
2
3
4 −4
−3
−2
−1
0
1
2
3
4
0
20
40
60
80
100
120
140
160
180
200
w1(1) w1(2)
Feature Transformation for Training Neural Networks
Feature learning: Learn φ(·) from input data.
How to use φ(·) to train neural networks?
x
φ(x)
y
Feature Transformation for Training Neural Networks
Feature learning: Learn φ(·) from input data.
How to use φ(·) to train neural networks?
x
φ(x)
y
Invert E[φ(x) ⊗ y] to Learn Neural Network Parameters
In general, E[φ(x) ⊗ y] is a tensor.
What class of φ(·) useful for training neural networks?
What algorithms can invert moment tensors to obtain parameters?
Score Function Transformations
Score function for x ∈ Rd
with pdf p(·):
S1(x) := −∇x log p(x)
Input:
x ∈ Rd
S1(x) ∈ Rd
Score Function Transformations
Score function for x ∈ Rd
with pdf p(·):
S1(x) := −∇x log p(x)
Input:
x ∈ Rd
S1(x) ∈ Rd
Score Function Transformations
Score function for x ∈ Rd
with pdf p(·):
S1(x) := −∇x log p(x)
Input:
x ∈ Rd
S1(x) ∈ Rd
Score Function Transformations
Score function for x ∈ Rd
with pdf p(·):
S1(x) := −∇x log p(x)
mth
-order score function:
Input:
x ∈ Rd
S1(x) ∈ Rd
Score Function Transformations
Score function for x ∈ Rd
with pdf p(·):
S1(x) := −∇x log p(x)
mth
-order score function:
Sm(x) := (−1)m ∇(m)p(x)
p(x)
Input:
x ∈ Rd
S1(x) ∈ Rd
Score Function Transformations
Score function for x ∈ Rd
with pdf p(·):
S1(x) := −∇x log p(x)
mth
-order score function:
Sm(x) := (−1)m ∇(m)p(x)
p(x)
Input:
x ∈ Rd
S2(x) ∈ Rd×d
Score Function Transformations
Score function for x ∈ Rd
with pdf p(·):
S1(x) := −∇x log p(x)
mth
-order score function:
Sm(x) := (−1)m ∇(m)p(x)
p(x)
Input:
x ∈ Rd
S3(x) ∈ Rd×d×d
NN-LiFT: Neural Network LearnIng
using Feature Tensors
Input:
x ∈ Rd
Score function
S3(x) ∈ Rd×d×d
NN-LiFT: Neural Network LearnIng
using Feature Tensors
Input:
x ∈ Rd
Score function
S3(x) ∈ Rd×d×d
1
n
n
i=1
yi ⊗ S3(xi) =
1
n
n
i=1
yi⊗
Estimating M3 using
labeled data {(xi, yi)}
S3(xi)
Cross-
moment
NN-LiFT: Neural Network LearnIng
using Feature Tensors
Input:
x ∈ Rd
Score function
S3(x) ∈ Rd×d×d
1
n
n
i=1
yi ⊗ S3(xi) =
1
n
n
i=1
yi⊗
Estimating M3 using
labeled data {(xi, yi)}
S3(xi)
Cross-
moment
Rank-1 components are the estimates of first layer weights
+ + · · ·
CP tensor
decomposition
Reinforcement Learning (RL) of POMDPs
Partially observable Markov decision processes.
Proposed Method
Consider memoryless policies. Episodic learning: indirect exploration.
Tensor methods: careful conditioning required for learning.
First RL method for POMDPs with logarithmic regret bounds.
xi xi+1 xi+2
yi yi+1
ri ri+1
ai ai+1
0 1000 2000 3000 4000 5000 6000 7000
2
2.1
2.2
2.3
2.4
2.5
2.6
2.7
Number of Trials
AverageReward
SM−UCRL−POMDP
UCRL−MDP
Q−learning
Random Policy
Logarithmic Regret Bounds for POMDPs using Spectral Methods by K. Azzizade, A. Lazaric, A.
, under preparation.
Convolutional Tensor Decomposition
== ∗
xx f∗
i w∗
i F∗ w∗
(a)Convolutional dictionary model (b)Reformulated model
.... ....= ++
Cumulant λ1(F∗
1 )⊗3 +λ2(F∗
2 )⊗3 . . .
Efficient methods for tensor decomposition with circulant constraints.
Convolutional Dictionary Learning through Tensor Factorization by F. Huang, A. , June 2015.
Hierarchical Tensor Decomposition
= + .... = + .... = + ....
= + ....
Relevant for learning latent tree graphical models.
Propose first algorithm for integrated learning of hierarchy and
components of tensor decomposition.
Highly parallelized operations but with global consistency guarantees.
Scalable Latent Tree Model and its Application to Health Analytics By F. Huang, U. N.
Niranjan, J. Perros, R. Chen, J. Sun, A. , Preprint, June 2014.
Outline
1 Introduction
2 Spectral Methods: Matrices and Tensors
3 Applications of Tensor Methods
4 Conclusion
Summary and Outlook
Summary
Tensor methods: a powerful paradigm for guaranteed large-scale
machine learning.
First methods to provide provable bounds for training neural
networks, many latent variable models (e.g HMM, LDA), POMDPs!
Summary and Outlook
Summary
Tensor methods: a powerful paradigm for guaranteed large-scale
machine learning.
First methods to provide provable bounds for training neural
networks, many latent variable models (e.g HMM, LDA), POMDPs!
Outlook
Training multi-layer neural networks, models with invariances,
reinforcement learning using neural networks . . .
Unified framework for tractable non-convex methods with guaranteed
convergence to global optima?
My Research Group and Resources
Podcast/lectures/papers/software available at
http://newport.eecs.uci.edu/anandkumar/

More Related Content

What's hot

Multiclass Logistic Regression: Derivation and Apache Spark Examples
Multiclass Logistic Regression: Derivation and Apache Spark ExamplesMulticlass Logistic Regression: Derivation and Apache Spark Examples
Multiclass Logistic Regression: Derivation and Apache Spark ExamplesMarjan Sterjev
 
K-Means Clustering Simply
K-Means Clustering SimplyK-Means Clustering Simply
K-Means Clustering SimplyEmad Nabil
 
Using Principal Component Analysis to Remove Correlated Signal from Astronomi...
Using Principal Component Analysis to Remove Correlated Signal from Astronomi...Using Principal Component Analysis to Remove Correlated Signal from Astronomi...
Using Principal Component Analysis to Remove Correlated Signal from Astronomi...CvilleDataScience
 
K-means, EM and Mixture models
K-means, EM and Mixture modelsK-means, EM and Mixture models
K-means, EM and Mixture modelsVu Pham
 
Deep learning paper review ppt sourece -Direct clr
Deep learning paper review ppt sourece -Direct clr Deep learning paper review ppt sourece -Direct clr
Deep learning paper review ppt sourece -Direct clr taeseon ryu
 
Clustering:k-means, expect-maximization and gaussian mixture model
Clustering:k-means, expect-maximization and gaussian mixture modelClustering:k-means, expect-maximization and gaussian mixture model
Clustering:k-means, expect-maximization and gaussian mixture modeljins0618
 
DS-MLR: Scaling Multinomial Logistic Regression via Hybrid Parallelism
DS-MLR: Scaling Multinomial Logistic Regression via Hybrid ParallelismDS-MLR: Scaling Multinomial Logistic Regression via Hybrid Parallelism
DS-MLR: Scaling Multinomial Logistic Regression via Hybrid ParallelismParameswaran Raman
 
Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...
Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...
Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...Universitat Politècnica de Catalunya
 
Principal Components Analysis, Calculation and Visualization
Principal Components Analysis, Calculation and VisualizationPrincipal Components Analysis, Calculation and Visualization
Principal Components Analysis, Calculation and VisualizationMarjan Sterjev
 
Dynamic programming
Dynamic programmingDynamic programming
Dynamic programmingShakil Ahmed
 
5.3 dynamic programming
5.3 dynamic programming5.3 dynamic programming
5.3 dynamic programmingKrish_ver2
 
Perspective in Informatics 3 - Assignment 1 - Answer Sheet
Perspective in Informatics 3 - Assignment 1 - Answer SheetPerspective in Informatics 3 - Assignment 1 - Answer Sheet
Perspective in Informatics 3 - Assignment 1 - Answer SheetHoang Nguyen Phong
 
Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...
Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...
Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...Chris Fregly
 
MLHEP 2015: Introductory Lecture #1
MLHEP 2015: Introductory Lecture #1MLHEP 2015: Introductory Lecture #1
MLHEP 2015: Introductory Lecture #1arogozhnikov
 
Skiena algorithm 2007 lecture16 introduction to dynamic programming
Skiena algorithm 2007 lecture16 introduction to dynamic programmingSkiena algorithm 2007 lecture16 introduction to dynamic programming
Skiena algorithm 2007 lecture16 introduction to dynamic programmingzukun
 
Perspective in Informatics 3 - Assignment 2 - Answer Sheet
Perspective in Informatics 3 - Assignment 2 - Answer SheetPerspective in Informatics 3 - Assignment 2 - Answer Sheet
Perspective in Informatics 3 - Assignment 2 - Answer SheetHoang Nguyen Phong
 

What's hot (20)

Multiclass Logistic Regression: Derivation and Apache Spark Examples
Multiclass Logistic Regression: Derivation and Apache Spark ExamplesMulticlass Logistic Regression: Derivation and Apache Spark Examples
Multiclass Logistic Regression: Derivation and Apache Spark Examples
 
K-Means Clustering Simply
K-Means Clustering SimplyK-Means Clustering Simply
K-Means Clustering Simply
 
Svm V SVC
Svm V SVCSvm V SVC
Svm V SVC
 
5.1 greedy 03
5.1 greedy 035.1 greedy 03
5.1 greedy 03
 
Using Principal Component Analysis to Remove Correlated Signal from Astronomi...
Using Principal Component Analysis to Remove Correlated Signal from Astronomi...Using Principal Component Analysis to Remove Correlated Signal from Astronomi...
Using Principal Component Analysis to Remove Correlated Signal from Astronomi...
 
K-means, EM and Mixture models
K-means, EM and Mixture modelsK-means, EM and Mixture models
K-means, EM and Mixture models
 
Deep learning paper review ppt sourece -Direct clr
Deep learning paper review ppt sourece -Direct clr Deep learning paper review ppt sourece -Direct clr
Deep learning paper review ppt sourece -Direct clr
 
Knapsack problem using fixed tuple
Knapsack problem using fixed tupleKnapsack problem using fixed tuple
Knapsack problem using fixed tuple
 
Daa unit 4
Daa unit 4Daa unit 4
Daa unit 4
 
Clustering:k-means, expect-maximization and gaussian mixture model
Clustering:k-means, expect-maximization and gaussian mixture modelClustering:k-means, expect-maximization and gaussian mixture model
Clustering:k-means, expect-maximization and gaussian mixture model
 
DS-MLR: Scaling Multinomial Logistic Regression via Hybrid Parallelism
DS-MLR: Scaling Multinomial Logistic Regression via Hybrid ParallelismDS-MLR: Scaling Multinomial Logistic Regression via Hybrid Parallelism
DS-MLR: Scaling Multinomial Logistic Regression via Hybrid Parallelism
 
Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...
Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...
Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...
 
Principal Components Analysis, Calculation and Visualization
Principal Components Analysis, Calculation and VisualizationPrincipal Components Analysis, Calculation and Visualization
Principal Components Analysis, Calculation and Visualization
 
Dynamic programming
Dynamic programmingDynamic programming
Dynamic programming
 
5.3 dynamic programming
5.3 dynamic programming5.3 dynamic programming
5.3 dynamic programming
 
Perspective in Informatics 3 - Assignment 1 - Answer Sheet
Perspective in Informatics 3 - Assignment 1 - Answer SheetPerspective in Informatics 3 - Assignment 1 - Answer Sheet
Perspective in Informatics 3 - Assignment 1 - Answer Sheet
 
Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...
Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...
Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...
 
MLHEP 2015: Introductory Lecture #1
MLHEP 2015: Introductory Lecture #1MLHEP 2015: Introductory Lecture #1
MLHEP 2015: Introductory Lecture #1
 
Skiena algorithm 2007 lecture16 introduction to dynamic programming
Skiena algorithm 2007 lecture16 introduction to dynamic programmingSkiena algorithm 2007 lecture16 introduction to dynamic programming
Skiena algorithm 2007 lecture16 introduction to dynamic programming
 
Perspective in Informatics 3 - Assignment 2 - Answer Sheet
Perspective in Informatics 3 - Assignment 2 - Answer SheetPerspective in Informatics 3 - Assignment 2 - Answer Sheet
Perspective in Informatics 3 - Assignment 2 - Answer Sheet
 

Viewers also liked

Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...MLconf
 
Online Stochastic Tensor Decomposition for Background Subtraction in Multispe...
Online Stochastic Tensor Decomposition for Background Subtraction in Multispe...Online Stochastic Tensor Decomposition for Background Subtraction in Multispe...
Online Stochastic Tensor Decomposition for Background Subtraction in Multispe...ActiveEon
 
Geetu Ambwani, Principal Data Scientist, Huffington Post at MLconf NYC - 4/15/16
Geetu Ambwani, Principal Data Scientist, Huffington Post at MLconf NYC - 4/15/16Geetu Ambwani, Principal Data Scientist, Huffington Post at MLconf NYC - 4/15/16
Geetu Ambwani, Principal Data Scientist, Huffington Post at MLconf NYC - 4/15/16MLconf
 
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15MLconf
 
Alessandro Magnani, Data Scientist, @WalmartLabs at MLconf SF - 11/13/15
Alessandro Magnani, Data Scientist, @WalmartLabs at MLconf SF - 11/13/15Alessandro Magnani, Data Scientist, @WalmartLabs at MLconf SF - 11/13/15
Alessandro Magnani, Data Scientist, @WalmartLabs at MLconf SF - 11/13/15MLconf
 
Melanie Warrick, Deep Learning Engineer, Skymind.io at MLconf SF - 11/13/15
Melanie Warrick, Deep Learning Engineer, Skymind.io at MLconf SF - 11/13/15Melanie Warrick, Deep Learning Engineer, Skymind.io at MLconf SF - 11/13/15
Melanie Warrick, Deep Learning Engineer, Skymind.io at MLconf SF - 11/13/15MLconf
 
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...MLconf
 
Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 1...
Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 1...Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 1...
Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 1...MLconf
 
Lessons from 2MM machine learning models
Lessons from 2MM machine learning modelsLessons from 2MM machine learning models
Lessons from 2MM machine learning modelsExtract Data Conference
 

Viewers also liked (9)

Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
 
Online Stochastic Tensor Decomposition for Background Subtraction in Multispe...
Online Stochastic Tensor Decomposition for Background Subtraction in Multispe...Online Stochastic Tensor Decomposition for Background Subtraction in Multispe...
Online Stochastic Tensor Decomposition for Background Subtraction in Multispe...
 
Geetu Ambwani, Principal Data Scientist, Huffington Post at MLconf NYC - 4/15/16
Geetu Ambwani, Principal Data Scientist, Huffington Post at MLconf NYC - 4/15/16Geetu Ambwani, Principal Data Scientist, Huffington Post at MLconf NYC - 4/15/16
Geetu Ambwani, Principal Data Scientist, Huffington Post at MLconf NYC - 4/15/16
 
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
 
Alessandro Magnani, Data Scientist, @WalmartLabs at MLconf SF - 11/13/15
Alessandro Magnani, Data Scientist, @WalmartLabs at MLconf SF - 11/13/15Alessandro Magnani, Data Scientist, @WalmartLabs at MLconf SF - 11/13/15
Alessandro Magnani, Data Scientist, @WalmartLabs at MLconf SF - 11/13/15
 
Melanie Warrick, Deep Learning Engineer, Skymind.io at MLconf SF - 11/13/15
Melanie Warrick, Deep Learning Engineer, Skymind.io at MLconf SF - 11/13/15Melanie Warrick, Deep Learning Engineer, Skymind.io at MLconf SF - 11/13/15
Melanie Warrick, Deep Learning Engineer, Skymind.io at MLconf SF - 11/13/15
 
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
 
Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 1...
Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 1...Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 1...
Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 1...
 
Lessons from 2MM machine learning models
Lessons from 2MM machine learning modelsLessons from 2MM machine learning models
Lessons from 2MM machine learning models
 

Similar to Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15

Hanie Sedghi, Research Scientist at Allen Institute for Artificial Intelligen...
Hanie Sedghi, Research Scientist at Allen Institute for Artificial Intelligen...Hanie Sedghi, Research Scientist at Allen Institute for Artificial Intelligen...
Hanie Sedghi, Research Scientist at Allen Institute for Artificial Intelligen...MLconf
 
Deep Learning for AI (2)
Deep Learning for AI (2)Deep Learning for AI (2)
Deep Learning for AI (2)Dongheon Lee
 
Introduction to Big Data Science
Introduction to Big Data ScienceIntroduction to Big Data Science
Introduction to Big Data ScienceAlbert Bifet
 
Data Science and Machine Learning with Tensorflow
 Data Science and Machine Learning with Tensorflow Data Science and Machine Learning with Tensorflow
Data Science and Machine Learning with TensorflowShubham Sharma
 
Output Units and Cost Function in FNN
Output Units and Cost Function in FNNOutput Units and Cost Function in FNN
Output Units and Cost Function in FNNLin JiaMing
 
Matrix Computations in Machine Learning
Matrix Computations in Machine LearningMatrix Computations in Machine Learning
Matrix Computations in Machine Learningbutest
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier홍배 김
 
Integral Calculus Anti Derivatives reviewer
Integral Calculus Anti Derivatives reviewerIntegral Calculus Anti Derivatives reviewer
Integral Calculus Anti Derivatives reviewerJoshuaAgcopra
 
Radial Basis Function Interpolation
Radial Basis Function InterpolationRadial Basis Function Interpolation
Radial Basis Function InterpolationJesse Bettencourt
 
Making BIG DATA smaller
Making BIG DATA smallerMaking BIG DATA smaller
Making BIG DATA smallerTony Tran
 
机器学习Adaboost
机器学习Adaboost机器学习Adaboost
机器学习AdaboostShocky1
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for SearchBhaskar Mitra
 
MLconf NYC Animashree Anandkumar
MLconf NYC Animashree AnandkumarMLconf NYC Animashree Anandkumar
MLconf NYC Animashree AnandkumarMLconf
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for SearchBhaskar Mitra
 
Algorithms 101 for Data Scientists
Algorithms 101 for Data ScientistsAlgorithms 101 for Data Scientists
Algorithms 101 for Data ScientistsChristopher Conlan
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function홍배 김
 
Practical and Worst-Case Efficient Apportionment
Practical and Worst-Case Efficient ApportionmentPractical and Worst-Case Efficient Apportionment
Practical and Worst-Case Efficient ApportionmentRaphael Reitzig
 
Derivative free optimization
Derivative free optimizationDerivative free optimization
Derivative free optimizationhelalmohammad2
 

Similar to Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15 (20)

Optimization tutorial
Optimization tutorialOptimization tutorial
Optimization tutorial
 
Hanie Sedghi, Research Scientist at Allen Institute for Artificial Intelligen...
Hanie Sedghi, Research Scientist at Allen Institute for Artificial Intelligen...Hanie Sedghi, Research Scientist at Allen Institute for Artificial Intelligen...
Hanie Sedghi, Research Scientist at Allen Institute for Artificial Intelligen...
 
Deep Learning for AI (2)
Deep Learning for AI (2)Deep Learning for AI (2)
Deep Learning for AI (2)
 
Introduction to Big Data Science
Introduction to Big Data ScienceIntroduction to Big Data Science
Introduction to Big Data Science
 
Data Science and Machine Learning with Tensorflow
 Data Science and Machine Learning with Tensorflow Data Science and Machine Learning with Tensorflow
Data Science and Machine Learning with Tensorflow
 
The Perceptron (D1L1 Insight@DCU Machine Learning Workshop 2017)
The Perceptron (D1L1 Insight@DCU Machine Learning Workshop 2017)The Perceptron (D1L1 Insight@DCU Machine Learning Workshop 2017)
The Perceptron (D1L1 Insight@DCU Machine Learning Workshop 2017)
 
Output Units and Cost Function in FNN
Output Units and Cost Function in FNNOutput Units and Cost Function in FNN
Output Units and Cost Function in FNN
 
Matrix Computations in Machine Learning
Matrix Computations in Machine LearningMatrix Computations in Machine Learning
Matrix Computations in Machine Learning
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier
 
Integral Calculus Anti Derivatives reviewer
Integral Calculus Anti Derivatives reviewerIntegral Calculus Anti Derivatives reviewer
Integral Calculus Anti Derivatives reviewer
 
Radial Basis Function Interpolation
Radial Basis Function InterpolationRadial Basis Function Interpolation
Radial Basis Function Interpolation
 
Making BIG DATA smaller
Making BIG DATA smallerMaking BIG DATA smaller
Making BIG DATA smaller
 
机器学习Adaboost
机器学习Adaboost机器学习Adaboost
机器学习Adaboost
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for Search
 
MLconf NYC Animashree Anandkumar
MLconf NYC Animashree AnandkumarMLconf NYC Animashree Anandkumar
MLconf NYC Animashree Anandkumar
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for Search
 
Algorithms 101 for Data Scientists
Algorithms 101 for Data ScientistsAlgorithms 101 for Data Scientists
Algorithms 101 for Data Scientists
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function
 
Practical and Worst-Case Efficient Apportionment
Practical and Worst-Case Efficient ApportionmentPractical and Worst-Case Efficient Apportionment
Practical and Worst-Case Efficient Apportionment
 
Derivative free optimization
Derivative free optimizationDerivative free optimization
Derivative free optimization
 

More from MLconf

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...MLconf
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingMLconf
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...MLconf
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushMLconf
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceMLconf
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...MLconf
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...MLconf
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMLconf
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionMLconf
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLMLconf
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksMLconf
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...MLconf
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldMLconf
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...MLconf
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...MLconf
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...MLconf
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeMLconf
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...MLconf
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareMLconf
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesMLconf
 

More from MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
 

Recently uploaded

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfOverkill Security
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 

Recently uploaded (20)

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 

Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SF - 11/13/15

  • 1. Beating the Perils of Non-Convexity: Machine Learning using Tensor Methods Anima Anandkumar U.C. Irvine
  • 2. Learning with Big Data Learning is finding needle in a haystack
  • 3. Learning with Big Data Learning is finding needle in a haystack High dimensional regime: as data grows, more variables! Useful information: low-dimensional structures. Learning with big data: ill-posed problem.
  • 4. Learning with Big Data Learning is finding needle in a haystack High dimensional regime: as data grows, more variables! Useful information: low-dimensional structures. Learning with big data: ill-posed problem. Learning with big data: statistically and computationally challenging!
  • 5. Optimization for Learning Most learning problems can be cast as optimization. Unsupervised Learning Clustering k-means, hierarchical . . . Maximum Likelihood Estimator Probabilistic latent variable models Supervised Learning Optimizing a neural network with respect to a loss function Input Neuron Output
  • 6. Convex vs. Non-convex Optimization Progress is only tip of the iceberg.. Images taken from https://www.facebook.com/nonconvex
  • 7. Convex vs. Non-convex Optimization Progress is only tip of the iceberg.. Real world is mostly non-convex! Images taken from https://www.facebook.com/nonconvex
  • 8. Convex vs. Nonconvex Optimization Unique optimum: global/local. Multiple local optima
  • 9. Convex vs. Nonconvex Optimization Unique optimum: global/local. Multiple local optima In high dimensions possibly exponential local optima
  • 10. Convex vs. Nonconvex Optimization Unique optimum: global/local. Multiple local optima In high dimensions possibly exponential local optima How to deal with non-convexity?
  • 11. Outline 1 Introduction 2 Spectral Methods: Matrices and Tensors 3 Applications of Tensor Methods 4 Conclusion
  • 12. Matrices and Tensors as Data Structures Modern data: Multi-modal, multi-relational data. Matrices: pairwise relations. Tensors: higher order relations. Multi-modal data figure from Lise Getoor slides.
  • 13. Tensor Representation of Higher Order Moments Matrix: Pairwise Moments E[x ⊗ x] ∈ Rd×d is a second order tensor. E[x ⊗ x]i1,i2 = E[xi1 xi2 ]. For matrices: E[x ⊗ x] = E[xx⊤]. Tensor: Higher order Moments E[x ⊗ x ⊗ x] ∈ Rd×d×d is a third order tensor. E[x ⊗ x ⊗ x]i1,i2,i3 = E[xi1 xi2 xi3 ].
  • 14. Spectral Decomposition of Tensors M2 = i λiui ⊗ vi = + .... Matrix M2 λ1u1 ⊗ v1 λ2u2 ⊗ v2
  • 15. Spectral Decomposition of Tensors M2 = i λiui ⊗ vi = + .... Matrix M2 λ1u1 ⊗ v1 λ2u2 ⊗ v2 M3 = i λiui ⊗ vi ⊗ wi = + .... Tensor M3 λ1u1 ⊗ v1 ⊗ w1 λ2u2 ⊗ v2 ⊗ w2 We have developed efficient methods to solve tensor decomposition.
  • 16. Tensor Decomposition Methods SVD on Pairwise Moments to obtain W Fast SVD via Random Projection + ++ + Dimensionality Reduction of third order moments using W + ++ + Power method on reduced tensor T: v → T(I, v, v) T(I, v, v) . Guaranteed to converge to correct solution!
  • 17. Summary on Tensor Methods = + .... Tensor decomposition: iterative algorithm with tensor contractions. Tensor contraction: T(W1, W2, W3), where Wi are matrices/vectors. Strengths of tensor method Guaranteed convergence to globally optimal solution! Fast and accurate, orders of magnitude faster than previous methods. Embarrassingly parallel and suited for distributed systems, e.g.Spark. Exploit optimized BLAS operations on CPUs and GPUs.
  • 18. Preliminary Results on Spark Alternating Least Squares for Tensor Decomposition. min w,A,B,C T − k i=1 λiA(:, i) ⊗ B(:, i) ⊗ C(:, i) 2 F Update Rows Independently Tensor slices B C worker 1 worker 2 worker i worker k Results on NYtimes corpus 3 ∗ 105 documents, 108 words Spark Map-Reduce 26mins 4 hrs https://github.com/FurongHuang/SpectralLDA-TensorSpark
  • 19. Outline 1 Introduction 2 Spectral Methods: Matrices and Tensors 3 Applications of Tensor Methods 4 Conclusion
  • 20. Tensor Methods for Learning LVMs GMM HMM h1 h2 h3 x1 x2 x3 ICA h1 h2 hk x1 x2 xd Multiview and Topic Models
  • 21. Overview of Our Approach Tensor-based Model Estimation = + + Unlabeled Data
  • 22. Overview of Our Approach Tensor-based Model Estimation = + + Unlabeled Data → Probabilistic admixture models
  • 23. Overview of Our Approach Tensor-based Model Estimation = + + Unlabeled Data → Probabilistic admixture models → Tensor Decomposition
  • 24. Overview of Our Approach Tensor-based Model Estimation = + + Unlabeled Data → Probabilistic admixture models → Tensor Decomposition → Inference
  • 25. Overview of Our Approach Tensor-based Model Estimation = + + Unlabeled Data → Probabilistic admixture models → Tensor Decomposition → Inference tensor decomposition → correct model
  • 26. Overview of Our Approach Tensor-based Model Estimation = + + Unlabeled Data → Probabilistic admixture models → Tensor Decomposition → Inference tensor decomposition → correct model Highly parallelized, CPU/GPU/Cloud implementation. Guaranteed global convergence for Tensor Decomposition.
  • 27. Community Detection on Yelp and DBLP datasets Facebook n ∼ 20, 000 Yelp n ∼ 40, 000 DBLP n ∼ 1 million Tensor vs. Variational FB YP DBLP−sub DBLP 10 0 10 1 10 2 10 3 10 4 10 5 Datasets Runningtime(seconds) Variational Tensor
  • 28. Tensor Methods for Training Neural Networks Tremendous practical impact with deep learning. Algorithm: backpropagation. Highly non-convex optimization
  • 29. Toy Example: Failure of Backpropagation x1 x2 y=1y=−1 Labeled input samples Goal: binary classification σ(·) σ(·) y x1 x2x w1 w2 Our method: guaranteed risk bounds for training neural networks
  • 30. Toy Example: Failure of Backpropagation x1 x2 y=1y=−1 Labeled input samples Goal: binary classification σ(·) σ(·) y x1 x2x w1 w2 Our method: guaranteed risk bounds for training neural networks
  • 31. Toy Example: Failure of Backpropagation x1 x2 y=1y=−1 Labeled input samples Goal: binary classification σ(·) σ(·) y x1 x2x w1 w2 Our method: guaranteed risk bounds for training neural networks
  • 32. Backpropagation vs. Our Method Weights w2 randomly drawn and fixed Backprop (quadratic) loss surface −4 −3 −2 −1 0 1 2 3 4 −4 −3 −2 −1 0 1 2 3 4 200 250 300 350 400 450 500 550 600 650 w1(1) w1(2)
  • 33. Backpropagation vs. Our Method Weights w2 randomly drawn and fixed Backprop (quadratic) loss surface −4 −3 −2 −1 0 1 2 3 4 −4 −3 −2 −1 0 1 2 3 4 200 250 300 350 400 450 500 550 600 650 w1(1) w1(2) Loss surface for our method −4 −3 −2 −1 0 1 2 3 4 −4 −3 −2 −1 0 1 2 3 4 0 20 40 60 80 100 120 140 160 180 200 w1(1) w1(2)
  • 34. Feature Transformation for Training Neural Networks Feature learning: Learn φ(·) from input data. How to use φ(·) to train neural networks? x φ(x) y
  • 35. Feature Transformation for Training Neural Networks Feature learning: Learn φ(·) from input data. How to use φ(·) to train neural networks? x φ(x) y Invert E[φ(x) ⊗ y] to Learn Neural Network Parameters In general, E[φ(x) ⊗ y] is a tensor. What class of φ(·) useful for training neural networks? What algorithms can invert moment tensors to obtain parameters?
  • 36. Score Function Transformations Score function for x ∈ Rd with pdf p(·): S1(x) := −∇x log p(x) Input: x ∈ Rd S1(x) ∈ Rd
  • 37. Score Function Transformations Score function for x ∈ Rd with pdf p(·): S1(x) := −∇x log p(x) Input: x ∈ Rd S1(x) ∈ Rd
  • 38. Score Function Transformations Score function for x ∈ Rd with pdf p(·): S1(x) := −∇x log p(x) Input: x ∈ Rd S1(x) ∈ Rd
  • 39. Score Function Transformations Score function for x ∈ Rd with pdf p(·): S1(x) := −∇x log p(x) mth -order score function: Input: x ∈ Rd S1(x) ∈ Rd
  • 40. Score Function Transformations Score function for x ∈ Rd with pdf p(·): S1(x) := −∇x log p(x) mth -order score function: Sm(x) := (−1)m ∇(m)p(x) p(x) Input: x ∈ Rd S1(x) ∈ Rd
  • 41. Score Function Transformations Score function for x ∈ Rd with pdf p(·): S1(x) := −∇x log p(x) mth -order score function: Sm(x) := (−1)m ∇(m)p(x) p(x) Input: x ∈ Rd S2(x) ∈ Rd×d
  • 42. Score Function Transformations Score function for x ∈ Rd with pdf p(·): S1(x) := −∇x log p(x) mth -order score function: Sm(x) := (−1)m ∇(m)p(x) p(x) Input: x ∈ Rd S3(x) ∈ Rd×d×d
  • 43. NN-LiFT: Neural Network LearnIng using Feature Tensors Input: x ∈ Rd Score function S3(x) ∈ Rd×d×d
  • 44. NN-LiFT: Neural Network LearnIng using Feature Tensors Input: x ∈ Rd Score function S3(x) ∈ Rd×d×d 1 n n i=1 yi ⊗ S3(xi) = 1 n n i=1 yi⊗ Estimating M3 using labeled data {(xi, yi)} S3(xi) Cross- moment
  • 45. NN-LiFT: Neural Network LearnIng using Feature Tensors Input: x ∈ Rd Score function S3(x) ∈ Rd×d×d 1 n n i=1 yi ⊗ S3(xi) = 1 n n i=1 yi⊗ Estimating M3 using labeled data {(xi, yi)} S3(xi) Cross- moment Rank-1 components are the estimates of first layer weights + + · · · CP tensor decomposition
  • 46. Reinforcement Learning (RL) of POMDPs Partially observable Markov decision processes. Proposed Method Consider memoryless policies. Episodic learning: indirect exploration. Tensor methods: careful conditioning required for learning. First RL method for POMDPs with logarithmic regret bounds. xi xi+1 xi+2 yi yi+1 ri ri+1 ai ai+1 0 1000 2000 3000 4000 5000 6000 7000 2 2.1 2.2 2.3 2.4 2.5 2.6 2.7 Number of Trials AverageReward SM−UCRL−POMDP UCRL−MDP Q−learning Random Policy Logarithmic Regret Bounds for POMDPs using Spectral Methods by K. Azzizade, A. Lazaric, A. , under preparation.
  • 47. Convolutional Tensor Decomposition == ∗ xx f∗ i w∗ i F∗ w∗ (a)Convolutional dictionary model (b)Reformulated model .... ....= ++ Cumulant λ1(F∗ 1 )⊗3 +λ2(F∗ 2 )⊗3 . . . Efficient methods for tensor decomposition with circulant constraints. Convolutional Dictionary Learning through Tensor Factorization by F. Huang, A. , June 2015.
  • 48. Hierarchical Tensor Decomposition = + .... = + .... = + .... = + .... Relevant for learning latent tree graphical models. Propose first algorithm for integrated learning of hierarchy and components of tensor decomposition. Highly parallelized operations but with global consistency guarantees. Scalable Latent Tree Model and its Application to Health Analytics By F. Huang, U. N. Niranjan, J. Perros, R. Chen, J. Sun, A. , Preprint, June 2014.
  • 49. Outline 1 Introduction 2 Spectral Methods: Matrices and Tensors 3 Applications of Tensor Methods 4 Conclusion
  • 50. Summary and Outlook Summary Tensor methods: a powerful paradigm for guaranteed large-scale machine learning. First methods to provide provable bounds for training neural networks, many latent variable models (e.g HMM, LDA), POMDPs!
  • 51. Summary and Outlook Summary Tensor methods: a powerful paradigm for guaranteed large-scale machine learning. First methods to provide provable bounds for training neural networks, many latent variable models (e.g HMM, LDA), POMDPs! Outlook Training multi-layer neural networks, models with invariances, reinforcement learning using neural networks . . . Unified framework for tractable non-convex methods with guaranteed convergence to global optima?
  • 52. My Research Group and Resources Podcast/lectures/papers/software available at http://newport.eecs.uci.edu/anandkumar/