Machine Learning on Big Data
Lessons Learned from Google Projects

Max Lin
Software Engineer | Google Research

Massively Parallel Computing | Harvard CS 264
Guest Lecture | March 29th, 2011
Outline

• Machine Learning intro
• Scaling machine learning algorithms up
• Design choices of large scale ML systems
“Machine Learning is a study of computer algorithms that improve automatically through experience.”
Training
  Input X                                                               Output Y
  The quick brown fox jumped over the lazy dog.                         English
  To err is human, but to really foul things up you need a computer.    English
  No hay mal que por bien no venga.                                     Spanish
  La tercera es la vencida.                                             Spanish

Model f(x)

Testing: f(x’) = y’
  To be or not to be -- that is the question                            ?
  La fe mueve montañas.                                                 ?
Linear Classifier
       The quick brown fox jumped over the lazy dog.

    ‘a’ ... ‘aardvark’ ... ‘dog’ ... ‘the’ ... ‘montañas’ ...
x [ 0, ...      0,     ... 1, ... 1, ...          0,      ... ]

w [ 0.1, ...    132,     ... 150, ... 200, ...     -153,   ... ]
$$f(x) = w \cdot x = \sum_{p=1}^{P} w_p x_p$$
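A minimal sketch of the linear score above, assuming a binary bag-of-words encoding. The five-word vocabulary and the helper names `featurize` and `score` are hypothetical; the weights are borrowed from the slide only for illustration.

```python
def featurize(text, vocabulary):
    """Return a binary bag-of-words vector x over a fixed vocabulary."""
    tokens = {tok.strip(".,!?").lower() for tok in text.split()}
    return [1 if word in tokens else 0 for word in vocabulary]


def score(w, x):
    """Linear score f(x) = w . x = sum over p of w_p * x_p."""
    return sum(wp * xp for wp, xp in zip(w, x))


# Hypothetical vocabulary; a real one would have P entries (millions or more).
vocabulary = ["a", "aardvark", "dog", "the", "montañas"]
w = [0.1, 132.0, 150.0, 200.0, -153.0]

x = featurize("The quick brown fox jumped over the lazy dog.", vocabulary)
print(x)            # [0, 0, 1, 1, 0]
print(score(w, x))  # 350.0
```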
Training Data
[Figure: an N × P matrix of inputs X (N rows, P columns) next to a column of N outputs Y]
Typical machine learning
data at Google

N: 100 billion / 1 billion
P: 1 billion / 10 million
(mean / median)




                              http://www.flickr.com/photos/mr_t_in_dc/5469563053
Classifier Training


• Training: Given {(x, y)} and f, minimize the
  following objective function
$$\arg\min_{w} \sum_{i=1}^{N} L(y_i, f(x_i; w)) + R(w)$$
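The objective can be made concrete with a small sketch. The slide does not fix L or R, so the logistic loss, the L2 penalty, the labels in {-1, +1}, and the parameter `lam` below are all assumptions chosen for illustration.

```python
import math


def logistic_loss(y, score):
    """One possible L(y, f(x; w)) for y in {-1, +1}: log(1 + exp(-y * score))."""
    return math.log1p(math.exp(-y * score))


def l2_penalty(w, lam):
    """One possible R(w): (lam / 2) * ||w||^2."""
    return 0.5 * lam * sum(wp * wp for wp in w)


def objective(w, data, lam=0.1):
    """J(w) = sum_i L(y_i, f(x_i; w)) + R(w), with a linear f."""
    total = 0.0
    for x, y in data:
        f = sum(wp * xp for wp, xp in zip(w, x))
        total += logistic_loss(y, f)
    return total + l2_penalty(w, lam)


data = [([1.0, 0.0], +1), ([0.0, 1.0], -1)]
print(objective([0.0, 0.0], data))   # equals 2 * log(2) at w = 0
print(objective([1.0, -1.0], data))  # smaller: both examples are scored correctly
```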
Use Newton’s method?
$$w^{t+1} \leftarrow w^{t} - H(w^{t})^{-1} \nabla J(w^{t})$$




                    http://www.flickr.com/photos/visitfinland/5424369765/
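The slide leaves the question open. One way to gauge it is a back-of-the-envelope estimate of the memory for H alone, assuming a dense P × P Hessian with 8-byte entries (both assumptions, not statements from the talk) and the P values quoted on the earlier data slide.

```python
def dense_hessian_bytes(num_features, bytes_per_entry=8):
    """Bytes needed to store a dense num_features x num_features matrix."""
    return num_features ** 2 * bytes_per_entry


median_p = 10_000_000       # median P from the data slide
mean_p = 1_000_000_000      # mean P from the data slide

print(dense_hessian_bytes(median_p) / 1e12, "TB")  # 800.0 TB
print(dense_hessian_bytes(mean_p) / 1e18, "EB")    # 8.0 EB
```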
Outline

• Machine Learning intro
• Scaling machine learning algorithms up
• Design choices of large scale ML systems
Scaling Up

• Why big data?
• Parallelize machine learning algorithms
 • Embarrassingly parallel
 • Parallelize sub-routines
 • Distributed learning
Subsampling
[Figure: Big Data is split into Shard 1, Shard 2, Shard 3, ..., Shard M; to reduce N, only a subsample is passed to a single Machine, which produces the Model]
Why not Small Data?




                [Banko and Brill, 2001]
Scaling Up

• Why big data?
• Parallelize machine learning algorithms
 • Embarrassingly parallel
 • Parallelize sub-routines
 • Distributed learning
Parallelize Estimates
• Naive Bayes Classifier
$$\arg\min_{w} \; -\prod_{i=1}^{N} \prod_{p=1}^{P} P(x_p^i \mid y_i; w) \, P(y_i; w)$$


• Maximum Likelihood Estimates
$$w_{\text{the}|\text{EN}} = \frac{\sum_{i=1}^{N} 1_{\text{EN},\text{the}}(x^i)}{\sum_{i=1}^{N} 1_{\text{EN}}(x^i)}$$
Word Counting
Map
  X: “The quick brown fox ...”, Y: EN
  emits (‘the|EN’, 1), (‘quick|EN’, 1), (‘brown|EN’, 1)

Reduce
  [ (‘the|EN’, 1), (‘the|EN’, 1), (‘the|EN’, 1) ]
  C(‘the’|EN) = SUM of values = 3

$$w_{\text{the}|\text{EN}} = \frac{C(\text{the}|\text{EN})}{C(\text{EN})}$$
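The map and reduce steps above can be sketched in plain Python, with no MapReduce framework involved. The function names and the tiny corpus are hypothetical, and the sketch assumes document-level indicators: each word is counted at most once per example, and C(EN) is the number of EN examples.

```python
from collections import defaultdict
from itertools import chain


def map_example(text, label):
    """Map: emit ('word|LABEL', 1) once per distinct word, plus (LABEL, 1)."""
    words = {tok.strip(".,!?").lower() for tok in text.split()}
    words.discard("")
    for word in sorted(words):
        yield (f"{word}|{label}", 1)
    yield (label, 1)


def reduce_counts(pairs):
    """Reduce: sum the values for each key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return counts


corpus = [
    ("The quick brown fox jumped over the lazy dog.", "EN"),
    ("To err is human, but to really foul things up you need a computer.", "EN"),
    ("No hay mal que por bien no venga.", "ES"),
    ("La tercera es la vencida.", "ES"),
]
counts = reduce_counts(chain.from_iterable(map_example(t, y) for t, y in corpus))


def estimate(word, label):
    """w_{word|label} = C(word|label) / C(label)."""
    return counts[f"{word}|{label}"] / counts[label]


print(estimate("the", "EN"))  # 0.5
```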
Word Counting
[Figure: Big Data is split into Shard 1, Shard 2, Shard 3, ..., Shard M. Map: Mapper 1 ... Mapper M each read one shard and emit pairs such as (‘the|EN’, 1), (‘fox|EN’, 1), ..., (‘montañas|ES’, 1). Reduce: a Reducer tallies the counts and updates w, producing the Model.]
Parallelize Optimization
• Maximum Entropy Classifiers
$$\arg\min_{w} \prod_{i=1}^{N} \frac{\exp\left(\sum_{p=1}^{P} w_p x_p^i\right)^{y_i}}{1 + \exp\left(\sum_{p=1}^{P} w_p x_p^i\right)}$$


• Good: J(w) is concave
• Bad: no closed-form solution like NB
• Ugly: Large N
Gradient Descent




        http://www.cs.cmu.edu/~epxing/Class/10701/Lecture/lecture7.pdf
Gradient Descent
• w is initialized as zero
• for t in 1 to T
 • Calculate gradients ∇J(w)
 • $w^{t+1} \leftarrow w^{t} - \eta \nabla J(w^{t})$

$$\nabla J(w) = \sum_{i=1}^{N} P(w, x_i, y_i)$$
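A minimal single-machine sketch of the loop above. It assumes J is the negative log-likelihood of a logistic model with labels in {0, 1}, so the per-example term is (sigmoid(w · x) − y) x; the step size, step count, and toy data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def example_gradient(w, x, y):
    """Per-example gradient term for the assumed logistic loss, y in {0, 1}."""
    error = sigmoid(sum(wp * xp for wp, xp in zip(w, x))) - y
    return [error * xp for xp in x]


def full_gradient(w, data):
    """Sum the per-example terms over all N examples."""
    grad = [0.0] * len(w)
    for x, y in data:
        for p, g in enumerate(example_gradient(w, x, y)):
            grad[p] += g
    return grad


def gradient_descent(data, num_features, eta=0.5, steps=200):
    w = [0.0] * num_features                 # w is initialized as zero
    for _ in range(steps):                   # for t in 1 to T
        grad = full_gradient(w, data)        # calculate gradients
        w = [wp - eta * gp for wp, gp in zip(w, grad)]
    return w


data = [([1.0, 0.0], 1), ([1.0, 1.0], 1), ([0.0, 1.0], 0), ([0.0, 2.0], 0)]
w = gradient_descent(data, num_features=2)
print(w)
```

Each step touches all N examples and all P features, which is where the O(TPN) training cost on the next slide comes from.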
Distribute Gradient
• w is initialized as zero
• for t in 1 to T
 • Calculate gradients in parallel
    $w^{t+1} \leftarrow w^{t} - \eta \nabla J(w^{t})$



• Training CPU: O(TPN) to O(TPN / M)
Distribute Gradient
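[Figure: Big Data is split into Shard 1, Shard 2, Shard 3, ..., Shard M. Map: Machine 1 ... Machine M each process one shard and emit (dummy key, partial gradient sum). Reduce: sum the partial gradients and update w, producing the Model. Repeat the MapReduce pass until convergence.]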
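The map and reduce roles in distributed gradient descent can be simulated in one process. This is only a sketch: the shards are plain lists, the "machines" run sequentially, the logistic loss with labels in {0, 1} is an assumption, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def partial_gradient(w, shard):
    """Map: sum the per-example gradients over one shard."""
    grad = [0.0] * len(w)
    for x, y in shard:
        error = sigmoid(sum(wp * xp for wp, xp in zip(w, x))) - y
        for p, xp in enumerate(x):
            grad[p] += error * xp
    return ("dummy_key", grad)


def reduce_and_update(w, emitted, eta):
    """Reduce: add up the partial sums, then take one gradient step."""
    total = [0.0] * len(w)
    for _key, grad in emitted:
        for p, g in enumerate(grad):
            total[p] += g
    return [wp - eta * gp for wp, gp in zip(w, total)]


def train(shards, num_features, eta=0.5, rounds=200):
    w = [0.0] * num_features
    for _ in range(rounds):  # one simulated MapReduce pass per iteration
        emitted = [partial_gradient(w, shard) for shard in shards]
        w = reduce_and_update(w, emitted, eta)
    return w


shards = [
    [([1.0, 0.0], 1), ([1.0, 1.0], 1)],
    [([0.0, 1.0], 0), ([0.0, 2.0], 0)],
]
w = train(shards, num_features=2)
print(w)
```

Because the gradient is a sum over examples, the sharded result matches single-machine gradient descent up to floating-point summation order; only the work per machine changes.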
Scaling Up

• Why big data?
• Parallelize machine learning algorithms
 • Embarrassingly parallel
 • Parallelize sub-routines
 • Distributed learning
Parallelize Subroutines
• Support Vector Machines
$$\arg\min_{w, b, \zeta} \; \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^{n} \zeta_i$$

$$\text{s.t.} \quad 1 - y_i (w \cdot \phi(x_i) + b) \le \zeta_i, \quad \zeta_i \ge 0$$
• Solve the dual problem
$$\arg\min_{\alpha} \; \frac{1}{2} \alpha^T Q \alpha - \alpha^T \mathbf{1}$$

$$\text{s.t.} \quad 0 \le \alpha \le C, \quad y^T \alpha = 0$$
The computational cost for the Primal-Dual Interior Point Method is O(n^3) in time and O(n^2) in memory




http://www.flickr.com/photos/sea-turtle/198445204/
Parallel SVM [Chang et al., 2007]

•   Parallel, row-wise incomplete Cholesky factorization (ICF) for Q
•   Parallel interior point method
    •   Time O(n^3) becomes O(n^2 / M)
    •   Memory O(n^2) becomes O(n√N / M)
•   Parallel Support Vector Machines (psvm): http://code.google.com/p/psvm/
    •   Implemented in MPI
Parallel ICF
• Distribute Q by row across M machines
    Machine 1: row 1, row 2
    Machine 2: row 3, row 4
    Machine 3: row 5, row 6
    ...
• For each dimension n < √N
  • Workers send their local pivots to the master
  • The master selects the largest local pivot and broadcasts it to the workers as the global pivot
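The pivot exchange above can be illustrated with a single-process simulation of pivoted incomplete Cholesky, Q ≈ H Hᵀ. This is a sketch and not the psvm implementation: the contiguous row layout, the function name `parallel_icf`, the tolerance, and the 4 × 4 test matrix are assumptions, and the "machines" are just lists of row indices.

```python
import math


def parallel_icf(Q, rank, num_machines):
    """Simulate row-distributed, pivoted incomplete Cholesky: Q ~= H H^T."""
    n = len(Q)
    block = -(-n // num_machines)  # ceiling division: rows per machine
    owned = [list(range(m * block, min(n, (m + 1) * block)))
             for m in range(num_machines)]
    H = [[0.0] * rank for _ in range(n)]    # row i of H lives with row i of Q
    residual = [Q[i][i] for i in range(n)]  # diagonal of Q - H H^T so far
    pivoted = set()

    for k in range(rank):
        # Each worker proposes its largest remaining diagonal entry.
        local_pivots = []
        for rows in owned:
            candidates = [(residual[i], i) for i in rows if i not in pivoted]
            if candidates:
                local_pivots.append(max(candidates))
        if not local_pivots:
            break
        # The master picks the largest proposal and broadcasts it.
        pivot_value, j = max(local_pivots)
        if pivot_value <= 1e-12:
            break
        pivoted.add(j)
        pivot_row = H[j][:k]
        H[j][k] = math.sqrt(pivot_value)
        # Each worker fills in column k for the rows it owns.
        for rows in owned:
            for i in rows:
                if i in pivoted:
                    continue
                dot = sum(H[i][t] * pivot_row[t] for t in range(k))
                H[i][k] = (Q[i][j] - dot) / H[j][k]
                residual[i] -= H[i][k] ** 2
    return H


Q = [[4.0, 2.0, 0.0, 0.0],
     [2.0, 5.0, 1.0, 0.0],
     [0.0, 1.0, 3.0, 1.0],
     [0.0, 0.0, 1.0, 2.0]]
H = parallel_icf(Q, rank=4, num_machines=2)
```

With `rank` equal to the matrix size the factorization is exact for this positive definite Q; a smaller rank gives the low-rank approximation that saves memory.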
Scaling Up

• Why big data?
• Parallelize machine learning algorithms
 • Embarrassingly parallel
 • Parallelize sub-routines
 • Distributed learning
Majority Vote
[Figure: Big Data is split into Shard 1, Shard 2, Shard 3, ..., Shard M. Map: Machine 1 ... Machine M each train a separate model on their own shard.]
Majority Vote

• Train individual classifiers independently
• Predict by taking majority votes
• Training CPU: O(TPN) to O(TPN / M)
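A small sketch of majority voting over independently trained models. The slides do not say which classifier each machine trains; a perceptron with labels in {-1, +1} is assumed here, and the shards and function names are hypothetical.

```python
from collections import Counter


def dot(w, x):
    return sum(wp * xp for wp, xp in zip(w, x))


def train_perceptron(shard, num_features, epochs=20):
    """Train one classifier on one shard only (no communication)."""
    w = [0.0] * num_features
    for _ in range(epochs):
        for x, y in shard:
            if y * dot(w, x) <= 0:  # misclassified: move w toward y * x
                w = [wp + y * xp for wp, xp in zip(w, x)]
    return w


def predict(w, x):
    return 1 if dot(w, x) > 0 else -1


def majority_vote(models, x):
    """Predict with every model and return the most common label."""
    votes = Counter(predict(w, x) for w in models)
    return votes.most_common(1)[0][0]


shards = [
    [([2.0, 1.0], +1), ([1.0, 2.0], -1)],
    [([3.0, 0.0], +1), ([0.0, 3.0], -1)],
    [([1.0, 0.0], +1), ([0.0, 1.0], -1)],
]
models = [train_perceptron(shard, num_features=2) for shard in shards]
print(majority_vote(models, [5.0, 1.0]))  # 1
print(majority_vote(models, [1.0, 4.0]))  # -1
```

An odd number of models avoids ties in the binary case.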
Parameter Mixture [Mann et al., 2009]

[Figure: Big Data is split into Shard 1, Shard 2, Shard 3, ..., Shard M. Map: Machine 1 ... Machine M each train on their own shard and emit (dummy key, w1), (dummy key, w2), ... Reduce: average the w vectors to produce the Model.]
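Parameter mixture can be sketched as "train fully on each shard, then average once". The uniform average, the logistic model trained by gradient descent, and all names below are assumptions for illustration; they are not taken from [Mann et al., 2009].

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_on_shard(shard, num_features, eta=0.5, steps=100):
    """Map: fit a model on one shard with plain gradient descent."""
    w = [0.0] * num_features
    for _ in range(steps):
        grad = [0.0] * num_features
        for x, y in shard:  # y in {0, 1}
            error = sigmoid(sum(wp * xp for wp, xp in zip(w, x))) - y
            for p, xp in enumerate(x):
                grad[p] += error * xp
        w = [wp - eta * gp for wp, gp in zip(w, grad)]
    return w


def average(weight_vectors):
    """Reduce: uniform average of the per-shard weight vectors."""
    m = len(weight_vectors)
    return [sum(column) / m for column in zip(*weight_vectors)]


shards = [
    [([1.0, 0.0], 1), ([0.0, 1.0], 0)],
    [([2.0, 1.0], 1), ([1.0, 2.0], 0)],
]
emitted = [("dummy_key", train_on_shard(shard, 2)) for shard in shards]
w = average([wm for _key, wm in emitted])
print(w)
```

Each machine sends its weight vector only once, at the end of training.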
Much less network usage than distributed gradient descent: O(MN) vs. O(MNT)




ttp://www.flickr.com/photos/annamatic3000/127945652/
Iterative Param Mixture [McDonald et al., 2010]

[Figure: Big Data is split into Shard 1, Shard 2, Shard 3, ..., Shard M. Map: Machine 1 ... Machine M each train on their own shard and emit (dummy key, w1), (dummy key, w2), ... Reduce after each epoch: average the w vectors to produce the Model.]
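The difference from plain parameter mixture is that the average is taken after every epoch and fed back to all machines. The sketch below assumes a perceptron-style update with labels in {-1, +1} and a uniform average; the shards and names are hypothetical.

```python
def dot(w, x):
    return sum(wp * xp for wp, xp in zip(w, x))


def one_epoch(w, shard):
    """Map: one perceptron pass over a shard, starting from the shared w."""
    w = list(w)
    for x, y in shard:
        if y * dot(w, x) <= 0:
            w = [wp + y * xp for wp, xp in zip(w, x)]
    return w


def iterative_parameter_mixture(shards, num_features, epochs=10):
    w = [0.0] * num_features
    for _ in range(epochs):
        emitted = [("dummy_key", one_epoch(w, shard)) for shard in shards]
        vectors = [wm for _key, wm in emitted]
        # Reduce after each epoch: average, then broadcast as the next start.
        w = [sum(column) / len(vectors) for column in zip(*vectors)]
    return w


shards = [
    [([2.0, 1.0], +1), ([1.0, 2.0], -1)],
    [([3.0, 0.0], +1), ([0.0, 3.0], -1)],
    [([1.0, 0.0], +1), ([0.0, 1.0], -1)],
]
w = iterative_parameter_mixture(shards, num_features=2)
print(w)
```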
Outline

• Machine Learning intro
• Scaling machine learning algorithms up
• Design choices of large scale ML systems
Scalable



           http://www.flickr.com/photos/mr_t_in_dc/5469563053
Parallel



http://www.flickr.com/photos/aloshbennett/3209564747/
Accuracy
http://www.flickr.com/photos/wanderlinse/4367261825/
http://www.flickr.com/photos/imagelink/4006753760/
Binary Classification
http://www.flickr.com/photos/brenderous/4532934181/
Automatic Feature Discovery


   http://www.flickr.com/photos/mararie/2340572508/
Fast Response

http://www.flickr.com/photos/prunejuice/3687192643/
Memory is the new hard disk.




http://www.flickr.com/photos/jepoirrier/840415676/
Algorithm + Infrastructure

http://www.flickr.com/photos/neubie/854242030/
Design for Multicores
             http://www.flickr.com/photos/geektechnique/2344029370/
Combiner
Multi-shard Combiner




[Chandra et al., 2010]
Machine Learning on Big Data
Parallelize ML Algorithms

• Embarrassingly parallel
• Parallelize sub-routines
• Distributed learning
Parallel
Accuracy
Fast Response
Google APIs
•   Prediction API
    •   machine learning service on the cloud
    •   http://code.google.com/apis/predict


•   BigQuery
    •   interactive analysis of massive data on the cloud
    •   http://code.google.com/apis/bigquery

More Related Content

What's hot

Gradient Estimation Using Stochastic Computation Graphs
Gradient Estimation Using Stochastic Computation GraphsGradient Estimation Using Stochastic Computation Graphs
Gradient Estimation Using Stochastic Computation GraphsYoonho Lee
 
Actors for Behavioural Simulation
Actors for Behavioural SimulationActors for Behavioural Simulation
Actors for Behavioural SimulationClarkTony
 
NYAI - A Path To Unsupervised Learning Through Adversarial Networks by Soumit...
NYAI - A Path To Unsupervised Learning Through Adversarial Networks by Soumit...NYAI - A Path To Unsupervised Learning Through Adversarial Networks by Soumit...
NYAI - A Path To Unsupervised Learning Through Adversarial Networks by Soumit...Rizwan Habib
 
MLIP - Chapter 2 - Preliminaries to deep learning
MLIP - Chapter 2 - Preliminaries to deep learningMLIP - Chapter 2 - Preliminaries to deep learning
MLIP - Chapter 2 - Preliminaries to deep learningCharles Deledalle
 
Efficient end-to-end learning for quantizable representations
Efficient end-to-end learning for quantizable representationsEfficient end-to-end learning for quantizable representations
Efficient end-to-end learning for quantizable representationsNAVER Engineering
 
Deep Generative Models
Deep Generative ModelsDeep Generative Models
Deep Generative ModelsMijung Kim
 
H2O World - Consensus Optimization and Machine Learning - Stephen Boyd
H2O World - Consensus Optimization and Machine Learning - Stephen BoydH2O World - Consensus Optimization and Machine Learning - Stephen Boyd
H2O World - Consensus Optimization and Machine Learning - Stephen BoydSri Ambati
 
Accelerating Metropolis Hastings with Lightweight Inference Compilation
Accelerating Metropolis Hastings with Lightweight Inference CompilationAccelerating Metropolis Hastings with Lightweight Inference Compilation
Accelerating Metropolis Hastings with Lightweight Inference CompilationFeynman Liang
 
Scientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve FittingScientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve FittingEnthought, Inc.
 
Runtime Analysis of Population-based Evolutionary Algorithms
Runtime Analysis of Population-based Evolutionary AlgorithmsRuntime Analysis of Population-based Evolutionary Algorithms
Runtime Analysis of Population-based Evolutionary AlgorithmsPK Lehre
 
Uncertainty Awareness in Integrating Machine Learning and Game Theory
Uncertainty Awareness in Integrating Machine Learning and Game TheoryUncertainty Awareness in Integrating Machine Learning and Game Theory
Uncertainty Awareness in Integrating Machine Learning and Game TheoryRikiya Takahashi
 
KDD CUP 2015 - 9th solution
KDD CUP 2015 - 9th solutionKDD CUP 2015 - 9th solution
KDD CUP 2015 - 9th solution志明 陳
 
Approximate Bayesian Computation on GPUs
Approximate Bayesian Computation on GPUsApproximate Bayesian Computation on GPUs
Approximate Bayesian Computation on GPUsMichael Stumpf
 
k-MLE: A fast algorithm for learning statistical mixture models
k-MLE: A fast algorithm for learning statistical mixture modelsk-MLE: A fast algorithm for learning statistical mixture models
k-MLE: A fast algorithm for learning statistical mixture modelsFrank Nielsen
 
Asynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and AlgorithmsAsynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and AlgorithmsFabian Pedregosa
 

What's hot (20)

QMC: Transition Workshop - Approximating Multivariate Functions When Function...
QMC: Transition Workshop - Approximating Multivariate Functions When Function...QMC: Transition Workshop - Approximating Multivariate Functions When Function...
QMC: Transition Workshop - Approximating Multivariate Functions When Function...
 
QMC: Operator Splitting Workshop, Incremental Learning-to-Learn with Statisti...
QMC: Operator Splitting Workshop, Incremental Learning-to-Learn with Statisti...QMC: Operator Splitting Workshop, Incremental Learning-to-Learn with Statisti...
QMC: Operator Splitting Workshop, Incremental Learning-to-Learn with Statisti...
 
Gradient Estimation Using Stochastic Computation Graphs
Gradient Estimation Using Stochastic Computation GraphsGradient Estimation Using Stochastic Computation Graphs
Gradient Estimation Using Stochastic Computation Graphs
 
Iclr2016 vaeまとめ
Iclr2016 vaeまとめIclr2016 vaeまとめ
Iclr2016 vaeまとめ
 
Actors for Behavioural Simulation
Actors for Behavioural SimulationActors for Behavioural Simulation
Actors for Behavioural Simulation
 
NYAI - A Path To Unsupervised Learning Through Adversarial Networks by Soumit...
NYAI - A Path To Unsupervised Learning Through Adversarial Networks by Soumit...NYAI - A Path To Unsupervised Learning Through Adversarial Networks by Soumit...
NYAI - A Path To Unsupervised Learning Through Adversarial Networks by Soumit...
 
MLIP - Chapter 2 - Preliminaries to deep learning
MLIP - Chapter 2 - Preliminaries to deep learningMLIP - Chapter 2 - Preliminaries to deep learning
MLIP - Chapter 2 - Preliminaries to deep learning
 
Bioalgo 2012-04-hmm
Bioalgo 2012-04-hmmBioalgo 2012-04-hmm
Bioalgo 2012-04-hmm
 
Efficient end-to-end learning for quantizable representations
Efficient end-to-end learning for quantizable representationsEfficient end-to-end learning for quantizable representations
Efficient end-to-end learning for quantizable representations
 
Deep Generative Models
Deep Generative ModelsDeep Generative Models
Deep Generative Models
 
H2O World - Consensus Optimization and Machine Learning - Stephen Boyd
H2O World - Consensus Optimization and Machine Learning - Stephen BoydH2O World - Consensus Optimization and Machine Learning - Stephen Boyd
H2O World - Consensus Optimization and Machine Learning - Stephen Boyd
 
Accelerating Metropolis Hastings with Lightweight Inference Compilation
Accelerating Metropolis Hastings with Lightweight Inference CompilationAccelerating Metropolis Hastings with Lightweight Inference Compilation
Accelerating Metropolis Hastings with Lightweight Inference Compilation
 
Scientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve FittingScientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve Fitting
 
Runtime Analysis of Population-based Evolutionary Algorithms
Runtime Analysis of Population-based Evolutionary AlgorithmsRuntime Analysis of Population-based Evolutionary Algorithms
Runtime Analysis of Population-based Evolutionary Algorithms
 
1533 game mathematics
1533 game mathematics1533 game mathematics
1533 game mathematics
 
Uncertainty Awareness in Integrating Machine Learning and Game Theory
Uncertainty Awareness in Integrating Machine Learning and Game TheoryUncertainty Awareness in Integrating Machine Learning and Game Theory
Uncertainty Awareness in Integrating Machine Learning and Game Theory
 
KDD CUP 2015 - 9th solution
KDD CUP 2015 - 9th solutionKDD CUP 2015 - 9th solution
KDD CUP 2015 - 9th solution
 
Approximate Bayesian Computation on GPUs
Approximate Bayesian Computation on GPUsApproximate Bayesian Computation on GPUs
Approximate Bayesian Computation on GPUs
 
k-MLE: A fast algorithm for learning statistical mixture models
k-MLE: A fast algorithm for learning statistical mixture modelsk-MLE: A fast algorithm for learning statistical mixture models
k-MLE: A fast algorithm for learning statistical mixture models
 
Asynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and AlgorithmsAsynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and Algorithms
 

Viewers also liked

Machine Learning and Big Data at Foursquare
Machine Learning and Big Data at FoursquareMachine Learning and Big Data at Foursquare
Machine Learning and Big Data at Foursquaremetablake
 
[Harvard CS264] 15a - The Onset of Parallelism, Changes in Computer Architect...
[Harvard CS264] 15a - The Onset of Parallelism, Changes in Computer Architect...[Harvard CS264] 15a - The Onset of Parallelism, Changes in Computer Architect...
[Harvard CS264] 15a - The Onset of Parallelism, Changes in Computer Architect...npinto
 
[Harvard CS264] 16 - Managing Dynamic Parallelism on GPUs: A Case Study of Hi...
[Harvard CS264] 16 - Managing Dynamic Parallelism on GPUs: A Case Study of Hi...[Harvard CS264] 16 - Managing Dynamic Parallelism on GPUs: A Case Study of Hi...
[Harvard CS264] 16 - Managing Dynamic Parallelism on GPUs: A Case Study of Hi...npinto
 
[Harvard CS264] 10b - cl.oquence: High-Level Language Abstractions for Low-Le...
[Harvard CS264] 10b - cl.oquence: High-Level Language Abstractions for Low-Le...[Harvard CS264] 10b - cl.oquence: High-Level Language Abstractions for Low-Le...
[Harvard CS264] 10b - cl.oquence: High-Level Language Abstractions for Low-Le...npinto
 
[Harvard CS264] 08a - Cloud Computing, Amazon EC2, MIT StarCluster (Justin Ri...
[Harvard CS264] 08a - Cloud Computing, Amazon EC2, MIT StarCluster (Justin Ri...[Harvard CS264] 08a - Cloud Computing, Amazon EC2, MIT StarCluster (Justin Ri...
[Harvard CS264] 08a - Cloud Computing, Amazon EC2, MIT StarCluster (Justin Ri...npinto
 
[Harvard CS264] 13 - The R-Stream High-Level Program Transformation Tool / Pr...
[Harvard CS264] 13 - The R-Stream High-Level Program Transformation Tool / Pr...[Harvard CS264] 13 - The R-Stream High-Level Program Transformation Tool / Pr...
[Harvard CS264] 13 - The R-Stream High-Level Program Transformation Tool / Pr...npinto
 
[Harvard CS264] 15a - Jacket: Visual Computing (James Malcolm, Accelereyes)
[Harvard CS264] 15a - Jacket: Visual Computing (James Malcolm, Accelereyes)[Harvard CS264] 15a - Jacket: Visual Computing (James Malcolm, Accelereyes)
[Harvard CS264] 15a - Jacket: Visual Computing (James Malcolm, Accelereyes)npinto
 
[Harvard CS264] 14 - Dynamic Compilation for Massively Parallel Processors (G...
[Harvard CS264] 14 - Dynamic Compilation for Massively Parallel Processors (G...[Harvard CS264] 14 - Dynamic Compilation for Massively Parallel Processors (G...
[Harvard CS264] 14 - Dynamic Compilation for Massively Parallel Processors (G...npinto
 
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterBKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterLinaro
 
Top 3 Challenges to Profitable Mortgage Lending
Top 3 Challenges to Profitable Mortgage LendingTop 3 Challenges to Profitable Mortgage Lending
Top 3 Challenges to Profitable Mortgage LendingEquifax
 
[Harvard CS264] 12 - Irregular Parallelism on the GPU: Algorithms and Data St...
[Harvard CS264] 12 - Irregular Parallelism on the GPU: Algorithms and Data St...[Harvard CS264] 12 - Irregular Parallelism on the GPU: Algorithms and Data St...
[Harvard CS264] 12 - Irregular Parallelism on the GPU: Algorithms and Data St...npinto
 
High-Performance Computing Needs Machine Learning... And Vice Versa (NIPS 201...
High-Performance Computing Needs Machine Learning... And Vice Versa (NIPS 201...High-Performance Computing Needs Machine Learning... And Vice Versa (NIPS 201...
High-Performance Computing Needs Machine Learning... And Vice Versa (NIPS 201...npinto
 
[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...
[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...
[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...npinto
 
[Harvard CS264] 11b - Analysis-Driven Performance Optimization with CUDA (Cli...
[Harvard CS264] 11b - Analysis-Driven Performance Optimization with CUDA (Cli...[Harvard CS264] 11b - Analysis-Driven Performance Optimization with CUDA (Cli...
[Harvard CS264] 11b - Analysis-Driven Performance Optimization with CUDA (Cli...npinto
 
Graph analytic and machine learning
Graph analytic and machine learningGraph analytic and machine learning
Graph analytic and machine learningStanley Wang
 
[Harvard CS264] 10a - Easy, Effective, Efficient: GPU Programming in Python w...
[Harvard CS264] 10a - Easy, Effective, Efficient: GPU Programming in Python w...[Harvard CS264] 10a - Easy, Effective, Efficient: GPU Programming in Python w...
[Harvard CS264] 10a - Easy, Effective, Efficient: GPU Programming in Python w...npinto
 
[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)
[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)
[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)npinto
 
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)npinto
 
Top 3 Considerations for Machine Learning on Big Data
Top 3 Considerations for Machine Learning on Big DataTop 3 Considerations for Machine Learning on Big Data
Top 3 Considerations for Machine Learning on Big DataDatameer
 

Viewers also liked (20)

Machine Learning and Big Data at Foursquare
Machine Learning and Big Data at FoursquareMachine Learning and Big Data at Foursquare
Machine Learning and Big Data at Foursquare
 
[Harvard CS264] 15a - The Onset of Parallelism, Changes in Computer Architect...
[Harvard CS264] 15a - The Onset of Parallelism, Changes in Computer Architect...[Harvard CS264] 15a - The Onset of Parallelism, Changes in Computer Architect...
[Harvard CS264] 15a - The Onset of Parallelism, Changes in Computer Architect...
 
[Harvard CS264] 16 - Managing Dynamic Parallelism on GPUs: A Case Study of Hi...
[Harvard CS264] 16 - Managing Dynamic Parallelism on GPUs: A Case Study of Hi...[Harvard CS264] 16 - Managing Dynamic Parallelism on GPUs: A Case Study of Hi...
[Harvard CS264] 16 - Managing Dynamic Parallelism on GPUs: A Case Study of Hi...
 
[Harvard CS264] 10b - cl.oquence: High-Level Language Abstractions for Low-Le...
[Harvard CS264] 10b - cl.oquence: High-Level Language Abstractions for Low-Le...[Harvard CS264] 10b - cl.oquence: High-Level Language Abstractions for Low-Le...
[Harvard CS264] 10b - cl.oquence: High-Level Language Abstractions for Low-Le...
 
[Harvard CS264] 08a - Cloud Computing, Amazon EC2, MIT StarCluster (Justin Ri...
[Harvard CS264] 08a - Cloud Computing, Amazon EC2, MIT StarCluster (Justin Ri...[Harvard CS264] 08a - Cloud Computing, Amazon EC2, MIT StarCluster (Justin Ri...
[Harvard CS264] 08a - Cloud Computing, Amazon EC2, MIT StarCluster (Justin Ri...
 
[Harvard CS264] 13 - The R-Stream High-Level Program Transformation Tool / Pr...
[Harvard CS264] 13 - The R-Stream High-Level Program Transformation Tool / Pr...[Harvard CS264] 13 - The R-Stream High-Level Program Transformation Tool / Pr...
[Harvard CS264] 13 - The R-Stream High-Level Program Transformation Tool / Pr...
 
[Harvard CS264] 15a - Jacket: Visual Computing (James Malcolm, Accelereyes)
[Harvard CS264] 15a - Jacket: Visual Computing (James Malcolm, Accelereyes)[Harvard CS264] 15a - Jacket: Visual Computing (James Malcolm, Accelereyes)
[Harvard CS264] 15a - Jacket: Visual Computing (James Malcolm, Accelereyes)
 
[Harvard CS264] 14 - Dynamic Compilation for Massively Parallel Processors (G...
[Harvard CS264] 14 - Dynamic Compilation for Massively Parallel Processors (G...[Harvard CS264] 14 - Dynamic Compilation for Massively Parallel Processors (G...
[Harvard CS264] 14 - Dynamic Compilation for Massively Parallel Processors (G...
 
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterBKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
 
Learning Analytics
Learning AnalyticsLearning Analytics
Learning Analytics
 
Top 3 Challenges to Profitable Mortgage Lending
Top 3 Challenges to Profitable Mortgage LendingTop 3 Challenges to Profitable Mortgage Lending
Top 3 Challenges to Profitable Mortgage Lending
 
[Harvard CS264] 12 - Irregular Parallelism on the GPU: Algorithms and Data St...
[Harvard CS264] 12 - Irregular Parallelism on the GPU: Algorithms and Data St...[Harvard CS264] 12 - Irregular Parallelism on the GPU: Algorithms and Data St...
[Harvard CS264] 12 - Irregular Parallelism on the GPU: Algorithms and Data St...
 
High-Performance Computing Needs Machine Learning... And Vice Versa (NIPS 201...
High-Performance Computing Needs Machine Learning... And Vice Versa (NIPS 201...High-Performance Computing Needs Machine Learning... And Vice Versa (NIPS 201...
High-Performance Computing Needs Machine Learning... And Vice Versa (NIPS 201...
 
[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...
[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...
[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...
 
[Harvard CS264] 11b - Analysis-Driven Performance Optimization with CUDA (Cli...
[Harvard CS264] 11b - Analysis-Driven Performance Optimization with CUDA (Cli...[Harvard CS264] 11b - Analysis-Driven Performance Optimization with CUDA (Cli...
[Harvard CS264] 11b - Analysis-Driven Performance Optimization with CUDA (Cli...
 
Graph analytic and machine learning
Graph analytic and machine learningGraph analytic and machine learning
Graph analytic and machine learning
 
[Harvard CS264] 10a - Easy, Effective, Efficient: GPU Programming in Python w...
[Harvard CS264] 10a - Easy, Effective, Efficient: GPU Programming in Python w...[Harvard CS264] 10a - Easy, Effective, Efficient: GPU Programming in Python w...
[Harvard CS264] 10a - Easy, Effective, Efficient: GPU Programming in Python w...
 
[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)
[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)
[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)
 
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)
 
Top 3 Considerations for Machine Learning on Big Data
Top 3 Considerations for Machine Learning on Big DataTop 3 Considerations for Machine Learning on Big Data
Top 3 Considerations for Machine Learning on Big Data
 

[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Google Projects (Max Lin, Google Research)

  • 1. Machine Learning on Big Data Lessons Learned from Google Projects Max Lin Software Engineer | Google Research Massively Parallel Computing | Harvard CS 264 Guest Lecture | March 29th, 2011
  • 2. Outline • Machine Learning intro • Scaling machine learning algorithms up • Design choices of large scale ML systems
  • 3. Outline • Machine Learning intro • Scaling machine learning algorithms up • Design choices of large scale ML systems
  • 4. “Machine Learning is a study of computer algorithms that improve automatically through experience.”
  • 10. The quick brown fox English jumped over the lazy dog. To err is human, but to really foul things up you English Training Input X need a computer. Output Y No hay mal que por bien Spanish no venga. Model f(x) La tercera es la vencida. Spanish To be or not to be -- that ? Testing f(x’) is the question = y’ La fe mueve montañas. ?
  • 11. Linear Classifier. Input: “The quick brown fox jumped over the lazy dog.” Vocabulary: ‘a’ ... ‘aardvark’ ... ‘dog’ ... ‘the’ ... ‘montañas’ ... Feature vector x = [ 0, ... 0, ... 1, ... 1, ... 0, ... ]; weight vector w = [ 0.1, ... 132, ... 150, ... 200, ... -153, ... ]. $f(x) = w \cdot x = \sum_{p=1}^{P} w_p x_p$
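The dot product on slide 11 can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's code: the five-word vocabulary and the weights are the values visible on the slide, while `featurize` and `score` are hypothetical helper names, and the tokenization (lowercasing, stripping punctuation) is an assumption.

```python
def featurize(text, vocabulary):
    """Binary bag-of-words vector: x[p] = 1 if vocabulary word p occurs in text."""
    # Assumed tokenization: split on whitespace, strip punctuation, lowercase.
    words = {token.strip(".,!?").lower() for token in text.split()}
    return [1 if word in words else 0 for word in vocabulary]


def score(w, x):
    """f(x) = w . x = sum over p of w[p] * x[p]."""
    return sum(wp * xp for wp, xp in zip(w, x))


vocabulary = ["a", "aardvark", "dog", "the", "montañas"]
w = [0.1, 132, 150, 200, -153]  # weights shown on the slide
x = featurize("The quick brown fox jumped over the lazy dog.", vocabulary)
print(x, score(w, x))
```

Only ‘dog’ and ‘the’ fire for this sentence, so the score is 150 + 200 = 350. Reading a positive score as "English" is an assumption based on the negative weight for ‘montañas’.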
  • 12. Training Data: Input X is an N × P matrix (N examples as rows, P features as columns); Output Y has one label per example.
  • 13. Typical machine learning data at Google (mean / median): N: 100 billion / 1 billion; P: 1 billion / 10 million http://www.flickr.com/photos/mr_t_in_dc/5469563053
  • 14. Classifier Training • Training: Given {(x, y)} and f, minimize the following objective function: $\arg\min_w \sum_{i=1}^{N} L(y_i, f(x_i; w)) + R(w)$
  • 15. Use Newton’s method? $w^{t+1} \leftarrow w^t - H(w^t)^{-1} \nabla J(w^t)$ http://www.flickr.com/photos/visitfinland/5424369765/
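The slide poses Newton's method as a question without answering it. One likely obstacle is the Hessian H, which has P × P entries. A back-of-the-envelope sketch, using the P values from slide 13 and assuming a dense matrix of 8-byte floats (both the density and the entry size are assumptions):

```python
def dense_hessian_bytes(num_features, bytes_per_entry=8):
    """Memory needed to store a dense P x P Hessian."""
    return num_features ** 2 * bytes_per_entry


mean_p = 10**9    # 1 billion features (slide 13, mean)
median_p = 10**7  # 10 million features (slide 13, median)

print(dense_hessian_bytes(mean_p) / 1e18, "exabytes")
print(dense_hessian_bytes(median_p) / 1e12, "terabytes")
```

Under these assumptions the mean case needs about 8 exabytes and the median case about 800 terabytes, before any cost of inverting H is counted.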
  • 16. Outline • Machine Learning intro • Scaling machine learning algorithms up • Design choices of large scale ML systems
  • 17. Scaling Up • Why big data? • Parallelize machine learning algorithms • Embarrassingly parallel • Parallelize sub-routines • Distributed learning
  • 18. Subsampling: reduce N. [Diagram: Big Data split into Shard 1, Shard 2, Shard 3, ..., Shard M; Machine; Model]
  • 19. Why not Small Data? [Banko and Brill, 2001]
  • 20. Scaling Up • Why big data? • Parallelize machine learning algorithms • Embarrassingly parallel • Parallelize sub-routines • Distributed learning
  • 21. Parallelize Estimates • Naive Bayes Classifier $\arg\min_w -\prod_{i=1}^{N} \prod_{p=1}^{P} P(x_p^i \mid y_i; w)\, P(y_i; w)$ • Maximum Likelihood Estimates $w_{\text{the}|EN} = \frac{\sum_{i=1}^{N} 1_{EN,\text{the}}(x^i)}{\sum_{i=1}^{N} 1_{EN}(x^i)}$
  • 22. Word Counting • Map: X: “The quick brown fox ...”, Y: EN → (‘the|EN’, 1), (‘quick|EN’, 1), (‘brown|EN’, 1) • Reduce: [ (‘the|EN’, 1), (‘the|EN’, 1), (‘the|EN’, 1) ] → C(‘the’|EN) = SUM of values = 3 • $w_{\text{the}|EN} = \frac{C(\text{the}|EN)}{C(EN)}$
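The map and reduce steps on slide 22 can be imitated on one machine as a rough sketch. The function names `map_example`, `shuffle`, and `reduce_counts` are hypothetical, the two in-memory shards stand in for distributed data, and C(EN) is taken here to be the total number of EN tokens, which is an assumption since the slide does not define it.

```python
from collections import defaultdict


def map_example(text, label):
    """Map: emit ('word|LABEL', 1) for every token of one labeled example."""
    for token in text.split():
        word = token.strip(".,!?").lower()
        if word:
            yield (f"{word}|{label}", 1)


def shuffle(pairs):
    """Group emitted values by key, as the MapReduce framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce: C(key) = SUM of values."""
    return key, sum(values)


# Two toy shards; in a real job each shard would go to its own mapper.
shards = [
    [("The quick brown fox jumped over the lazy dog.", "EN")],
    [("To err is human", "EN"), ("La fe mueve montañas.", "ES")],
]
pairs = [pair
         for shard in shards
         for text, label in shard
         for pair in map_example(text, label)]
counts = dict(reduce_counts(key, values) for key, values in shuffle(pairs).items())

# Assumption: C(EN) is the total number of EN tokens.
en_total = sum(c for key, c in counts.items() if key.endswith("|EN"))
w_the_en = counts["the|EN"] / en_total
print(counts["the|EN"], en_total, w_the_en)
```

Because the reducer only sums, the counts are the same however the examples are split across shards, which is what makes this step embarrassingly parallel.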
  • 23. Word Counting • Map: Mapper 1, Mapper 2, Mapper 3, ..., Mapper M each read one of Shard 1, Shard 2, Shard 3, ..., Shard M of Big Data and emit (‘the’ | EN, 1), (‘fox’ | EN, 1), ... (‘montañas’ | ES, 1) • Reduce: the Reducer tallies counts and updates w → Model
  • 24. Parallelize Optimization • Maximum Entropy Classifiers $\arg\min_w \prod_{i=1}^{N} \frac{\exp\left(\sum_{p=1}^{P} w_p x_p^i\right)^{y_i}}{1 + \exp\left(\sum_{p=1}^{P} w_p x_p^i\right)}$ • Good: J(w) is concave • Bad: no closed-form solution like NB • Ugly: Large N
  • 25. Gradient Descent http://www.cs.cmu.edu/~epxing/Class/10701/Lecture/lecture7.pdf
  • 26. Gradient Descent • w is initialized as zero • for t in 1 to T • Calculate gradients ∇J(w) • $w^{t+1} \leftarrow w^t - \eta \nabla J(w)$, where $\nabla J(w) = \sum_{i=1}^{N} P(w, x_i, y_i)$
  • 27. Distribute Gradient • w is initialized as zero • for t in 1 to T • Calculate gradients in parallel • $w^{t+1} \leftarrow w^t - \eta \nabla J(w)$ • Training CPU: O(TPN) to O(TPN / M)
  • 28. Distribute Gradient • Map: Machine 1, Machine 2, Machine 3, ..., Machine M each process one of Shard 1, Shard 2, Shard 3, ..., Shard M of Big Data and emit (dummy key, partial gradient sum) • Reduce: Sum and Update w • Repeat M/R until convergence → Model
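Slides 26 to 28 rely on the gradient being a sum of per-example terms, so each machine can sum over its own shard and a reducer can add the partial sums. A single-process sketch follows. It assumes a logistic-regression loss with labels in {0, 1}, a learning rate `eta` of 0.1, and toy data. None of these come from the slides, and the list comprehension stands in for the M machines.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def partial_gradient(w, shard):
    """Map: sum of per-example gradients of the negative log-likelihood on one shard."""
    grad = [0.0] * len(w)
    for x, y in shard:
        error = sigmoid(sum(wp * xp for wp, xp in zip(w, x))) - y
        for p, xp in enumerate(x):
            grad[p] += error * xp
    return grad


def sum_gradients(partials):
    """Reduce: add the partial gradient sums from all shards."""
    return [sum(column) for column in zip(*partials)]


def train(shards, num_features, eta=0.1, T=200):
    w = [0.0] * num_features  # w is initialized as zero
    for _ in range(T):
        # In a real deployment each call below would run on its own machine.
        partials = [partial_gradient(w, shard) for shard in shards]
        grad = sum_gradients(partials)
        w = [wp - eta * gp for wp, gp in zip(w, grad)]
    return w


# Toy data: first feature is a constant bias term.
shards = [
    [([1.0, 2.0], 1), ([1.0, 1.5], 1)],
    [([1.0, -1.0], 0), ([1.0, -2.5], 0)],
]
w = train(shards, num_features=2)
print(w)
```

Each iteration needs one full map and reduce round, which is why the network cost grows with T.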
  • 29. Scaling Up • Why big data? • Parallelize machine learning algorithms • Embarrassingly parallel • Parallelize sub-routines • Distributed learning
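  • 30. Parallelize Subroutines • Support Vector Machines $\arg\min_{w,b,\zeta} \frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{n} \zeta_i$ s.t. $1 - y_i (w \cdot \phi(x_i) + b) \le \zeta_i,\ \zeta_i \ge 0$ • Solve the dual problem $\arg\min_{\alpha} \frac{1}{2} \alpha^T Q \alpha - \alpha^T \mathbf{1}$ s.t. $0 \le \alpha \le C,\ y^T \alpha = 0$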
  • 31. The computational cost for the Primal-Dual Interior Point Method is O(n^3) in time and O(n^2) in memory http://www.flickr.com/photos/sea-turtle/198445204/
  • 32. Parallel SVM [Chang et al, 2007] • Parallel, row-wise incomplete Cholesky Factorization for Q • Parallel interior point method • Time O(n^3) becomes O(n^2 / M) • Memory O(n^2) becomes O(n√N / M) • Parallel Support Vector Machines (psvm) http://code.google.com/p/psvm/ • Implemented in MPI
  • 33. Parallel ICF • Distribute Q by row into M machines (Machine 1: rows 1, 2; Machine 2: rows 3, 4; Machine 3: rows 5, 6; ...) • For each dimension n < √N • Send local pivots to master • Master selects the largest local pivot and broadcasts the global pivot to workers
  • 35. Scaling Up • Why big data? • Parallelize machine learning algorithms • Embarrassingly parallel • Parallelize sub-routines • Distributed learning
  • 36. Majority Vote Big Data Machine 1 Machine 2 Machine 3 Machine M Map Shard 1 Shard 2 Shard 3 ... Shard M Model 1 Model 2 Model 3 Model 4
  • 37. Majority Vote • Train individual classifiers independently • Predict by taking majority votes • Training CPU: O(TPN) to O(TPN / M)
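The majority-vote scheme on slides 36 and 37 can be sketched as follows. The slides do not name a base learner, so the perceptron here is an assumption, as are the labels in {-1, +1}, the function names, and the toy shards. Three shards are used so that a two-class vote cannot tie.

```python
from collections import Counter


def dot(w, x):
    return sum(wp * xp for wp, xp in zip(w, x))


def train_perceptron(shard, num_features, epochs=10):
    """Train one classifier on one shard, independently of all other shards."""
    w = [0.0] * num_features
    for _ in range(epochs):
        for x, y in shard:  # y is -1 or +1
            if y * dot(w, x) <= 0:  # misclassified: move w toward y * x
                w = [wp + y * xp for wp, xp in zip(w, x)]
    return w


def predict(w, x):
    return 1 if dot(w, x) > 0 else -1


def majority_vote(models, x):
    """Predict with every model and return the most common label."""
    votes = Counter(predict(w, x) for w in models)
    return votes.most_common(1)[0][0]


shards = [
    [([1.0, 2.0], 1), ([1.0, -1.0], -1)],
    [([1.0, 1.5], 1), ([1.0, -2.5], -1)],
    [([1.0, 3.0], 1), ([1.0, -0.5], -1)],
]
# Each training call could run on a separate machine with no communication.
models = [train_perceptron(shard, num_features=2) for shard in shards]
print(majority_vote(models, [1.0, 2.5]), majority_vote(models, [1.0, -2.0]))
```

Training needs no communication between machines, but all M models must be kept and evaluated at prediction time.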
  • 38. Parameter Mixture [Mann et al, 2009] Big Data Machine 1 Machine 2 Machine 3 Machine M Map Shard 1 Shard 2 Shard 3 ... Shard M (dummy key, w1) (dummy key, w2) ... Reduce Average w Model
  • 39. Much less network usage than distributed gradient descent: O(MN) vs. O(MNT) http://www.flickr.com/photos/annamatic3000/127945652/
  • 41. Iterative Param Mixture [McDonald et al., 2010] • Map: Machine 1, Machine 2, Machine 3, ..., Machine M each train on one of Shard 1, Shard 2, Shard 3, ..., Shard M of Big Data and emit (dummy key, w1), (dummy key, w2), ... • Reduce after each epoch: Average w → Model
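The iterative parameter mixture on slide 41 can be sketched as a loop in which every shard runs one training epoch from the current shared weights and the results are averaged. This is an illustration under stated assumptions: a perceptron as the local learner, uniform averaging, labels in {-1, +1}, and toy shards. The helper names are hypothetical.

```python
def dot(w, x):
    return sum(wp * xp for wp, xp in zip(w, x))


def perceptron_epoch(w, shard):
    """Map: one pass over a shard, starting from the shared weights w."""
    w = list(w)  # work on a local copy
    for x, y in shard:  # y is -1 or +1
        if y * dot(w, x) <= 0:
            w = [wp + y * xp for wp, xp in zip(w, x)]
    return w


def average(weight_vectors):
    """Reduce: uniform average of the per-shard weight vectors."""
    m = len(weight_vectors)
    return [sum(column) / m for column in zip(*weight_vectors)]


def iterative_parameter_mixture(shards, num_features, epochs=5):
    w = [0.0] * num_features
    for _ in range(epochs):
        local = [perceptron_epoch(w, shard) for shard in shards]  # one per machine
        w = average(local)  # mix after each epoch
    return w


shards = [
    [([1.0, 2.0], 1), ([1.0, -1.0], -1)],
    [([1.0, 1.5], 1), ([1.0, -2.5], -1)],
    [([1.0, 3.0], 1), ([1.0, -0.5], -1)],
]
w = iterative_parameter_mixture(shards, num_features=2)
print(w)
```

The plain parameter mixture on slide 38 corresponds to training each shard to completion and averaging once at the end. The iterative variant averages after every epoch, so it communicates more often than the one-shot mixture but, unlike majority vote, still yields a single model.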
  • 43. Outline • Machine Learning intro • Scaling machine learning algorithms up • Design choices of large scale ML systems
  • 44. Scalable http://www.flickr.com/photos/mr_t_in_dc/5469563053
  • 48. Binary Classification http://www.flickr.com/photos/brenderous/4532934181/
  • 49. Automatic Feature Discovery http://www.flickr.com/photos/mararie/2340572508/
  • 50. Fast Response http://www.flickr.com/photos/prunejuice/3687192643/
  • 51. Memory is the new hard disk. http://www.flickr.com/photos/jepoirrier/840415676/
  • 52. Algorithm + Infrastructure http://www.flickr.com/photos/neubie/854242030/
  • 53. Design for Multicores http://www.flickr.com/photos/geektechnique/2344029370/
  • 58. Parallelize ML Algorithms • Embarrassingly parallel • Parallelize sub-routines • Distributed learning
  • 59. Parallel Accuracy Fast Response
  • 60. Google APIs • Prediction API • machine learning service on the cloud • http://code.google.com/apis/predict • BigQuery • interactive analysis of massive data on the cloud • http://code.google.com/apis/bigquery