SlideShare a Scribd company logo
1 of 40
Download to read offline
Large-Scale Lasso and
Elastic-Net Regularized
Generalized Linear Models
DB Tsai
Steven Hillion
Outline
–  Introduction
–  Linear / Nonlinear Classification
–  Feature Engineering - Polynomial Expansion
–  Big-data Elastic-Net Regularized Linear Models
Introduction
–  Classification is an important and common problem
•  Churn analysis, fraud detection, etc….even product
recommendations
–  Many observations and variables, non-linear
relationships
–  Non-linear and non-parametric models are popular
solutions, but they are slow and difficult to interpret
–  Our solution
•  Automated feature generation with polynomial mappings
•  Regularized regressions with various performance
optimizations
–  Linear : In the data’s original input space, labels can be
classified by a linear decision boundary.
–  Nonlinear : The classifiers have nonlinear, and possibly
discontinuous decision boundaries.
Linear / Nonlinear Classification
Linear Classifier Examples
–  Logistic Regression
–  Support Vector Machine
–  Naive Bayes Classifier
–  Linear Discriminant Analysis
Nonlinear Classifier Examples
–  Kernel Support Vector Machine
–  Multi-Layer Neural Networks
–  Decision Tree / Random Forest
–  Gradient Boosted Decision Trees
–  K-nearest Neighbors Algorithm
Feature Engineering
Decision Boundary in
Transformed Space
Decision Boundary in
Original Space
Feature Engineering
Ref: https://youtu.be/3liCbRZPrZA
Low-Degree Polynomial Mappings
–  2nd Order Example:
–  The dimension of d-degree polynomial mappings
–  C.J. Lin, et al., Training and Testing Low-degree
Polynomial Data Mappings via Linear SVM, JMLR, 2010
2-Degree Polynomial Mapping
–  2-Degree Polynomial Mapping:
# of features = O(n2) for one training sample
–  2-Degree Polynomial Kernel Method:
# of features = O(nl) for one training sample
–  n is the dimension of original training sample,
l is the number of training samples.
–  In typical setting, l >> n.
–  For sparse data, n is the average # non-zeros,
O(n2) << O(n2) ; O(n2) << O(nl)
Kernel Methods vs Polynomial
Mapping
Cover’s Theorem
A complex pattern-classification problem, cast in a
high-dimensional space nonlinearly, is more likely
to be linearly separable than in a low-dimensional
space, provided that the space is not densely
populated.
— Cover, T.M. , Geometrical and Statistical
properties of systems of linear inequalities with
applications in pattern recognition., 1965
Logistic Regression & Overfitting
–  Given a decision boundary, or a hyperplane ax1 + bx2 + c = 0
–  Distance from a sample to hyperplane
Same classification result but more
sensitive probability
z
z
z
Finding Hyperplane
–  Maximum Likelihood Estimation: From a training dataset
–  We want to find that maximizes the likelihood of data
–  With linear separable dataset, likelihood can always be
increased with the same hyperplane by multiplying a constant
into weights which resulting steeper curve in logistic function.
–  This can be addressed by regularization to reduce model
complexity which increases the accuracy of prediction on
unseen data.
Training Logistic Regression
–  Converting the product to summation by taking the natural
logarithm of likelihood will be more convenient to work with.
–  The negative log-likelihood will be our loss function
Regularization
–  The loss function becomes
–  The loss function of regularization doesn’t depend on data.
–  Common regularizations are
- L2 Regularization:
- L1 Regularization:
- Elastic-Net Regularization:
Geometric Interpretation
–  The ellipses indicate the posterior distribution for no
regularization.
–  The solid areas show the
constraints due to
regularization.
–  The corners of the L1
regularization create more
opportunities for the solution
to have zeros for some of the weights.
Intuitive Interpretation
–  L2 penalizes the square of weights resulting very strong
“force” pushing down big weights into tiny ones. For small
weights, the “force” will be very small.
–  L1 penalizes their absolute value resulting smaller “force”
compared with L2 when weights are large. For smaller
weights, the “force” will be stronger than L2 which drives
small weights to zero.
–  Combining L1 and L2 penalties are called Elastic-Net
method which tends to give a result in between.
Optimization
–  We want to minimize loss function
–  First Order Minimizer - require loss, gradient vector of loss
•  Gradient Descent is learning rate
•  L-BFGS (Limited-memory BFGS)
•  OWLQN (Orthant-Wise Limited-memory Quasi-Newton) for L1
•  Coordinate Descent
–  Second Order Minimizer - require loss, gradient, hessian matrix of loss
•  Newton-Raphson, quadratic convergence which is fast!
Ref: Journal of Machine Learning Research 11 (2010) 3183-3234,
Chih-Jen Lin et al.
Issue of Second Order Minimizer
–  Scale horizontally (the numbers of training data) by
leveraging on Spark to parallelize this iterative optimization
process.
–  Don't scale vertically (the numbers of training features).
–  Dimension of Hessian Matrix: dim(H) = n2
–  Recent applications from document classification and
computational linguistics are of this type.
Apache Spark Logistic Regression
–  The total loss and total gradient have two part; model part
depends on data while regularization part doesn’t depend on
data.
–  The loss and gradient of each sample is independent.
Apache Spark Logistic Regression
–  Compute the loss and gradient in parallel in executors/
workers; reduce them to get the lossSum and
gradientSum in driver/controller.
–  Since regularization doesn’t depend on data, the loss
and gradient sum are added after distributed
computation in driver.
–  Optimization is done in single machine in driver; L1
regularization is handled by OWLQN optimizer.
Apache Spark Logistic Regression
Broadcast Weights
to Executors
Driver/Controller
Executors/Workers
Loop until converge
Initialize
Weights
Compute loss and
gradient for each
sample, and sum
them up locally
Final Model
Weights
Compute loss and
gradient for each
sample, and sum
them up locally
Compute loss and
gradient for each
sample, and sum
them up locally
Reduce from
executors to
get lossSum
and
gradientSum
Handle
regularization
and use LBFGS/
OWLQN to find
next step
Driver/Controller
Apache Spark Linear Models
–  [SPARK-5253] Linear Regression with Elastic Net (L1/L2)
[SPARK-7262] Binary Logistic Regression with Elastic Net
•  Author: DB Tsai, merged in Spark 1.4
•  Internally handle feature scaling to improve convergence and
avoid penalizing too much on those features with low variances
•  Solutions exactly match R’s glmnet but with scalability
•  For LiR, the intercept is computed using close form like R
•  For LoR, clever initial weights are used for faster convergence
–  [SPARK-5894] Feature Polynomial Mapping
•  Author: Xusen Yin, merged in Spark 1.4
Convergence: a9a dataset
Convergence: news20 dataset
Convergence: rcv1 dataset
Polynomial Mapping Experiment
–  New Spark ML Pipeline APIs allows us to construct
the experiment very easily.
–  StringIndexer for converting a string of labels into
label indices used in algorithms.
–  PolynomialExpansion for mapping the features into
high dimensional space.
–  LogisticRegression for training large scale Logistic
Regression with L1/L2 Elastic-Net regularization.
Datasets
–  a9a, ijcnn1, and webspam datasets are used in the experiment.
Comparison
Test
Accuracy
Linear SVM Linear SVM
Degree-2
Polynomial
SVM RBF
Kernel
Logistic
Regression
Logistic
Regression
Degree-2
Polynomial
a9a 84.98 85.06 85.03 85.0 85.26
ijcnn1 92.21 97.84 98.69 92.0 97.74
webspam 93.15 98.44 99.20 92.76 98.57
–  The results of Linear and Kernel SVM experiment are from
C.J. Lin, et al., Training and Testing Low-degree Polynomial Data
Mappings via Linear SVM, JMLR, 2010
Conclusion
–  For some problems, linear methods with feature
engineering are as good as nonlinear kernel methods.
–  However, the training and scoring are much faster for linear
methods.
–  For problems of document classification with sparsity, or
high dimensional classification, linear methods usually
perform well.
–  With Elastic-Net, sparse models get be trained, and the
models are easier to interpret.
Thank you!
Questions?

More Related Content

What's hot

A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...Spark Summit
 
Large-Scale Machine Learning with Apache Spark
Large-Scale Machine Learning with Apache SparkLarge-Scale Machine Learning with Apache Spark
Large-Scale Machine Learning with Apache SparkDB Tsai
 
Apache Spark Machine Learning
Apache Spark Machine LearningApache Spark Machine Learning
Apache Spark Machine LearningCarol McDonald
 
Recent Developments in Spark MLlib and Beyond
Recent Developments in Spark MLlib and BeyondRecent Developments in Spark MLlib and Beyond
Recent Developments in Spark MLlib and BeyondXiangrui Meng
 
Accumulo Summit 2015: Using D4M for rapid prototyping of analytics for Apache...
Accumulo Summit 2015: Using D4M for rapid prototyping of analytics for Apache...Accumulo Summit 2015: Using D4M for rapid prototyping of analytics for Apache...
Accumulo Summit 2015: Using D4M for rapid prototyping of analytics for Apache...Accumulo Summit
 
Enhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable StatisticsEnhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable StatisticsJen Aman
 
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalOverview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalArvind Surve
 
Inside Apache SystemML by Frederick Reiss
Inside Apache SystemML by Frederick ReissInside Apache SystemML by Frederick Reiss
Inside Apache SystemML by Frederick ReissSpark Summit
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Jen Aman
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Spark Summit
 
Large Scale Machine learning with Spark
Large Scale Machine learning with SparkLarge Scale Machine learning with Spark
Large Scale Machine learning with SparkMd. Mahedi Kaysar
 
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...Spark Summit
 
A Graph-Based Method For Cross-Entity Threat Detection
 A Graph-Based Method For Cross-Entity Threat Detection A Graph-Based Method For Cross-Entity Threat Detection
A Graph-Based Method For Cross-Entity Threat DetectionJen Aman
 
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...Databricks
 
Sandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATLSandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATLMLconf
 
06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clusteringSubhas Kumar Ghosh
 
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...Spark Summit
 
Optimizing Terascale Machine Learning Pipelines with Keystone ML
Optimizing Terascale Machine Learning Pipelines with Keystone MLOptimizing Terascale Machine Learning Pipelines with Keystone ML
Optimizing Terascale Machine Learning Pipelines with Keystone MLSpark Summit
 
Modern classification techniques
Modern classification techniquesModern classification techniques
Modern classification techniquesmark_landry
 

What's hot (20)

A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
 
Large-Scale Machine Learning with Apache Spark
Large-Scale Machine Learning with Apache SparkLarge-Scale Machine Learning with Apache Spark
Large-Scale Machine Learning with Apache Spark
 
Apache Spark Machine Learning
Apache Spark Machine LearningApache Spark Machine Learning
Apache Spark Machine Learning
 
Recent Developments in Spark MLlib and Beyond
Recent Developments in Spark MLlib and BeyondRecent Developments in Spark MLlib and Beyond
Recent Developments in Spark MLlib and Beyond
 
Accumulo Summit 2015: Using D4M for rapid prototyping of analytics for Apache...
Accumulo Summit 2015: Using D4M for rapid prototyping of analytics for Apache...Accumulo Summit 2015: Using D4M for rapid prototyping of analytics for Apache...
Accumulo Summit 2015: Using D4M for rapid prototyping of analytics for Apache...
 
Enhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable StatisticsEnhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable Statistics
 
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalOverview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
 
Hadoop map reduce concepts
Hadoop map reduce conceptsHadoop map reduce concepts
Hadoop map reduce concepts
 
Inside Apache SystemML by Frederick Reiss
Inside Apache SystemML by Frederick ReissInside Apache SystemML by Frederick Reiss
Inside Apache SystemML by Frederick Reiss
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
 
Large Scale Machine learning with Spark
Large Scale Machine learning with SparkLarge Scale Machine learning with Spark
Large Scale Machine learning with Spark
 
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
 
A Graph-Based Method For Cross-Entity Threat Detection
 A Graph-Based Method For Cross-Entity Threat Detection A Graph-Based Method For Cross-Entity Threat Detection
A Graph-Based Method For Cross-Entity Threat Detection
 
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...
 
Sandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATLSandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATL
 
06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering
 
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
 
Optimizing Terascale Machine Learning Pipelines with Keystone ML
Optimizing Terascale Machine Learning Pipelines with Keystone MLOptimizing Terascale Machine Learning Pipelines with Keystone ML
Optimizing Terascale Machine Learning Pipelines with Keystone ML
 
Modern classification techniques
Modern classification techniquesModern classification techniques
Modern classification techniques
 

Viewers also liked

2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache Spark2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache SparkDB Tsai
 
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)Neil Andrassy
 
2014 spark with elastic search
2014   spark with elastic search2014   spark with elastic search
2014 spark with elastic searchHenry Saputra
 
Spark Summit EU talk by Tug Grall
Spark Summit EU talk by Tug GrallSpark Summit EU talk by Tug Grall
Spark Summit EU talk by Tug GrallSpark Summit
 
Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry
Spark Summit EU talk by Shaun Klopfenstein and Neelesh ShastrySpark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry
Spark Summit EU talk by Shaun Klopfenstein and Neelesh ShastrySpark Summit
 
Elasticsearch and Spark
Elasticsearch and SparkElasticsearch and Spark
Elasticsearch and SparkAudible, Inc.
 
Sigorta sektöründe tahmin modellemesi uygulama alan örnekleri
Sigorta sektöründe tahmin modellemesi uygulama alan örnekleriSigorta sektöründe tahmin modellemesi uygulama alan örnekleri
Sigorta sektöründe tahmin modellemesi uygulama alan örnekleriBalkir Demirkan
 
Eu versus turkey for mtpl application
Eu versus turkey for mtpl applicationEu versus turkey for mtpl application
Eu versus turkey for mtpl applicationBalkir Demirkan
 
Large scale logistic regression and linear support vector machines using spark
Large scale logistic regression and linear support vector machines using sparkLarge scale logistic regression and linear support vector machines using spark
Large scale logistic regression and linear support vector machines using sparkMila, Université de Montréal
 
A Non Linear Model to explain persons with Stroke
A Non Linear Model to explain persons with StrokeA Non Linear Model to explain persons with Stroke
A Non Linear Model to explain persons with StrokeHariohm Pandian
 
Gradient Dynamical Systems, Bifurcation Theory, Numerical Methods and Applica...
Gradient Dynamical Systems, Bifurcation Theory, Numerical Methods and Applica...Gradient Dynamical Systems, Bifurcation Theory, Numerical Methods and Applica...
Gradient Dynamical Systems, Bifurcation Theory, Numerical Methods and Applica...Boris Fackovec
 
Computational Motor Control: Optimal Estimation in Noisy World (JAIST summer ...
Computational Motor Control: Optimal Estimation in Noisy World (JAIST summer ...Computational Motor Control: Optimal Estimation in Noisy World (JAIST summer ...
Computational Motor Control: Optimal Estimation in Noisy World (JAIST summer ...hirokazutanaka
 
Computational Motor Control: Reinforcement Learning (JAIST summer course)
Computational Motor Control: Reinforcement Learning (JAIST summer course) Computational Motor Control: Reinforcement Learning (JAIST summer course)
Computational Motor Control: Reinforcement Learning (JAIST summer course) hirokazutanaka
 
Recurrence Quantification Analysis : Tutorial & application to eye-movement data
Recurrence Quantification Analysis :Tutorial & application to eye-movement dataRecurrence Quantification Analysis :Tutorial & application to eye-movement data
Recurrence Quantification Analysis : Tutorial & application to eye-movement dataDeb Aks
 
Computational Motor Control: Optimal Control for Stochastic Systems (JAIST su...
Computational Motor Control: Optimal Control for Stochastic Systems (JAIST su...Computational Motor Control: Optimal Control for Stochastic Systems (JAIST su...
Computational Motor Control: Optimal Control for Stochastic Systems (JAIST su...hirokazutanaka
 
JAISTサマースクール2016「脳を知るための理論」講義01 Single neuron models
JAISTサマースクール2016「脳を知るための理論」講義01 Single neuron modelsJAISTサマースクール2016「脳を知るための理論」講義01 Single neuron models
JAISTサマースクール2016「脳を知るための理論」講義01 Single neuron modelshirokazutanaka
 
Computational Motor Control: State Space Models for Motor Adaptation (JAIST s...
Computational Motor Control: State Space Models for Motor Adaptation (JAIST s...Computational Motor Control: State Space Models for Motor Adaptation (JAIST s...
Computational Motor Control: State Space Models for Motor Adaptation (JAIST s...hirokazutanaka
 

Viewers also liked (20)

2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache Spark2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache Spark
 
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
 
2014 spark with elastic search
2014   spark with elastic search2014   spark with elastic search
2014 spark with elastic search
 
Spark Summit EU talk by Tug Grall
Spark Summit EU talk by Tug GrallSpark Summit EU talk by Tug Grall
Spark Summit EU talk by Tug Grall
 
Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry
Spark Summit EU talk by Shaun Klopfenstein and Neelesh ShastrySpark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry
Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry
 
Elasticsearch and Spark
Elasticsearch and SparkElasticsearch and Spark
Elasticsearch and Spark
 
Sigorta sektöründe tahmin modellemesi uygulama alan örnekleri
Sigorta sektöründe tahmin modellemesi uygulama alan örnekleriSigorta sektöründe tahmin modellemesi uygulama alan örnekleri
Sigorta sektöründe tahmin modellemesi uygulama alan örnekleri
 
Eu versus turkey for mtpl application
Eu versus turkey for mtpl applicationEu versus turkey for mtpl application
Eu versus turkey for mtpl application
 
Elasticsearch PHP UG BG
Elasticsearch PHP UG BGElasticsearch PHP UG BG
Elasticsearch PHP UG BG
 
Distance Learning
Distance LearningDistance Learning
Distance Learning
 
Large scale logistic regression and linear support vector machines using spark
Large scale logistic regression and linear support vector machines using sparkLarge scale logistic regression and linear support vector machines using spark
Large scale logistic regression and linear support vector machines using spark
 
A Non Linear Model to explain persons with Stroke
A Non Linear Model to explain persons with StrokeA Non Linear Model to explain persons with Stroke
A Non Linear Model to explain persons with Stroke
 
Gradient Dynamical Systems, Bifurcation Theory, Numerical Methods and Applica...
Gradient Dynamical Systems, Bifurcation Theory, Numerical Methods and Applica...Gradient Dynamical Systems, Bifurcation Theory, Numerical Methods and Applica...
Gradient Dynamical Systems, Bifurcation Theory, Numerical Methods and Applica...
 
Computational Motor Control: Optimal Estimation in Noisy World (JAIST summer ...
Computational Motor Control: Optimal Estimation in Noisy World (JAIST summer ...Computational Motor Control: Optimal Estimation in Noisy World (JAIST summer ...
Computational Motor Control: Optimal Estimation in Noisy World (JAIST summer ...
 
Computational Motor Control: Reinforcement Learning (JAIST summer course)
Computational Motor Control: Reinforcement Learning (JAIST summer course) Computational Motor Control: Reinforcement Learning (JAIST summer course)
Computational Motor Control: Reinforcement Learning (JAIST summer course)
 
Migraine: A dynamical disease
Migraine: A dynamical diseaseMigraine: A dynamical disease
Migraine: A dynamical disease
 
Recurrence Quantification Analysis : Tutorial & application to eye-movement data
Recurrence Quantification Analysis :Tutorial & application to eye-movement dataRecurrence Quantification Analysis :Tutorial & application to eye-movement data
Recurrence Quantification Analysis : Tutorial & application to eye-movement data
 
Computational Motor Control: Optimal Control for Stochastic Systems (JAIST su...
Computational Motor Control: Optimal Control for Stochastic Systems (JAIST su...Computational Motor Control: Optimal Control for Stochastic Systems (JAIST su...
Computational Motor Control: Optimal Control for Stochastic Systems (JAIST su...
 
JAISTサマースクール2016「脳を知るための理論」講義01 Single neuron models
JAISTサマースクール2016「脳を知るための理論」講義01 Single neuron modelsJAISTサマースクール2016「脳を知るための理論」講義01 Single neuron models
JAISTサマースクール2016「脳を知るための理論」講義01 Single neuron models
 
Computational Motor Control: State Space Models for Motor Adaptation (JAIST s...
Computational Motor Control: State Space Models for Motor Adaptation (JAIST s...Computational Motor Control: State Space Models for Motor Adaptation (JAIST s...
Computational Motor Control: State Space Models for Motor Adaptation (JAIST s...
 

Similar to 2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at Spark Summit 2015

Big learning 1.2
Big learning   1.2Big learning   1.2
Big learning 1.2Mohit Garg
 
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML ConferenceDB Tsai
 
Generalized Linear Models in Spark MLlib and SparkR by Xiangrui Meng
Generalized Linear Models in Spark MLlib and SparkR by Xiangrui MengGeneralized Linear Models in Spark MLlib and SparkR by Xiangrui Meng
Generalized Linear Models in Spark MLlib and SparkR by Xiangrui MengSpark Summit
 
Paper review: Learned Optimizers that Scale and Generalize.
Paper review: Learned Optimizers that Scale and Generalize.Paper review: Learned Optimizers that Scale and Generalize.
Paper review: Learned Optimizers that Scale and Generalize.Wuhyun Rico Shin
 
Matrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsMatrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsYONG ZHENG
 
data clean.ppt
data clean.pptdata clean.ppt
data clean.pptchatbot9
 
Deep learning from a novice perspective
Deep learning from a novice perspectiveDeep learning from a novice perspective
Deep learning from a novice perspectiveAnirban Santara
 
2014-08-14 Alpine Innovation to Spark
2014-08-14 Alpine Innovation to Spark2014-08-14 Alpine Innovation to Spark
2014-08-14 Alpine Innovation to SparkDB Tsai
 
Applications of machine learning in Wireless sensor networks.
Applications of machine learning in Wireless sensor networks.Applications of machine learning in Wireless sensor networks.
Applications of machine learning in Wireless sensor networks.Sahana B S
 
Synthesis of analytical methods data driven decision-making
Synthesis of analytical methods data driven decision-makingSynthesis of analytical methods data driven decision-making
Synthesis of analytical methods data driven decision-makingAdam Doyle
 
Enterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using SparkEnterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using SparkAlpine Data
 
Enterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using SparkEnterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using SparkSpark Summit
 
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...Spark Summit
 
Sparse Data Support in MLlib
Sparse Data Support in MLlibSparse Data Support in MLlib
Sparse Data Support in MLlibXiangrui Meng
 
Compressing Graphs and Indexes with Recursive Graph Bisection
Compressing Graphs and Indexes with Recursive Graph Bisection Compressing Graphs and Indexes with Recursive Graph Bisection
Compressing Graphs and Indexes with Recursive Graph Bisection aftab alam
 
Parallel KNN for Big Data using Adaptive Indexing
Parallel KNN for Big Data using Adaptive IndexingParallel KNN for Big Data using Adaptive Indexing
Parallel KNN for Big Data using Adaptive IndexingIRJET Journal
 
Revolutionise your Machine Learning Workflow using Scikit-Learn Pipelines
Revolutionise your Machine Learning Workflow using Scikit-Learn PipelinesRevolutionise your Machine Learning Workflow using Scikit-Learn Pipelines
Revolutionise your Machine Learning Workflow using Scikit-Learn PipelinesPhilip Goddard
 
Analytics Boot Camp - Slides
Analytics Boot Camp - SlidesAnalytics Boot Camp - Slides
Analytics Boot Camp - SlidesAditya Joshi
 
Distributed computing abstractions_data_science_6_june_2016_ver_0.4
Distributed computing abstractions_data_science_6_june_2016_ver_0.4Distributed computing abstractions_data_science_6_june_2016_ver_0.4
Distributed computing abstractions_data_science_6_june_2016_ver_0.4Vijay Srinivas Agneeswaran, Ph.D
 

Similar to 2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at Spark Summit 2015 (20)

Big learning 1.2
Big learning   1.2Big learning   1.2
Big learning 1.2
 
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
 
Generalized Linear Models in Spark MLlib and SparkR by Xiangrui Meng
Generalized Linear Models in Spark MLlib and SparkR by Xiangrui MengGeneralized Linear Models in Spark MLlib and SparkR by Xiangrui Meng
Generalized Linear Models in Spark MLlib and SparkR by Xiangrui Meng
 
Paper review: Learned Optimizers that Scale and Generalize.
Paper review: Learned Optimizers that Scale and Generalize.Paper review: Learned Optimizers that Scale and Generalize.
Paper review: Learned Optimizers that Scale and Generalize.
 
Matrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsMatrix Factorization In Recommender Systems
Matrix Factorization In Recommender Systems
 
data clean.ppt
data clean.pptdata clean.ppt
data clean.ppt
 
Deep learning from a novice perspective
Deep learning from a novice perspectiveDeep learning from a novice perspective
Deep learning from a novice perspective
 
2014-08-14 Alpine Innovation to Spark
2014-08-14 Alpine Innovation to Spark2014-08-14 Alpine Innovation to Spark
2014-08-14 Alpine Innovation to Spark
 
Applications of machine learning in Wireless sensor networks.
Applications of machine learning in Wireless sensor networks.Applications of machine learning in Wireless sensor networks.
Applications of machine learning in Wireless sensor networks.
 
Synthesis of analytical methods data driven decision-making
Synthesis of analytical methods data driven decision-makingSynthesis of analytical methods data driven decision-making
Synthesis of analytical methods data driven decision-making
 
Enterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using SparkEnterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using Spark
 
Enterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using SparkEnterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using Spark
 
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
 
Sparse Data Support in MLlib
Sparse Data Support in MLlibSparse Data Support in MLlib
Sparse Data Support in MLlib
 
Compressing Graphs and Indexes with Recursive Graph Bisection
Compressing Graphs and Indexes with Recursive Graph Bisection Compressing Graphs and Indexes with Recursive Graph Bisection
Compressing Graphs and Indexes with Recursive Graph Bisection
 
Parallel KNN for Big Data using Adaptive Indexing
Parallel KNN for Big Data using Adaptive IndexingParallel KNN for Big Data using Adaptive Indexing
Parallel KNN for Big Data using Adaptive Indexing
 
Supervised Learning.pptx
Supervised Learning.pptxSupervised Learning.pptx
Supervised Learning.pptx
 
Revolutionise your Machine Learning Workflow using Scikit-Learn Pipelines
Revolutionise your Machine Learning Workflow using Scikit-Learn PipelinesRevolutionise your Machine Learning Workflow using Scikit-Learn Pipelines
Revolutionise your Machine Learning Workflow using Scikit-Learn Pipelines
 
Analytics Boot Camp - Slides
Analytics Boot Camp - SlidesAnalytics Boot Camp - Slides
Analytics Boot Camp - Slides
 
Distributed computing abstractions_data_science_6_june_2016_ver_0.4
Distributed computing abstractions_data_science_6_june_2016_ver_0.4Distributed computing abstractions_data_science_6_june_2016_ver_0.4
Distributed computing abstractions_data_science_6_june_2016_ver_0.4
 

Recently uploaded

(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 

Recently uploaded (20)

(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 

2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at Spark Summit 2015

  • 1. Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models DB Tsai Steven Hillion
  • 2. Outline –  Introduction –  Linear / Nonlinear Classification –  Feature Engineering - Polynomial Expansion –  Big-data Elastic-Net Regularized Linear Models
  • 3. Introduction –  Classification is an important and common problem •  Churn analysis, fraud detection, etc….even product recommendations –  Many observations and variables, non-linear relationships –  Non-linear and non-parametric models are popular solutions, but they are slow and difficult to interpret –  Our solution •  Automated feature generation with polynomial mappings •  Regularized regressions with various performance optimizations
  • 4.
  • 5. –  Linear : In the data’s original input space, labels can be classified by a linear decision boundary. –  Nonlinear : The classifiers have nonlinear, and possibly discontinuous decision boundaries. Linear / Nonlinear Classification
  • 6. Linear Classifier Examples –  Logistic Regression –  Support Vector Machine –  Naive Bayes Classifier –  Linear Discriminant Analysis
  • 7. Nonlinear Classifier Examples –  Kernel Support Vector Machine –  Multi-Layer Neural Networks –  Decision Tree / Random Forest –  Gradient Boosted Decision Trees –  K-nearest Neighbors Algorithm
  • 8. Feature Engineering Decision Boundary in Transformed Space Decision Boundary in Original Space
  • 10. Low-Degree Polynomial Mappings –  2nd Order Example: –  The dimension of d-degree polynomial mappings –  C.J. Lin, et al., Training and Testing Low-degree Polynomial Data Mappings via Linear SVM, JMLR, 2010
  • 11. 2-Degree Polynomial Mapping –  2-Degree Polynomial Mapping: # of features = O(n2) for one training sample –  2-Degree Polynomial Kernel Method: # of features = O(nl) for one training sample –  n is the dimension of original training sample, l is the number of training samples. –  In typical setting, l >> n. –  For sparse data, n is the average # non-zeros, O(n2) << O(n2) ; O(n2) << O(nl)
  • 12. Kernel Methods vs Polynomial Mapping
  • 13. Cover’s Theorem A complex pattern-classification problem, cast in a high-dimensional space nonlinearly, is more likely to be linearly separable than in a low-dimensional space, provided that the space is not densely populated. — Cover, T.M. , Geometrical and Statistical properties of systems of linear inequalities with applications in pattern recognition., 1965
  • 14. Logistic Regression & Overfitting –  Given a decision boundary, or a hyperplane ax1 + bx2 + c = 0 –  Distance from a sample to hyperplane Same classification result but more sensitive probability z z z
  • 15. Finding Hyperplane –  Maximum Likelihood Estimation: From a training dataset –  We want to find that maximizes the likelihood of data –  With linear separable dataset, likelihood can always be increased with the same hyperplane by multiplying a constant into weights which resulting steeper curve in logistic function. –  This can be addressed by regularization to reduce model complexity which increases the accuracy of prediction on unseen data.
  • 16. Training Logistic Regression –  Converting the product to summation by taking the natural logarithm of likelihood will be more convenient to work with. –  The negative log-likelihood will be our loss function
  • 17. Regularization –  The loss function becomes –  The loss function of regularization doesn’t depend on data. –  Common regularizations are - L2 Regularization: - L1 Regularization: - Elastic-Net Regularization:
  • 18. Geometric Interpretation –  The ellipses indicate the posterior distribution for no regularization. –  The solid areas show the constraints due to regularization. –  The corners of the L1 regularization create more opportunities for the solution to have zeros for some of the weights.
  • 19. Intuitive Interpretation –  L2 penalizes the square of weights resulting very strong “force” pushing down big weights into tiny ones. For small weights, the “force” will be very small. –  L1 penalizes their absolute value resulting smaller “force” compared with L2 when weights are large. For smaller weights, the “force” will be stronger than L2 which drives small weights to zero. –  Combining L1 and L2 penalties are called Elastic-Net method which tends to give a result in between.
  • 20. Optimization –  We want to minimize loss function –  First Order Minimizer - require loss, gradient vector of loss •  Gradient Descent is learning rate •  L-BFGS (Limited-memory BFGS) •  OWLQN (Orthant-Wise Limited-memory Quasi-Newton) for L1 •  Coordinate Descent –  Second Order Minimizer - require loss, gradient, hessian matrix of loss •  Newton-Raphson, quadratic convergence which is fast! Ref: Journal of Machine Learning Research 11 (2010) 3183-3234, Chih-Jen Lin et al.
  • 21. Issue of Second Order Minimizer –  Scale horizontally (the numbers of training data) by leveraging on Spark to parallelize this iterative optimization process. –  Don't scale vertically (the numbers of training features). –  Dimension of Hessian Matrix: dim(H) = n2 –  Recent applications from document classification and computational linguistics are of this type.
  • 22. Apache Spark Logistic Regression –  The total loss and total gradient have two part; model part depends on data while regularization part doesn’t depend on data. –  The loss and gradient of each sample is independent.
  • 23. Apache Spark Logistic Regression –  Compute the loss and gradient in parallel in executors/ workers; reduce them to get the lossSum and gradientSum in driver/controller. –  Since regularization doesn’t depend on data, the loss and gradient sum are added after distributed computation in driver. –  Optimization is done in single machine in driver; L1 regularization is handled by OWLQN optimizer.
  • 24. Apache Spark Logistic Regression Broadcast Weights to Executors Driver/Controller Executors/Workers Loop until converge Initialize Weights Compute loss and gradient for each sample, and sum them up locally Final Model Weights Compute loss and gradient for each sample, and sum them up locally Compute loss and gradient for each sample, and sum them up locally Reduce from executors to get lossSum and gradientSum Handle regularization and use LBFGS/ OWLQN to find next step Driver/Controller
  • 25. Apache Spark Linear Models –  [SPARK-5253] Linear Regression with Elastic Net (L1/L2) [SPARK-7262] Binary Logistic Regression with Elastic Net •  Author: DB Tsai, merged in Spark 1.4 •  Internally handle feature scaling to improve convergence and avoid penalizing too much on those features with low variances •  Solutions exactly match R’s glmnet but with scalability •  For LiR, the intercept is computed using close form like R •  For LoR, clever initial weights are used for faster convergence –  [SPARK-5894] Feature Polynomial Mapping •  Author: Xusen Yin, merged in Spark 1.4
  • 29. Polynomial Mapping Experiment –  New Spark ML Pipeline APIs allows us to construct the experiment very easily. –  StringIndexer for converting a string of labels into label indices used in algorithms. –  PolynomialExpansion for mapping the features into high dimensional space. –  LogisticRegression for training large scale Logistic Regression with L1/L2 Elastic-Net regularization.
  • 30.
  • 31. Datasets –  a9a, ijcnn1, and webspam datasets are used in the experiment.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38. Comparison Test Accuracy Linear SVM Linear SVM Degree-2 Polynomial SVM RBF Kernel Logistic Regression Logistic Regression Degree-2 Polynomial a9a 84.98 85.06 85.03 85.0 85.26 ijcnn1 92.21 97.84 98.69 92.0 97.74 webspam 93.15 98.44 99.20 92.76 98.57 –  The results of Linear and Kernel SVM experiment are from C.J. Lin, et al., Training and Testing Low-degree Polynomial Data Mappings via Linear SVM, JMLR, 2010
  • 39. Conclusion –  For some problems, linear methods with feature engineering are as good as nonlinear kernel methods. –  However, the training and scoring are much faster for linear methods. –  For problems of document classification with sparsity, or high dimensional classification, linear methods usually perform well. –  With Elastic-Net, sparse models get be trained, and the models are easier to interpret.