Geometric and
Topological Extensions
of Regression Models
Colleen M. Farrelly
Background
Introduction
 Real data is messy.
 Large volumes
 Small volumes
 More predictors than individuals
 Missing data
 Correlated predictors
 The messiness of data can create computational
issues for algorithms based on linear algebra
solvers.
 Least squares algorithm
 Principal components algorithm
 Introducing solvers based on topology and
geometry can mitigate some of these issues and
produce robust algorithms.
Generalized Linear Models
 Flexible extensions of multiple
regression (Gaussian distribution)
common in data science today:
 Yes/no outcomes (binomial distribution)
 Count outcomes (Poisson distribution)
 Survival models (Weibull distribution)
 Transforms regression equation to fit
the outcome distribution
 Sort of like silly putty stretching the
outcome variable in the data space
 Suffers same drawbacks as multiple
regression:
 p > n (more predictors than observations)
 Correlations between predictors
 Local optima
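A minimal sketch of the outcome types listed above, using base R's glm() on simulated data. The variable names (x, y_bin, y_cnt, y_num) and the simulated coefficients are illustrative assumptions and do not come from the talk's dataset; the Weibull survival case is left out because it needs an add-on package.

```r
set.seed(42)
n <- 300
x <- rnorm(n)

# Yes/no outcome: binomial family with logit link
y_bin <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))
fit_bin <- glm(y_bin ~ x, family = binomial(link = "logit"))

# Count outcome: Poisson family with log link
y_cnt <- rpois(n, exp(0.3 + 0.6 * x))
fit_cnt <- glm(y_cnt ~ x, family = poisson(link = "log"))

# Continuous outcome: Gaussian family with identity link,
# which reproduces ordinary multiple regression
y_num <- 1 + 2 * x + rnorm(n)
fit_num <- glm(y_num ~ x, family = gaussian(link = "identity"))

# Coefficients live on the link scale; type = "response" maps
# predictions back to the outcome scale (probabilities here)
coef(fit_bin)
p_hat <- predict(fit_bin, type = "response")
head(p_hat)
```

Only the family and link change between the three fits; the regression formula stays the same, which is the "stretching" idea on the slide.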
Penalized Regression Models
 Impose penalties on the generalized linear
model frameworks:
 Sparsity (set most estimates to 0 to reduce
model size and complexity)
 Robustness (generalizability of the results
under noise)
 Reduce the number of predictors
 Shrink some predictor estimates to 0
 Examine sets of similar predictors
 Similar to a cowboy at the origin roping
coefficients that get too close
 Includes LASSO, LARS, elastic net, and
ridge regression, among others
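The penalties named on this slide can be written in one common form. The notation here is an assumption introduced for illustration and is not taken from the slides: \ell(\beta) is the model log-likelihood, \lambda \ge 0 the penalty weight, \alpha \in [0, 1] a mixing weight, and t an L1 bound.

```latex
\hat{\beta} = \arg\min_{\beta} \left\{ -\ell(\beta) + \lambda\, P(\beta) \right\},
\qquad
P(\beta) =
\begin{cases}
\sum_{j} |\beta_j| & \text{LASSO} \\[4pt]
\sum_{j} \beta_j^{2} & \text{ridge} \\[4pt]
\alpha \sum_{j} |\beta_j| + \dfrac{1-\alpha}{2} \sum_{j} \beta_j^{2} & \text{elastic net}
\end{cases}

% Equivalent constrained form of the LASSO:
\hat{\beta} = \arg\min_{\beta} \; -\ell(\beta)
\quad \text{subject to} \quad \sum_{j} |\beta_j| \le t
```

The absolute-value term is what sets some estimates exactly to 0, while the squared term only shrinks them. LARS is an algorithm for tracing the solution path rather than a penalty, so it has no row of its own here.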
Homotopy-Based LASSO (lasso2)
 Homotopy arrow example
◦ Red and blue arrows
 Anchor start and finish points
 Wiggle middle parts of the line until
arrows overlap
◦ Yellow arrow
 Hole presents issues
 Can’t wiggle into blue or red arrow
without breaking the yellow arrow
 Homotopy LASSO/LARS wiggles an
easy regression path into an
optimal regression path
◦ Avoids local optima
 Peaks
 Valleys
 Saddles
 R package lasso2 implements this for a
variety of outcome types
 Homotopy as path equivalence
◦ Intrinsic property of topological
spaces
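The idea of deforming an easy solution into the optimal one can be illustrated by tracing a LASSO path from a heavily penalized fit (all coefficients 0) toward a lightly penalized one. This is a sketch under stated assumptions: it uses warm-started coordinate descent on a grid of penalties, which is a swapped-in approximation and not the homotopy algorithm of Osborne et al. that lasso2 implements. The helper names (soft_threshold, lasso_cd) and the simulated data are hypothetical.

```r
# Soft-thresholding operator: shrinks z toward 0 and sets it to 0
# when |z| <= lambda
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# Coordinate descent for (1/(2n)) * ||y - X b||^2 + lambda * sum(|b|),
# started from a supplied coefficient vector (warm start)
lasso_cd <- function(X, y, lambda, beta, n_iter = 200) {
  n <- nrow(X)
  for (it in seq_len(n_iter)) {
    for (j in seq_len(ncol(X))) {
      # partial residual with predictor j left out
      r_j <- y - X[, -j, drop = FALSE] %*% beta[-j]
      z_j <- sum(X[, j] * r_j) / n
      beta[j] <- soft_threshold(z_j, lambda) / (sum(X[, j]^2) / n)
    }
  }
  beta
}

set.seed(1)
n <- 100; p <- 5
X <- scale(matrix(rnorm(n * p), n, p))
y <- as.numeric(X %*% c(3, -2, 0, 0, 0) + rnorm(n))
y <- y - mean(y)

# Easy end of the path: the smallest penalty at which every coefficient is 0
lambda_max <- max(abs(crossprod(X, y))) / n
lambdas <- seq(lambda_max, 0.01, length.out = 20)

# Walk the penalty down, reusing the previous solution as the start
beta <- rep(0, p)
path <- matrix(0, length(lambdas), p)
for (k in seq_along(lambdas)) {
  beta <- lasso_cd(X, y, lambdas[k], beta)
  path[k, ] <- beta
}
round(path[c(1, 10, 20), ], 3)
```

Each step starts from the previous solution, so the fit is deformed gradually rather than re-solved from scratch. An exact homotopy method instead moves between the breakpoints of the piecewise-linear path.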
Differential Geometry and Regression (dglars)
 Instead of fitting model to data, fit model to tangent space (what isn’t
the data)
 Deals with collinearity, as parallel vectors share the same tangent space
 LARS/LASSO extensions
 Partition model into sets of predictors based on tangent space
 Fit sets that correspond well to an outcome
 Rao scoring for selection.
 Effect estimates (angles)
 Model selection criteria
 Information criteria
 Deviance scoring
 New extensions of R package dglars
 Most exponential family distributions
 Binomial
 Poisson
 Gaussian
 Gamma
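The Rao scoring step can be sketched in base R for a Poisson outcome. This only illustrates the selection statistic evaluated at the intercept-only model, on simulated data with hypothetical names (x1 to x4, rao, first_in). It does not reproduce the path-following done inside dglars.

```r
set.seed(2)
n <- 200
X <- scale(matrix(rnorm(n * 4), n, 4))
colnames(X) <- paste0("x", 1:4)

# Only x2 truly affects the count outcome in this simulation
y <- rpois(n, exp(0.5 + 0.8 * X[, 2]))

# Fitted mean of the intercept-only Poisson model
mu0 <- rep(mean(y), n)

# Score: derivative of the log-likelihood with respect to each
# coefficient, evaluated with all slopes equal to 0
score <- as.numeric(crossprod(X, y - mu0))

# Fisher information for each coefficient (Poisson variance equals the mean)
info <- as.numeric(crossprod(X^2, mu0))

# Signed Rao score statistic per predictor
rao <- score / sqrt(info)
names(rao) <- colnames(X)

# The predictor with the largest absolute statistic is the first candidate
first_in <- names(which.max(abs(rao)))
rao
first_in
```

Because the statistic is standardized by the information, it is comparable across predictors, which is what allows it to play the role that correlation with the residual plays in ordinary LARS.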
Applications in R
Example Dataset (Open-Source)
 Link to code and data:
 https://www.researchgate.net/project/Miami-Data-Science-Meetup
 https://archive.ics.uci.edu/ml/datasets/Student+Performance (original downloaded data)
 Code:
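#load data
mydata<-read.csv("MathScores.csv")
#retrieve only first term scores (drop columns 32 and 33)
mydata<-mydata[,-c(32:33)]
#split to train and test set (70/30), sized from the data rather than hard-coded
s<-sample(1:nrow(mydata),floor(0.7*nrow(mydata)))
train<-mydata[s,]
test<-mydata[-s,]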
lasso2 Package
 R package implementing homotopy-based LASSO model
 Example pieces of code for a Gaussian (linear) model:
library(lasso2)
#run the model, can use multiple bounds and compare fit
etastart<-NULL
las<-gl1ce(G1~., train, family=gaussian(link=identity), bound=5, standardize=FALSE)
#predict scores of test group
lpred<-predict(las, test, type="response")
mean((lpred-test$G1)^2)
#compare to MSE of mean model
mean((mean(test$G1)-test$G1)^2)
#obtain coefficients
coef(las)
#obtain deviance estimate (model fit—can be used to derive AIC/BIC)
deviance(las)
 Try it out on your dataset!
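The comment about deriving AIC/BIC from the deviance can be made concrete for a Gaussian model. The sketch below uses base R's glm() on simulated data so the hand-computed values can be checked against AIC() and BIC(). The data and the names aic_manual and bic_manual are illustrative assumptions.

```r
set.seed(3)
n <- 120
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 0.7 * d$x1 + rnorm(n)

fit <- glm(y ~ x1 + x2, family = gaussian(), data = d)

# For a Gaussian model the deviance is the residual sum of squares
dev <- deviance(fit)

# Parameter count: regression coefficients plus the error variance
k <- length(coef(fit)) + 1

# Maximized Gaussian log-likelihood written in terms of the deviance
loglik <- -n / 2 * (log(2 * pi * dev / n) + 1)

aic_manual <- -2 * loglik + 2 * k
bic_manual <- -2 * loglik + log(n) * k

c(manual = aic_manual, builtin = AIC(fit))
c(manual = bic_manual, builtin = BIC(fit))
```

For a penalized fit, the value to use for k is a modeling choice. Counting the nonzero coefficients is one common convention, stated here as an assumption, not as what lasso2 reports.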
dglars Package
 R package implementing differential-geometry-based LARS algorithm
 Example pieces of code for a Gaussian (linear) model:
library(dglars)
dg<-dglars(G1~., family="gaussian", data=train)
#can also use cross-validation (cvdglars() function)
dg2<-cvdglars(G1~., family="gaussian", data=train)
#summary of the model
summary(dg)
#extract coefficients from matrix of coefficients at each step
coef(dg)
#obtain model fit statistics, can also use logLik(dg)
AIC(dg)
AIC(dg2)
#plot path of LARS algorithm or model fit for cross-validated model
plot(dg)
plot(dg2)
 Try it out on your dataset!
Compare with multiple linear regression
#compare DGLARS with multiple linear regression
gl<-lm(G1~., data=train)
AIC(gl) #1418
AIC(dg) #1402
AIC(dg2) #1403
#obtain coefficients to compare with both penalized models
summary(gl)
#Compare prediction accuracy (test-set MSE)
pred<-predict(gl, test)
mean((pred-test$G1)^2)
mean((lpred-test$G1)^2)
mean((mean(test$G1)-test$G1)^2)
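The accuracy comparison above follows a pattern that can be wrapped in a small helper. This sketch runs on simulated data with hypothetical names (mse, train_sim, test_sim) and is not the talk's student dataset.

```r
# Mean squared error between predictions and observed values
mse <- function(pred, obs) mean((pred - obs)^2)

set.seed(7)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 2 + 1.5 * dat$x1 + rnorm(n)

# 70/30 split, sized from the data
idx <- sample(seq_len(n), floor(0.7 * n))
train_sim <- dat[idx, ]
test_sim  <- dat[-idx, ]

fit <- lm(y ~ ., data = train_sim)

# Test-set error of the fitted model
mse_model <- mse(predict(fit, newdata = test_sim), test_sim$y)

# Baseline: predict the training mean for every test case
mse_mean <- mse(mean(train_sim$y), test_sim$y)

c(model = mse_model, mean_only = mse_mean)
```

The baseline here uses the training-set mean, a design choice that keeps test outcomes out of the prediction. The slide's version uses the test-set mean, which gives a slightly more favorable baseline.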
Conclusions and References
Summary
 Geometry and topology can be leveraged to improve generalized linear
regression and penalized regression model performance, particularly when
data suffers from general “messiness.”
 Multiple R packages exist to implement these algorithms, and algorithms are
built to accommodate many common exponential family distributions of
outcomes.
 Packages provide interpretable models similar to generalized linear
regression, model fit statistics, and prediction capabilities.
 Many more extensions of regression are possible, and there is work being done
to modify other algorithms based on topology and differential geometry.
Open-Source References
 Augugliaro, L., & Mineo, A. (2013, September). Estimation of sparse
generalized linear models: the dglars package. In 9th Scientific Meeting of
the Classification and Data Analysis Group (pp. 20-23). Tommaso Minerva,
Isabella Morlini, Francesco Palumbo.
 Farrelly, C. M. (2017). Topology and Geometry in Machine Learning for Logistic
Regression.
 Lokhorst, J., Venables, B., & Turlach, B. (2013). Package
‘lasso2’.
 Osborne, M. R., Presnell, B., & Turlach, B. A. (2000). A new approach to
variable selection in least squares problems. IMA Journal of Numerical
Analysis, 20(3), 389-403.
 R package tutorials:
 https://cran.r-project.org/web/packages/dglars/dglars.pdf
 https://cran.r-project.org/web/packages/lasso2/lasso2.pdf

Data Science Meetup: DGLARS and Homotopy LASSO for Regression Models


Editor's Notes

  1. Same assumptions as multiple regression, minus outcome’s normal distribution (link function extends to non-normal distributions). McCullagh, P. (1984). Generalized linear models. European Journal of Operational Research, 16(3), 285-292.
  2. Relaxes predictor independence requirement and adds penalty term. Adds a penalty to reduce generalized linear model’s model size. Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301-320.
  3. Exists for several types of models, including survival, binomial, and Poisson regression models. Augugliaro, L., Mineo, A. M., & Wit, E. C. (2013). Differential geometric least angle regression: a differential geometric approach to sparse generalized linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(3), 471-498. Augugliaro, L., & Mineo, A. M. (2015). Using the dglars Package to Estimate a Sparse Generalized Linear Model. In Advances in Statistical Models for Data Analysis (pp. 1-8). Springer International Publishing.