SlideShare a Scribd company logo
1 of 18
Visualizing High
Dimensional Data with
Manifold Learning in R
BY COLLEEN M. FARRELLY, DATA SCIENTIST AT GRAHAM HOLDINGS
(KAPLAN HIGHER AND PROFESSIONAL EDUCATION)
My Path to Data Science
Former MD/PhD student who started doing research/attending workshops in geometry,
topology, and machine learning
Switched degree programs into biostatistics with a topology-based slant
Have worked in biotechnology, military, education, and the social sciences
Currently on the business side of running a university, with a lot of financial modeling and risk
modeling
Mining for Data Relationships
Exploratory analysis
 Important step in data science
projects
 Trend/covariance visualization
 Clustering
 Powerful combination for
understanding many types of
problems
Types of data problems
 Time series analyses
 Predictive analyses
 Network analyses
9
3
13
5
1
7
8
14
10
11
12
6
16
15
17
2
4
0204060
Intelligence and Achievement Dendrogram
hclust (*, "complete")
dist(mydata[, 2:4])
Height
Unique subgroup identified
Time Series and Financial Data
Key tasks in time
series/financial data
analyses:
 Forecasting future time
points
 Identifying drivers of the
dynamic process (ex. why
are sales rising?)
 Identifying tipping points
(crashes, spikes…)
 Identifying covarying
behavior (sectors that
behave similarly, stocks that
influence each other, daily
rising/falling patterns…)
Dow Jones Industrial Average
Morse-Smale Clustering
Multivariate technique from topology
similar to mode clustering
 Find peaks and valleys in data by filtering
on a defined function:
 A watershed on mountains
 Dribbling a soccer ball across a field of hills
 Separate data based on shared peaks
and valleys
 Many nice developments on
convergence and theoretical properties
R package has nice dimensionality
reduction plots to highlight cluster
differences with respect to the filter
function and predictor sets
5
Dimensionality Reduction and Visualization
Helpful in visualizing multivariate trends and group
differences, particularly for multivariate time series
data
Assume data lies in a lower-dimensional subspace and
map full dataset to that subspace (right)
Types of methods:
 Linear (principal component analysis, or PCA)
 Nonlinear (manifold learning)
 Local (preserving neighborhood metrics like distance
between points)
 Global (preserving global characteristics like
connectedness and limits)
Manifold learning methods related to a branch of
mathematics called differential geometry
Manifold Learning Methods
Three main methods considered in this analysis:
 Multidimensional scaling (MDS)
 Global method based on distance preservation and matrix
decomposition
 Distances can be Euclidean, geodesic, Manhattan...
 Nice theoretical result relating it to PCA when best subspace is
linear
 Locally linear embedding (LLE)
 Local method based on nearest neighbor graph, weighting, and
matrix decomposition
 Related to ISOMAP and other methods
 t-distributed stochastic neighbor embedding (t-SNE)
 Local and global method based on mapping of probability
distributions and random walks
 Preserves both local and global characteristics of the original data
space
 Very strong performance on a variety of problems lately
Breast Cancer Dataset Comparison
Example Stock Market Dataset
Emerging markets
 Important for investors
 Future drivers of global trade
 Global trends
 Daily fluctuations
 Tipping points (crashes and opportunities)
This example:
 Recent Kaggle dataset of daily National Stock
Exchange of India prices from July 2003-
February 2018:
 https://www.kaggle.com/abhishekyana/nse-listed-
1384-companies-data/data
 Cleaned (nulls removed, <1%) and daily fluctuation
ranges added (7 total time series columns)
 3616 days included
Clustering Results
R package (msr)
 10 nearest
neighbors
 Persistence
level=1
 5 level splits
 Plot of group
trajectories (far
left)
4 distinct groups
 2 represent stable
trends (red, blue)
 2 represent
transition points in
market behavior
(green, aqua)
PCA Plot
R function
princomp()
with 2
components
Fits quite well
and shows
spread within
each cluster
MDS Plot
R function
cmdscale() with
2 components
and a Euclidean
distance metric
Relationships
very linear and
well-separated
globally
 Matches PCA
well
 Separates into:
1. Daily price
2. Daily
fluctuation
0 5000 10000 15000
-600-400-2000
MDS Results
Dimension 1
Dimension2
LLE Plot
R function lle()
with 2
components
and 10 nearest
neighbors (lle
package)
Separation and
fit not great
Suggests global
behavior more
important than
local for this
time series 0 1 2 3
-4-3-2-101
LLE Results
Dimension 1
Dimension2
t-SNE Plot
R package dimRed
with function
getDimRedData(),
perplexity
(smoothing) at 80,
2 components, and
tsne method
Parses out tipping
points within
growth period and
exact moments of
transitional events
(see green group)
-30 -20 -10 0 10 20 30 40
-30-20-100102030
tSNE Results
Dimension 1
Dimension2
Deep Dive into MDS Components
MDS components separate into prices
(component 1) and fluctuation ranges
(component 2), summarized in
correlation table
Fluctuation ranges increasing as the
market gains points (left)
Original Time Series MDS Component 1 MDS Component 2
open 1.00E+00 3.25E-03
high 1.00E+00 -6.71E-03
low 1.00E+00 9.00E-03
fluctuation.range 6.84E-01 -7.06E-01
close 1.00E+00 -2.56E-03
day.range 5.14E-01 -7.47E-01
adj_close 1.00E+00 -2.41E-03
Transition Periods Deep Dive
Transition
periods
overlap with
long-term
trends
Shorter time-
to-transition
periods in
recent years
Results Overview
NSE shows exponential growth in a time period of changes
 New regulations
 Oil price drops
 Fall of inflation
Tipping points of growth
 Includes current period, starting late 2017/early 2018
 Actually predicted tumble of NSE during February of 2018 in late 2017
 Crash predicted by several economists for sometime in 2018:
 https://www.getmoneyrich.com/indian-stock-market-correction-likely-in-2017-2018/
 https://www.livemint.com/Money/pXdnLHA2r1FJfwJhFEDqjO/Stock-market-crash-Experts-divided-on-whether-theres-more.html
Fluctuations and volatility
 Increasing in past few years
 Can vary a lot during the day while starting and closing with similar values
Conclusions
Clustering and dimensionality reduction for
multivariate data exploration
 Helpful for understanding multivariate time
series data
 Helpful for understanding other types of data
prior to analysis
Performs very well, showing behavior
deviations before major events
Can provide an understanding of covariance
structure (relationships between stocks,
volatility within a market…)
References
Farrelly, C. M. (2017). Dimensionality Reduction Ensembles. arXiv preprint arXiv:1710.04484.
Gerber, S., Rübel, O., Bremer, P. T., Pascucci, V., & Whitaker, R. T. (2013). Morse–smale
regression. Journal of Computational and Graphical Statistics, 22(1), 193-214.
Kruskal, J. B. (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric
hypothesis. Psychometrika, 29(1), 1-27.
Maaten, L. V. D., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of machine learning
research, 9(Nov), 2579-2605.
Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear
embedding. science, 290(5500), 2323-2326.
Wold, S., Esbensen, K., & Geladi, P. (1987). Principal component analysis. Chemometrics and
intelligent laboratory systems, 2(1-3), 37-52.
ResearchGate profile with folder for talk (data, R code, PPT):
https://www.researchgate.net/profile/Colleen_Farrelly2

More Related Content

What's hot

Distributed computing
Distributed computingDistributed computing
Distributed computing
shivli0769
 

What's hot (20)

Distributed computing
Distributed computingDistributed computing
Distributed computing
 
Multiple linear regression
Multiple linear regressionMultiple linear regression
Multiple linear regression
 
Deep learning
Deep learningDeep learning
Deep learning
 
Chapter 4 Image Processing: Image Transformation
Chapter 4 Image Processing: Image TransformationChapter 4 Image Processing: Image Transformation
Chapter 4 Image Processing: Image Transformation
 
topological features
topological featurestopological features
topological features
 
Big data Analytics
Big data AnalyticsBig data Analytics
Big data Analytics
 
Principal Component Analysis (PCA) and LDA PPT Slides
Principal Component Analysis (PCA) and LDA PPT SlidesPrincipal Component Analysis (PCA) and LDA PPT Slides
Principal Component Analysis (PCA) and LDA PPT Slides
 
Logistic regression in Machine Learning
Logistic regression in Machine LearningLogistic regression in Machine Learning
Logistic regression in Machine Learning
 
Simultaneous Smoothing and Sharpening of Color Images
Simultaneous Smoothing and Sharpening of Color ImagesSimultaneous Smoothing and Sharpening of Color Images
Simultaneous Smoothing and Sharpening of Color Images
 
Chapter 5 Image Processing: Fourier Transformation
Chapter 5 Image Processing: Fourier TransformationChapter 5 Image Processing: Fourier Transformation
Chapter 5 Image Processing: Fourier Transformation
 
Semantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImagerySemantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite Imagery
 
Transfer Learning
Transfer LearningTransfer Learning
Transfer Learning
 
Scikit Learn intro
Scikit Learn introScikit Learn intro
Scikit Learn intro
 
Deep Learning: Introduction & Chapter 5 Machine Learning Basics
Deep Learning: Introduction & Chapter 5 Machine Learning BasicsDeep Learning: Introduction & Chapter 5 Machine Learning Basics
Deep Learning: Introduction & Chapter 5 Machine Learning Basics
 
Lecture 15 DCT, Walsh and Hadamard Transform
Lecture 15 DCT, Walsh and Hadamard TransformLecture 15 DCT, Walsh and Hadamard Transform
Lecture 15 DCT, Walsh and Hadamard Transform
 
Concept learning and candidate elimination algorithm
Concept learning and candidate elimination algorithmConcept learning and candidate elimination algorithm
Concept learning and candidate elimination algorithm
 
Naive bayes
Naive bayesNaive bayes
Naive bayes
 
Fuzzy logic and application in AI
Fuzzy logic and application in AIFuzzy logic and application in AI
Fuzzy logic and application in AI
 
20CS2021 DISTRIBUTED COMPUTING
20CS2021 DISTRIBUTED COMPUTING20CS2021 DISTRIBUTED COMPUTING
20CS2021 DISTRIBUTED COMPUTING
 
Data mining tools (R , WEKA, RAPID MINER, ORANGE)
Data mining tools (R , WEKA, RAPID MINER, ORANGE)Data mining tools (R , WEKA, RAPID MINER, ORANGE)
Data mining tools (R , WEKA, RAPID MINER, ORANGE)
 

Similar to High-Dimensional Data Visualization, Geometry, and Stock Market Crashes

Drsp dimension reduction for similarity matching and pruning of time series ...
Drsp  dimension reduction for similarity matching and pruning of time series ...Drsp  dimension reduction for similarity matching and pruning of time series ...
Drsp dimension reduction for similarity matching and pruning of time series ...
IJDKP
 
Predictive geospatial analytics using principal component regression
Predictive geospatial analytics using principal component regression Predictive geospatial analytics using principal component regression
Predictive geospatial analytics using principal component regression
IJECEIAES
 
Module 04 Content· As a continuation to examining your policies, r
Module 04 Content· As a continuation to examining your policies, rModule 04 Content· As a continuation to examining your policies, r
Module 04 Content· As a continuation to examining your policies, r
IlonaThornburg83
 

Similar to High-Dimensional Data Visualization, Geometry, and Stock Market Crashes (20)

KIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdfKIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdf
 
Research methodology-Research Report
Research methodology-Research ReportResearch methodology-Research Report
Research methodology-Research Report
 
Research Methodology-Data Processing
Research Methodology-Data ProcessingResearch Methodology-Data Processing
Research Methodology-Data Processing
 
CONTINUOUSLY IMPROVE THE PERFORMANCE OF PLANNING AND SCHEDULING MODELS WITH P...
CONTINUOUSLY IMPROVE THE PERFORMANCE OF PLANNING AND SCHEDULING MODELS WITH P...CONTINUOUSLY IMPROVE THE PERFORMANCE OF PLANNING AND SCHEDULING MODELS WITH P...
CONTINUOUSLY IMPROVE THE PERFORMANCE OF PLANNING AND SCHEDULING MODELS WITH P...
 
Towards reducing the
Towards reducing theTowards reducing the
Towards reducing the
 
Drsp dimension reduction for similarity matching and pruning of time series ...
Drsp  dimension reduction for similarity matching and pruning of time series ...Drsp  dimension reduction for similarity matching and pruning of time series ...
Drsp dimension reduction for similarity matching and pruning of time series ...
 
On multi dimensional cubes of census data: designing and querying
On multi dimensional cubes of census data: designing and queryingOn multi dimensional cubes of census data: designing and querying
On multi dimensional cubes of census data: designing and querying
 
Influence over the Dimensionality Reduction and Clustering for Air Quality Me...
Influence over the Dimensionality Reduction and Clustering for Air Quality Me...Influence over the Dimensionality Reduction and Clustering for Air Quality Me...
Influence over the Dimensionality Reduction and Clustering for Air Quality Me...
 
Unit 2_ Descriptive Analytics for MBA .pptx
Unit 2_ Descriptive Analytics for MBA .pptxUnit 2_ Descriptive Analytics for MBA .pptx
Unit 2_ Descriptive Analytics for MBA .pptx
 
Exploratory Data Analysis
Exploratory Data AnalysisExploratory Data Analysis
Exploratory Data Analysis
 
Using R for Classification of Large Social Network Data
Using R for Classification of Large Social Network DataUsing R for Classification of Large Social Network Data
Using R for Classification of Large Social Network Data
 
Role of Modern Geographical Knowledge in National Development
Role  of Modern Geographical Knowledge in National DevelopmentRole  of Modern Geographical Knowledge in National Development
Role of Modern Geographical Knowledge in National Development
 
High dimensionality reduction on graphical data
High dimensionality reduction on graphical dataHigh dimensionality reduction on graphical data
High dimensionality reduction on graphical data
 
Comparative Analysis of RMSE and MAP Metrices for Evaluating CNN and LSTM Mod...
Comparative Analysis of RMSE and MAP Metrices for Evaluating CNN and LSTM Mod...Comparative Analysis of RMSE and MAP Metrices for Evaluating CNN and LSTM Mod...
Comparative Analysis of RMSE and MAP Metrices for Evaluating CNN and LSTM Mod...
 
Predictive geospatial analytics using principal component regression
Predictive geospatial analytics using principal component regression Predictive geospatial analytics using principal component regression
Predictive geospatial analytics using principal component regression
 
Module 04 Content· As a continuation to examining your policies, r
Module 04 Content· As a continuation to examining your policies, rModule 04 Content· As a continuation to examining your policies, r
Module 04 Content· As a continuation to examining your policies, r
 
Lesson 6 chapter 4
Lesson 6   chapter 4Lesson 6   chapter 4
Lesson 6 chapter 4
 
A hybrid approach for analysis of dynamic changes in spatial data
A hybrid approach for analysis of dynamic changes in spatial dataA hybrid approach for analysis of dynamic changes in spatial data
A hybrid approach for analysis of dynamic changes in spatial data
 
671_JeevanRavula_CEE
671_JeevanRavula_CEE671_JeevanRavula_CEE
671_JeevanRavula_CEE
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Research
 

More from Colleen Farrelly

More from Colleen Farrelly (20)

Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Hands-On Network Science, PyData Global 2023
Hands-On Network Science, PyData Global 2023Hands-On Network Science, PyData Global 2023
Hands-On Network Science, PyData Global 2023
 
Modeling Climate Change.pptx
Modeling Climate Change.pptxModeling Climate Change.pptx
Modeling Climate Change.pptx
 
Natural Language Processing for Beginners.pptx
Natural Language Processing for Beginners.pptxNatural Language Processing for Beginners.pptx
Natural Language Processing for Beginners.pptx
 
The Shape of Data--ODSC.pptx
The Shape of Data--ODSC.pptxThe Shape of Data--ODSC.pptx
The Shape of Data--ODSC.pptx
 
Generative AI, WiDS 2023.pptx
Generative AI, WiDS 2023.pptxGenerative AI, WiDS 2023.pptx
Generative AI, WiDS 2023.pptx
 
Emerging Technologies for Public Health in Remote Locations.pptx
Emerging Technologies for Public Health in Remote Locations.pptxEmerging Technologies for Public Health in Remote Locations.pptx
Emerging Technologies for Public Health in Remote Locations.pptx
 
Applications of Forman-Ricci Curvature.pptx
Applications of Forman-Ricci Curvature.pptxApplications of Forman-Ricci Curvature.pptx
Applications of Forman-Ricci Curvature.pptx
 
Geometry for Social Good.pptx
Geometry for Social Good.pptxGeometry for Social Good.pptx
Geometry for Social Good.pptx
 
Topology for Time Series.pptx
Topology for Time Series.pptxTopology for Time Series.pptx
Topology for Time Series.pptx
 
Time Series Applications AMLD.pptx
Time Series Applications AMLD.pptxTime Series Applications AMLD.pptx
Time Series Applications AMLD.pptx
 
An introduction to quantum machine learning.pptx
An introduction to quantum machine learning.pptxAn introduction to quantum machine learning.pptx
An introduction to quantum machine learning.pptx
 
An introduction to time series data with R.pptx
An introduction to time series data with R.pptxAn introduction to time series data with R.pptx
An introduction to time series data with R.pptx
 
NLP: Challenges and Opportunities in Underserved Areas
NLP: Challenges and Opportunities in Underserved AreasNLP: Challenges and Opportunities in Underserved Areas
NLP: Challenges and Opportunities in Underserved Areas
 
Geometry, Data, and One Path Into Data Science.pptx
Geometry, Data, and One Path Into Data Science.pptxGeometry, Data, and One Path Into Data Science.pptx
Geometry, Data, and One Path Into Data Science.pptx
 
Topological Data Analysis.pptx
Topological Data Analysis.pptxTopological Data Analysis.pptx
Topological Data Analysis.pptx
 
Transforming Text Data to Matrix Data via Embeddings.pptx
Transforming Text Data to Matrix Data via Embeddings.pptxTransforming Text Data to Matrix Data via Embeddings.pptx
Transforming Text Data to Matrix Data via Embeddings.pptx
 
Natural Language Processing in the Wild.pptx
Natural Language Processing in the Wild.pptxNatural Language Processing in the Wild.pptx
Natural Language Processing in the Wild.pptx
 
SAS Global 2021 Introduction to Natural Language Processing
SAS Global 2021 Introduction to Natural Language Processing SAS Global 2021 Introduction to Natural Language Processing
SAS Global 2021 Introduction to Natural Language Processing
 
2021 American Mathematical Society Data Science Talk
2021 American Mathematical Society Data Science Talk2021 American Mathematical Society Data Science Talk
2021 American Mathematical Society Data Science Talk
 

Recently uploaded

Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
gajnagarg
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
gajnagarg
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
karishmasinghjnh
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
gajnagarg
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 

Recently uploaded (20)

Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 

High-Dimensional Data Visualization, Geometry, and Stock Market Crashes

  • 1. Visualizing High Dimensional Data with Manifold Learning in R BY COLLEEN M. FARRELLY, DATA SCIENTIST AT GRAHAM HOLDINGS (KAPLAN HIGHER AND PROFESSIONAL EDUCATION)
  • 2. My Path to Data Science Former MD/PhD student who started doing research/attending workshops in geometry, topology, and machine learning Switched degree programs into biostatistics with a topology-based slant Have worked in biotechnology, military, education, and the social sciences Currently on the business side of running a university, with a lot of financial modeling and risk modeling
  • 3. Mining for Data Relationships Exploratory analysis  Important step in data science projects  Trend/covariance visualization  Clustering  Powerful combination for understanding many types of problems Types of data problems  Time series analyses  Predictive analyses  Network analyses 9 3 13 5 1 7 8 14 10 11 12 6 16 15 17 2 4 0204060 Intelligence and Achievement Dendrogram hclust (*, "complete") dist(mydata[, 2:4]) Height Unique subgroup identified
  • 4. Time Series and Financial Data Key tasks in time series/financial data analyses:  Forecasting future time points  Identifying drivers of the dynamic process (ex. why are sales rising?)  Identifying tipping points (crashes, spikes…)  Identifying covarying behavior (sectors that behave similarly, stocks that influence each other, daily rising/falling patterns…) Dow Jones Industrial Average
  • 5. Morse-Smale Clustering Multivariate technique from topology similar to mode clustering  Find peaks and valleys in data by filtering on a defined function:  A watershed on mountains  Dribbling a soccer ball across a field of hills  Separate data based on shared peaks and valleys  Many nice developments on convergence and theoretical properties R package has nice dimensionality reduction plots to highlight cluster differences with respect to the filter function and predictor sets 5
  • 6. Dimensionality Reduction and Visualization Helpful in visualizing multivariate trends and group differences, particularly for multivariate time series data Assume data lies in a lower-dimensional subspace and map full dataset to that subspace (right) Types of methods:  Linear (principal component analysis, or PCA)  Nonlinear (manifold learning)  Local (preserving neighborhood metrics like distance between points)  Global (preserving global characteristics like connectedness and limits) Manifold learning methods related to a branch of mathematics called differential geometry
  • 7. Manifold Learning Methods Three main methods considered in this analysis:  Multidimensional scaling (MDS)  Global method based on distance preservation and matrix decomposition  Distances can be Euclidean, geodesic, Manhattan...  Nice theoretical result relating it to PCA when best subspace is linear  Locally linear embedding (LLE)  Local method based on nearest neighbor graph, weighting, and matrix decomposition  Related to ISOMAP and other methods  t-distributed stochastic neighbor embedding (t-SNE)  Local and global method based on mapping of probability distributions and random walks  Preserves both local and global characteristics of the original data space  Very strong performance on a variety of problems lately Breast Cancer Dataset Comparison
  • 8. Example Stock Market Dataset Emerging markets  Important for investors  Future drivers of global trade  Global trends  Daily fluctuations  Tipping points (crashes and opportunities) This example:  Recent Kaggle dataset of daily National Stock Exchange of India prices from July 2003- February 2018:  https://www.kaggle.com/abhishekyana/nse-listed- 1384-companies-data/data  Cleaned (nulls removed, <1%) and daily fluctuation ranges added (7 total time series columns)  3616 days included
  • 9. Clustering Results R package (msr)  10 nearest neighbors  Persistence level=1  5 level splits  Plot of group trajectories (far left) 4 distinct groups  2 represent stable trends (red, blue)  2 represent transition points in market behavior (green, aqua)
  • 10. PCA Plot R function princomp() with 2 components Fits quite well and shows spread within each cluster
  • 11. MDS Plot R function cmdscale() with 2 components and a Euclidean distance metric Relationships very linear and well-separated globally  Matches PCA well  Separates into: 1. Daily price 2. Daily fluctuation 0 5000 10000 15000 -600-400-2000 MDS Results Dimension 1 Dimension2
  • 12. LLE Plot R function lle() with 2 components and 10 nearest neighbors (lle package) Separation and fit not great Suggests global behavior more important than local for this time series 0 1 2 3 -4-3-2-101 LLE Results Dimension 1 Dimension2
  • 13. t-SNE Plot R package dimRed with function getDimRedData(), perplexity (smoothing) at 80, 2 components, and tsne method Parses out tipping points within growth period and exact moments of transitional events (see green group) -30 -20 -10 0 10 20 30 40 -30-20-100102030 tSNE Results Dimension 1 Dimension2
  • 14. Deep Dive into MDS Components MDS components separate into prices (component 1) and fluctuation ranges (component 2), summarized in correlation table Fluctuation ranges increasing as the market gains points (left) Original Time Series MDS Component 1 MDS Component 2 open 1.00E+00 3.25E-03 high 1.00E+00 -6.71E-03 low 1.00E+00 9.00E-03 fluctuation.range 6.84E-01 -7.06E-01 close 1.00E+00 -2.56E-03 day.range 5.14E-01 -7.47E-01 adj_close 1.00E+00 -2.41E-03
  • 15. Transition Periods Deep Dive Transition periods overlap with long-term trends Shorter time- to-transition periods in recent years
  • 16. Results Overview NSE shows exponential growth in a time period of changes  New regulations  Oil price drops  Fall of inflation Tipping points of growth  Includes current period, starting late 2017/early 2018  Actually predicted tumble of NSE during February of 2018 in late 2017  Crash predicted by several economists for sometime in 2018:  https://www.getmoneyrich.com/indian-stock-market-correction-likely-in-2017-2018/  https://www.livemint.com/Money/pXdnLHA2r1FJfwJhFEDqjO/Stock-market-crash-Experts-divided-on-whether-theres-more.html Fluctuations and volatility  Increasing in past few years  Can vary a lot during the day while starting and closing with similar values
  • 17. Conclusions Clustering and dimensionality reduction for multivariate data exploration  Helpful for understanding multivariate time series data  Helpful for understanding other types of data prior to analysis Performs very well, showing behavior deviations before major events Can provide an understanding of covariance structure (relationships between stocks, volatility within a market…)
  • 18. References Farrelly, C. M. (2017). Dimensionality Reduction Ensembles. arXiv preprint arXiv:1710.04484. Gerber, S., Rübel, O., Bremer, P. T., Pascucci, V., & Whitaker, R. T. (2013). Morse–smale regression. Journal of Computational and Graphical Statistics, 22(1), 193-214. Kruskal, J. B. (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29(1), 1-27. Maaten, L. V. D., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of machine learning research, 9(Nov), 2579-2605. Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. science, 290(5500), 2323-2326. Wold, S., Esbensen, K., & Geladi, P. (1987). Principal component analysis. Chemometrics and intelligent laboratory systems, 2(1-3), 37-52. ResearchGate profile with folder for talk (data, R code, PPT): https://www.researchgate.net/profile/Colleen_Farrelly2