Hierarchical Clustering and Topology
for Psychometric Validation
By Colleen M. Farrelly
Creating a New Survey: Psychometrics
 Many types of surveys/tests exist for assessing academic achievement,
psychological traits, or sociological constructs; the field that studies the
construction and functioning of tests is called psychometrics.
 Sometimes, a new survey must be created to either improve upon a
previous/discontinued one or assess a new idea/context for a given behavior
or trait.
 These new surveys pose several statistical challenges:
 Consistency within a survey (measuring what a survey is thought to measure)
 Cronbach’s alpha, differential item functioning…
 Validation across samples (measures the same thing across populations/time)
 Exploratory factor analysis followed by confirmatory factor analysis
 Subscales for easier computation and interpretation of results (need to figure out
what items on the survey function similarly)
 Statistical frameworks exist for assessing these challenges, but they typically
require large sample sizes and assume certain structures underlie the survey
design.
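As a concrete reference for the internal-consistency check named above, the following is a minimal sketch of Cronbach’s alpha using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the score table are hypothetical and are not taken from the slides.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a table of responses.

    responses: list of rows, one per participant, each a list of item scores.
    """
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)       # sample variances
    total_var = variance([sum(row) for row in responses])    # variance of sum scores
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical scores: 5 participants x 3 items (illustration only)
scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 4],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```

Values near 1 indicate that the items vary together; the made-up scores here were chosen to be highly consistent.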
Example Survey
a) Red:Rainbow::July:____ (Month, Year, Hot, Cloud)
b) Soothing:Anodyne::____:Esoteric (Eccentric, School, Abstruse, Calming)
c) Pyrrhic:Victory::Potemkin:____ (Village, Battle, Hollow, Achilles)
d) Stegosaurus:Jurassic::Trilobite:____ (Triassic, Dinosaur, Mesozoic, Cambrian)
e) Mice:Men::Cabbages:____ (Women, Lettuce, Salad, Kings)
f) Fill in the following series: 1, 1/8, 1/27, 1/64, ___
g) Fill in the following series: ___, 25, 168, 1229, 9592
h) Fill in the following series: 3, ___,4,1,5
Factor Analysis
 Creation of new surveys requires internal and external validation, typically
done through factor analysis.
 Exploratory factor analysis is used to cluster items measuring similar underlying
processes.
 Confirmatory factor analysis can then be applied to validate those clusters, or
subscales, that were found in the exploratory analysis.
 Cronbach’s alpha establishes internal consistency.
[Figure: factor diagram with two factors, Verbal (items a, b, c, d, e) and Math (items f, g, h)]
Potential Pitfalls in Psychometric
Validation with Factor Analysis
 Two major problems challenge the assumptions of these methods and
necessitate the development of a new way to analyze and validate the
measure.
 Time-wise or context-wise measurement can introduce non-independent, non-
hierarchical components into the model.
 Study habits across terms (longitudinal effects on measurement), identity across social
spheres (student perception of intellectual ability when with friends, work, and school)
 Factor analysis can be broadened to Bayesian networks and structural equation models, but
this method comes with its own assumptions on the underlying geometry and sample size.
 Small sample size can create numerical instability in traditional algorithms for both
factor analysis and structural equation models (common guidelines suggest 5-10 participants per item).
 If there are 90 items, at least 450 students would be needed to discover subscales, and
another 450 would be needed to validate these findings.
 Cost and population size can be prohibitive to the study.
 Ex. Bridging constructs, or loosely connected concepts without a defined hierarchy,
typically run into both limitations and require a new method to validate their
surveys.
 Many of these issues arise from the dependence on linear mapping from the
survey response space to a lower-dimensional space.
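The sample-size arithmetic on this slide can be written out as a short sketch. It assumes the 5-10 participants-per-item guideline quoted above and two phases (exploratory, then confirmatory); the function name `participants_needed` and its parameters are hypothetical.

```python
def participants_needed(n_items, per_item_low=5, per_item_high=10, phases=2):
    """Rule-of-thumb sample sizes for factor-analytic validation.

    Returns the (low, high) range per phase and in total across phases.
    """
    low = n_items * per_item_low
    high = n_items * per_item_high
    return {"per_phase": (low, high), "total": (low * phases, high * phases)}

need = participants_needed(90)
print(need)
```

For 90 items, the lower bound reproduces the 450 participants per phase cited above.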
Moving from Euclidean-Based Statistics
to Topologically-Based Statistics
 Each projection to a lower-dimensional space loses information (errors).
 Topological methods work by partitioning the existing space into homogeneous
components (no maps, no error).
[Figure: 2D example]
Algebraic Topology and Topological Spaces
 Spaces, such as the one formed by survey response data, can be defined
topologically and decomposed using algebraic topology/geometry.
 Data follows discrete versions of many theoretical results in this area of math.
 Topology is rubber sheet geometry, with areas analogous to gluing together children’s building
blocks, examining connections on shapes, or hunting for mountain/valley water flows.
 Examining how the pieces fit together in a given space allows one to study the topological
space’s defining characteristics and the behavior of functions in that space.
 One can define connections between pieces of this space via algebra and examine
structural properties computationally:
 Homotopy (shrinking connected paths to a point)
 Homology (hole-counting to define topological classification of structure)
[Figure panels: (1) Homotopy/Homology, (2) Basins of Attraction (Morse Theory), (3) Hodge Theory]
Applied Homology: Filtrations and
Persistence
 Filtration
 A filtration iteratively changes the lens through
which the data are examined (height, neighbors…).
 Topological features appear and disappear as
the lens changes.
 This creates a nested sequence of features with
underlying algebraic objects, called a homology
sequence:
 Hom1⊂Hom2⊂Hom3⊂Hom4
 Persistence is the length of feature existence in
a homology sequence, which can be visualized.
 This information maps back to the data
space’s topology (shape).
 The first level of algebraic objects
corresponds to connectedness of the space
(0th Betti numbers), and this is directly
related to a type of clustering analysis.
[Figure: persistence plot over time (axis 0 to 10), with features labeled Vertices, Connected space, and Hole in middle]
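To make the link between connectedness and a distance filtration concrete, here is a minimal sketch that counts connected components (the 0th Betti number) of a point cloud at several distance thresholds. The points, the thresholds, and the function name `betti0` are hypothetical choices for illustration; the slides do not specify an implementation.

```python
from math import dist

def betti0(points, threshold):
    """Count connected components of the graph that links any two
    points whose Euclidean distance is at most `threshold`."""
    n = len(points)
    seen = set()
    components = 0
    for start in range(n):
        if start in seen:
            continue
        components += 1            # found a point not reached so far
        stack = [start]
        seen.add(start)
        while stack:               # depth-first search over the threshold graph
            i = stack.pop()
            for j in range(n):
                if j not in seen and dist(points[i], points[j]) <= threshold:
                    seen.add(j)
                    stack.append(j)
    return components

# Hypothetical 2D data: two tight groups that are far apart
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
thresholds = [0.5, 1.0, 2.0, 15.0]
betti_curve = [betti0(points, t) for t in thresholds]
print(betti_curve)
```

In this toy example the two-component state appears at threshold 1.0 and lasts until the groups join at a distance of about 13.45, so it is the long-lived (persistent) feature.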
Solution: Use Machine Learning to Exploit
Underlying Topology of Survey Data
 Single-linkage hierarchical clustering partitions data space according to
connected components (0th Betti numbers) across filtration levels (i.e. a series
of distance filtrations).
 This method has been successfully applied to neuroimaging studies focused on
patterns of brain activity across diseases, neuropsychological tests, and drug states.
 This provides a nuanced scanning of topologically-based features within the datasets at
different correlation/similarity thresholds.
 These can be summarized in feature plots, called persistence diagrams, that track the birth
and death of a given feature across thresholds, and can be compared through existing
statistical tests, such as a nonparametric Wasserstein metric test.
 It has also been used to track gene expression pattern changes across time and/or
disease states in microarray studies.
 These studies particularly emphasize the visualization of hierarchical clustering through
dendrograms (tree diagrams of relationships at different filtration levels) and heat maps
(color-coded expression-similarity plots among genes in the microarray).
 These visualizations provide a user-friendly way to understand and communicate key
findings of this statistical method.
 This method can handle data with fewer observations than predictors (p>>n), and,
thus, does not require large sample sizes.
 Internal correlations do not pose issues; in fact, the method excels at separating
data within and across dependencies.
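A minimal sketch of single-linkage clustering on survey items follows. It assumes item distance is 1 minus the Pearson correlation (the slides mention correlation/similarity thresholds but do not fix a metric), and it is a plain-Python illustration rather than the pvclust workflow. Item names, scores, and the helper names `pearson`, `single_linkage`, and `cut` are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def single_linkage(items):
    """items: dict of item name -> scores. Returns merges in increasing
    height order as (height, cluster_a, cluster_b)."""
    names = list(items)
    # Assumed distance: 1 - correlation; process closest pairs first
    edges = sorted((1 - pearson(items[a], items[b]), a, b)
                   for a, b in combinations(names, 2))
    cluster_of = {name: frozenset([name]) for name in names}
    merges = []
    for height, a, b in edges:
        ca, cb = cluster_of[a], cluster_of[b]
        if ca == cb:               # already joined at a lower height
            continue
        merged = ca | cb
        merges.append((height, ca, cb))
        for name in merged:
            cluster_of[name] = merged
    return merges

def cut(items, merges, height):
    """Clusters obtained by applying every merge at or below `height`."""
    cluster_of = {name: frozenset([name]) for name in items}
    for h, ca, cb in merges:
        if h <= height:
            merged = ca | cb
            for name in merged:
                cluster_of[name] = merged
    return set(cluster_of.values())

# Hypothetical scores for 6 participants on 4 items
items = {
    "verbal_a": [1, 2, 3, 4, 5, 6],
    "verbal_b": [1, 3, 2, 4, 6, 5],
    "math_f": [4, 1, 5, 2, 6, 3],
    "math_g": [5, 1, 4, 2, 6, 3],
}
merges = single_linkage(items)
clusters = cut(items, merges, height=0.5)
print(clusters)
```

Cutting the tree at a given height returns the connected components of the graph that links items closer than that height, which is the sense in which single linkage tracks 0th Betti numbers across filtration levels.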
Hierarchical Clustering: Example Survey
[Figure: heatmap and dendrogram of the example survey items, split into Math and Verbal clusters]
Very distinct separation of items (noted by sharp color contrast in the heatmap and
long height bars on the dendrogram)
Validation: Dendrograms and Topology
 Dendrograms are a special type of graph,
called a tree.
 Because graphs have a defined topological
space and dendrograms are a type of
graph, they can be studied or measured
through the tools of topology and metric
geometry.
 Hausdorff distance allows two objects in the
same metric space to be compared by a
defined metric.
 For each point on one object, find the
closest point on the other; the Hausdorff
distance is the greatest of these
closest-point distances, giving a
nearness-of-match type of metric on two
objects (top left).
 Within a graph framework, it allows one to
calculate the worst of the best matches
between two graphs (as shown at bottom left).
 This allows for the development of a
distance-based nonparametric test to test
for dendrogram structural differences in a
statistical framework.
[Figure: Hausdorff distance between two shapes (top left) and between two graphs (bottom left)]
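The definition of Hausdorff distance can be sketched directly for finite point sets. This is an illustration on hypothetical 2D points; how a dendrogram is embedded as a set of points in a metric space is not specified in the slides, so that step is left out here.

```python
from math import dist

def directed_hausdorff(A, B):
    """Largest distance from a point of A to its nearest point in B."""
    return max(min(dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """Symmetric Hausdorff distance: the worse of the two directed distances."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

# Hypothetical shapes: a unit square and the same square with one corner moved
square = [(0, 0), (0, 1), (1, 0), (1, 1)]
shifted = [(0, 0), (0, 1), (1, 0), (3, 1)]
d = hausdorff(square, shifted)
print(d)
```

The two directed distances differ (1.0 from the square to the shifted set, 2.0 in the other direction), which is why the symmetric version takes the maximum of both.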
Steps in Exploration and Validation of
Surveys with Hierarchical Clustering
1) Partition sample into training and validation sets/draw a small number of
bootstrap samples from the original dataset.
2) Calculate distance metrics in each sample.
3) Run a single-linkage hierarchical clustering algorithm on the training set to
obtain exploratory clusters of similar survey items (the pvclust R package
statistically tests internal survey structure, in a role similar to Cronbach’s alpha).
Create heat map and dendrogram.
4) Repeat (3) on validation sets to obtain a set of dendrograms.
5) Calculate Hausdorff distance (a topological metric) between dendrograms to
estimate differences in results (validation step).
6) Obtain p-value through permuting the extant dendrograms or generating
random dendrograms.
7) If the p-value is larger than 0.05/n (Bonferroni correction, with n the number of
comparisons) for the dendrograms in (5), no statistically significant difference in
dendrogram structure is detected, which supports the consistency and validity of the survey clusters.
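Steps 6 and 7 can be sketched as follows, taking the observed dendrogram distance and a list of null distances (from permuted or random dendrograms) as given inputs. The +1 correction in the p-value is a common convention and an assumption here, as are the function names `permutation_p_value` and `consistent`; the null distances are simulated placeholders, not results.

```python
import random

def permutation_p_value(observed, null_distances):
    """Share of null distances at least as large as the observed one,
    with a +1 correction so the p-value is never exactly zero."""
    extreme = sum(1 for d in null_distances if d >= observed)
    return (extreme + 1) / (len(null_distances) + 1)

def consistent(p_values, alpha=0.05):
    """True if no comparison is significant after Bonferroni correction."""
    threshold = alpha / len(p_values)      # 0.05 / n
    return all(p > threshold for p in p_values)

# Hypothetical null distribution of dendrogram distances (placeholder values)
rng = random.Random(0)
null = [rng.uniform(0.0, 1.0) for _ in range(999)]

p_small = permutation_p_value(0.2, null)   # observed distance typical of the null
p_large = permutation_p_value(5.0, null)   # observed distance far beyond the null
print(p_small, p_large)
print(consistent([p_small] * 5), consistent([p_small, p_large]))
```

A large observed distance relative to the null yields a small p-value, which flags a structural difference between dendrograms; the survey clusters are treated as consistent only when no comparison falls below the corrected threshold.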
Example Measure: Bridging Constructs
 Identity expression across life contexts (ILLCQ Survey):
 There are many components to identity in leading theories of identity.
 Example: religious identity in school, family, and friends contexts
 It was unknown whether identity type or social context plays a greater role in the
expression of identity within an individual.
 Identity type as more influential would suggest that identity is a fairly static trait.
 Context as more influential would suggest that identity is fluid.
 Sample size and survey size
 406 participants (FIU students) and 91 distinct survey items.
 5 draws of 130 participants each for validation and consistency checks.
 Results suggest certain aspects of identity are fluid and others are fixed.
 Political and racial/ethnic identity are fairly fixed.
 Other types, such as athletic or gender, are fairly fluid.
 Bootstrapped samples suggest consistency of measure and validate findings.
 Subscales hold over different samples (tests of difference, all p>0.05).
 This validates the measure and allows for inference into the psychology of identity.
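The resampling design described above (5 draws of 130 participants from 406) might be set up as in this sketch. The slides do not say whether participants were drawn with or without replacement; this version assumes sampling without replacement within each draw, with overlap allowed between draws. The function name `draw_subsamples` and the seed are hypothetical.

```python
import random

def draw_subsamples(n_participants=406, draw_size=130, n_draws=5, seed=0):
    """Return n_draws lists of participant indices.

    Assumption: no participant repeats within a draw; draws may overlap.
    """
    rng = random.Random(seed)              # fixed seed for reproducibility
    ids = range(n_participants)
    return [sorted(rng.sample(ids, draw_size)) for _ in range(n_draws)]

draws = draw_subsamples()
print([len(draw) for draw in draws])
```

Each list of indices would then select the rows of the response table used to build one validation dendrogram.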
Identity by Context Survey Heatmap
[Figure: clustered heatmap of the 91 ILLCQ items. Row labels have the form ILLCa_<identity>_<context>, with identities school_success, gender, age, sexual_or, beauty, sport, religion, politics, tribe, look, music, race, and status, and contexts family, school, dating, freetime, religion, neighborhood, and group. The color scale runs from -0.2 to 1.]
Conclusion
 This method offers a robust way to create survey subscales and validate
measures without needing a large sample or a pre-defined measure structure.
 Flexible
 Deeply rooted in mathematics
 Statistically testable
 Internal validity by pvclust’s statistical test of cluster hierarchy for cut-points
 External validity by Hausdorff nonparametric test on bootstrapped samples
 It has been successfully applied to a bridging concept survey (factorial design),
as well as more traditional survey designs.
 This offers a general way to extend traditional areas of statistics to a more
general framework through the use of topological theory and tools.
 Likely to be useful as data becomes more complex in industry and academia.
 May be able to circumvent other problems in modern statistics.
 Item response theory (how people in different groups perform on test items)
 Network comparison (social networks, covariance networks…) between groups or over time
 Structural equation modeling when data does not meet method assumptions
Co-authors
 The Analysis of bridging constructs with hierarchical clustering methods: An
application to identity (under review Journal of Research in Personality)
 Seth Schwartz, University of Miami
 Anna Lisa Amodeo, University of Naples
 Daniel Feaster, University of Miami
 Douglas Steinley, University of Missouri
 Alan Meca, University of Miami
 Simona Picariello, University of Naples

More Related Content

What's hot

08 test of hypothesis large sample.ppt
08 test of hypothesis large sample.ppt08 test of hypothesis large sample.ppt
08 test of hypothesis large sample.pptPooja Sakhla
 
Data cleaning and screening
Data cleaning and screeningData cleaning and screening
Data cleaning and screeningHassan Hussein
 
Information retrieval 20 divergence from randomness
Information retrieval 20 divergence from randomnessInformation retrieval 20 divergence from randomness
Information retrieval 20 divergence from randomnessVaibhav Khanna
 
Estimating standard error of measurement
Estimating standard error of measurementEstimating standard error of measurement
Estimating standard error of measurementCarlo Magno
 
One sided or one-tailed tests
One sided or one-tailed testsOne sided or one-tailed tests
One sided or one-tailed testsHasnain Baber
 
Statistical analysis training course
Statistical analysis training courseStatistical analysis training course
Statistical analysis training courseMarwa Abo-Amra
 
Linear regression analysis
Linear regression analysisLinear regression analysis
Linear regression analysisNimrita Koul
 
Mastering Partial Least Squares Structural Equation Modeling (PLS-SEM) with S...
Mastering Partial Least Squares Structural Equation Modeling (PLS-SEM) with S...Mastering Partial Least Squares Structural Equation Modeling (PLS-SEM) with S...
Mastering Partial Least Squares Structural Equation Modeling (PLS-SEM) with S...Ken Kwong-Kay Wong
 
CART – Classification & Regression Trees
CART – Classification & Regression TreesCART – Classification & Regression Trees
CART – Classification & Regression TreesHemant Chetwani
 
Speaker Recognition using Gaussian Mixture Model
Speaker Recognition using Gaussian Mixture Model Speaker Recognition using Gaussian Mixture Model
Speaker Recognition using Gaussian Mixture Model Saurab Dulal
 
Principal component analysis and lda
Principal component analysis and ldaPrincipal component analysis and lda
Principal component analysis and ldaSuresh Pokharel
 
Descriptives & Graphing
Descriptives & GraphingDescriptives & Graphing
Descriptives & GraphingJames Neill
 
Factor Analysis in Research
Factor Analysis in ResearchFactor Analysis in Research
Factor Analysis in ResearchQasim Raza
 

What's hot (20)

MANOVA SPSS
MANOVA SPSSMANOVA SPSS
MANOVA SPSS
 
Principal Component Analysis
Principal Component AnalysisPrincipal Component Analysis
Principal Component Analysis
 
Parametric vs Non-Parametric
Parametric vs Non-ParametricParametric vs Non-Parametric
Parametric vs Non-Parametric
 
Chi square test
Chi square testChi square test
Chi square test
 
08 test of hypothesis large sample.ppt
08 test of hypothesis large sample.ppt08 test of hypothesis large sample.ppt
08 test of hypothesis large sample.ppt
 
Data cleaning and screening
Data cleaning and screeningData cleaning and screening
Data cleaning and screening
 
Information retrieval 20 divergence from randomness
Information retrieval 20 divergence from randomnessInformation retrieval 20 divergence from randomness
Information retrieval 20 divergence from randomness
 
Estimating standard error of measurement
Estimating standard error of measurementEstimating standard error of measurement
Estimating standard error of measurement
 
One sided or one-tailed tests
One sided or one-tailed testsOne sided or one-tailed tests
One sided or one-tailed tests
 
Pca ppt
Pca pptPca ppt
Pca ppt
 
Statistical analysis training course
Statistical analysis training courseStatistical analysis training course
Statistical analysis training course
 
Linear regression analysis
Linear regression analysisLinear regression analysis
Linear regression analysis
 
Mastering Partial Least Squares Structural Equation Modeling (PLS-SEM) with S...
Mastering Partial Least Squares Structural Equation Modeling (PLS-SEM) with S...Mastering Partial Least Squares Structural Equation Modeling (PLS-SEM) with S...
Mastering Partial Least Squares Structural Equation Modeling (PLS-SEM) with S...
 
CART – Classification & Regression Trees
CART – Classification & Regression TreesCART – Classification & Regression Trees
CART – Classification & Regression Trees
 
Speaker Recognition using Gaussian Mixture Model
Speaker Recognition using Gaussian Mixture Model Speaker Recognition using Gaussian Mixture Model
Speaker Recognition using Gaussian Mixture Model
 
Principal component analysis and lda
Principal component analysis and ldaPrincipal component analysis and lda
Principal component analysis and lda
 
Descriptives & Graphing
Descriptives & GraphingDescriptives & Graphing
Descriptives & Graphing
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Factor Analysis in Research
Factor Analysis in ResearchFactor Analysis in Research
Factor Analysis in Research
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 

Similar to Hierarchical Clustering and Topology for Psychometric Validation

The use of statistics in outcomes assessment
The use of statistics in outcomes assessmentThe use of statistics in outcomes assessment
The use of statistics in outcomes assessmentjimber0910
 
Reading Material: Qualitative Interview
Reading Material: Qualitative InterviewReading Material: Qualitative Interview
Reading Material: Qualitative Interviewfirdausabdmunir85
 
Stepsin researchprocesspartialleastsquareofstructuralequationmodeling2016
Stepsin researchprocesspartialleastsquareofstructuralequationmodeling2016Stepsin researchprocesspartialleastsquareofstructuralequationmodeling2016
Stepsin researchprocesspartialleastsquareofstructuralequationmodeling2016Aurangzeb Ch
 
Graduate Paper--Hierarchical clustring and topology for psychometrics paper
Graduate Paper--Hierarchical clustring and topology for psychometrics paperGraduate Paper--Hierarchical clustring and topology for psychometrics paper
Graduate Paper--Hierarchical clustring and topology for psychometrics paperColleen Farrelly
 
Parts of research paper
Parts of research paperParts of research paper
Parts of research paperAllanAdem
 
An Experimental Template For Case Study Research
An Experimental Template For Case Study ResearchAn Experimental Template For Case Study Research
An Experimental Template For Case Study ResearchZaara Jensen
 
Advanced statistics for librarians
Advanced statistics for librariansAdvanced statistics for librarians
Advanced statistics for librariansJohn McDonald
 
Assigning And Combining Probabilities In Single-Case Studies
Assigning And Combining Probabilities In Single-Case StudiesAssigning And Combining Probabilities In Single-Case Studies
Assigning And Combining Probabilities In Single-Case StudiesZaara Jensen
 
Week 1-2 -INTRODUCTION TO QUANTITATIVE RESEARCH.pptx
Week 1-2 -INTRODUCTION TO QUANTITATIVE RESEARCH.pptxWeek 1-2 -INTRODUCTION TO QUANTITATIVE RESEARCH.pptx
Week 1-2 -INTRODUCTION TO QUANTITATIVE RESEARCH.pptxChristineTorrepenida1
 
Role of Modern Geographical Knowledge in National Development
Role  of Modern Geographical Knowledge in National DevelopmentRole  of Modern Geographical Knowledge in National Development
Role of Modern Geographical Knowledge in National DevelopmentProf Ashis Sarkar
 
Chapter-4-Research-Methods.pptx
Chapter-4-Research-Methods.pptxChapter-4-Research-Methods.pptx
Chapter-4-Research-Methods.pptxDulnuanCrizamae
 
Developing of climate data for building simulation with future weather condit...
Developing of climate data for building simulation with future weather condit...Developing of climate data for building simulation with future weather condit...
Developing of climate data for building simulation with future weather condit...Rasmus Madsen
 
Kinds of Quantitative Research.pptx
Kinds of Quantitative Research.pptxKinds of Quantitative Research.pptx
Kinds of Quantitative Research.pptxDanCyrelSumampong2
 
A Qualititative Approach To HCI Research
A Qualititative Approach To HCI ResearchA Qualititative Approach To HCI Research
A Qualititative Approach To HCI ResearchNathan Mathis
 
A Critique Of Anscombe S Work On Statistical Analysis Using Graphs (2013 Home...
A Critique Of Anscombe S Work On Statistical Analysis Using Graphs (2013 Home...A Critique Of Anscombe S Work On Statistical Analysis Using Graphs (2013 Home...
A Critique Of Anscombe S Work On Statistical Analysis Using Graphs (2013 Home...Simar Neasy
 
Qualitative, Quantitative and Mixed MethodThe qualitative method o.docx
Qualitative, Quantitative and Mixed MethodThe qualitative method o.docxQualitative, Quantitative and Mixed MethodThe qualitative method o.docx
Qualitative, Quantitative and Mixed MethodThe qualitative method o.docxhildredzr1di
 
Statistics for Librarians: How to Use and Evaluate Statistical Evidence
Statistics for Librarians: How to Use and Evaluate Statistical EvidenceStatistics for Librarians: How to Use and Evaluate Statistical Evidence
Statistics for Librarians: How to Use and Evaluate Statistical EvidenceJohn McDonald
 
1_Q2-PRACTICAL-RESEARCH.pptx
1_Q2-PRACTICAL-RESEARCH.pptx1_Q2-PRACTICAL-RESEARCH.pptx
1_Q2-PRACTICAL-RESEARCH.pptxGeraldRefil3
 

Similar to Hierarchical Clustering and Topology for Psychometric Validation (20)

The use of statistics in outcomes assessment
The use of statistics in outcomes assessmentThe use of statistics in outcomes assessment
The use of statistics in outcomes assessment
 
Reading Material: Qualitative Interview
Reading Material: Qualitative InterviewReading Material: Qualitative Interview
Reading Material: Qualitative Interview
 
Stepsin researchprocesspartialleastsquareofstructuralequationmodeling2016
Stepsin researchprocesspartialleastsquareofstructuralequationmodeling2016Stepsin researchprocesspartialleastsquareofstructuralequationmodeling2016
Stepsin researchprocesspartialleastsquareofstructuralequationmodeling2016
 
Graduate Paper--Hierarchical clustring and topology for psychometrics paper
Graduate Paper--Hierarchical clustring and topology for psychometrics paperGraduate Paper--Hierarchical clustring and topology for psychometrics paper
Graduate Paper--Hierarchical clustring and topology for psychometrics paper
 
Parts of research paper
Parts of research paperParts of research paper
Parts of research paper
 
An Experimental Template For Case Study Research
An Experimental Template For Case Study ResearchAn Experimental Template For Case Study Research
An Experimental Template For Case Study Research
 
Advanced statistics for librarians
Advanced statistics for librariansAdvanced statistics for librarians
Advanced statistics for librarians
 
Assigning And Combining Probabilities In Single-Case Studies
Assigning And Combining Probabilities In Single-Case StudiesAssigning And Combining Probabilities In Single-Case Studies
Assigning And Combining Probabilities In Single-Case Studies
 
Week 1-2 -INTRODUCTION TO QUANTITATIVE RESEARCH.pptx
Week 1-2 -INTRODUCTION TO QUANTITATIVE RESEARCH.pptxWeek 1-2 -INTRODUCTION TO QUANTITATIVE RESEARCH.pptx
Week 1-2 -INTRODUCTION TO QUANTITATIVE RESEARCH.pptx
 
PR 2, WEEK 2.pptx
PR 2, WEEK 2.pptxPR 2, WEEK 2.pptx
PR 2, WEEK 2.pptx
 
Role of Modern Geographical Knowledge in National Development
Role  of Modern Geographical Knowledge in National DevelopmentRole  of Modern Geographical Knowledge in National Development
Role of Modern Geographical Knowledge in National Development
 
Chapter-4-Research-Methods.pptx
Chapter-4-Research-Methods.pptxChapter-4-Research-Methods.pptx
Chapter-4-Research-Methods.pptx
 
Developing of climate data for building simulation with future weather condit...
Developing of climate data for building simulation with future weather condit...Developing of climate data for building simulation with future weather condit...
Developing of climate data for building simulation with future weather condit...
 
Kinds of Quantitative Research.pptx
Kinds of Quantitative Research.pptxKinds of Quantitative Research.pptx
Kinds of Quantitative Research.pptx
 
CMSS FIVE
CMSS FIVECMSS FIVE
CMSS FIVE
 
A Qualititative Approach To HCI Research
A Qualititative Approach To HCI ResearchA Qualititative Approach To HCI Research
A Qualititative Approach To HCI Research
 
A Critique Of Anscombe S Work On Statistical Analysis Using Graphs (2013 Home...
A Critique Of Anscombe S Work On Statistical Analysis Using Graphs (2013 Home...A Critique Of Anscombe S Work On Statistical Analysis Using Graphs (2013 Home...
A Critique Of Anscombe S Work On Statistical Analysis Using Graphs (2013 Home...
 
Qualitative, Quantitative and Mixed MethodThe qualitative method o.docx
Qualitative, Quantitative and Mixed MethodThe qualitative method o.docxQualitative, Quantitative and Mixed MethodThe qualitative method o.docx
Qualitative, Quantitative and Mixed MethodThe qualitative method o.docx
 
Statistics for Librarians: How to Use and Evaluate Statistical Evidence
Statistics for Librarians: How to Use and Evaluate Statistical EvidenceStatistics for Librarians: How to Use and Evaluate Statistical Evidence
Statistics for Librarians: How to Use and Evaluate Statistical Evidence
 
1_Q2-PRACTICAL-RESEARCH.pptx
1_Q2-PRACTICAL-RESEARCH.pptx1_Q2-PRACTICAL-RESEARCH.pptx
1_Q2-PRACTICAL-RESEARCH.pptx
 

More from Colleen Farrelly

Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Hands-On Network Science, PyData Global 2023
Hands-On Network Science, PyData Global 2023Hands-On Network Science, PyData Global 2023
Hands-On Network Science, PyData Global 2023Colleen Farrelly
 
Modeling Climate Change.pptx
Modeling Climate Change.pptxModeling Climate Change.pptx
Modeling Climate Change.pptxColleen Farrelly
 
Natural Language Processing for Beginners.pptx
Natural Language Processing for Beginners.pptxNatural Language Processing for Beginners.pptx
Natural Language Processing for Beginners.pptxColleen Farrelly
 
The Shape of Data--ODSC.pptx
The Shape of Data--ODSC.pptxThe Shape of Data--ODSC.pptx
The Shape of Data--ODSC.pptxColleen Farrelly
 
Generative AI, WiDS 2023.pptx
Generative AI, WiDS 2023.pptxGenerative AI, WiDS 2023.pptx
Generative AI, WiDS 2023.pptxColleen Farrelly
 
Emerging Technologies for Public Health in Remote Locations.pptx
Emerging Technologies for Public Health in Remote Locations.pptxEmerging Technologies for Public Health in Remote Locations.pptx
Emerging Technologies for Public Health in Remote Locations.pptxColleen Farrelly
 
Applications of Forman-Ricci Curvature.pptx
Applications of Forman-Ricci Curvature.pptxApplications of Forman-Ricci Curvature.pptx
Applications of Forman-Ricci Curvature.pptxColleen Farrelly
 
Geometry for Social Good.pptx
Geometry for Social Good.pptxGeometry for Social Good.pptx
Geometry for Social Good.pptxColleen Farrelly
 
Topology for Time Series.pptx
Topology for Time Series.pptxTopology for Time Series.pptx
Topology for Time Series.pptxColleen Farrelly
 
Time Series Applications AMLD.pptx
Time Series Applications AMLD.pptxTime Series Applications AMLD.pptx
Time Series Applications AMLD.pptxColleen Farrelly
 
An introduction to quantum machine learning.pptx
An introduction to quantum machine learning.pptxAn introduction to quantum machine learning.pptx
An introduction to quantum machine learning.pptxColleen Farrelly
 
An introduction to time series data with R.pptx
An introduction to time series data with R.pptxAn introduction to time series data with R.pptx
An introduction to time series data with R.pptxColleen Farrelly
 
NLP: Challenges and Opportunities in Underserved Areas
NLP: Challenges and Opportunities in Underserved AreasNLP: Challenges and Opportunities in Underserved Areas
NLP: Challenges and Opportunities in Underserved AreasColleen Farrelly
 
Geometry, Data, and One Path Into Data Science.pptx
Geometry, Data, and One Path Into Data Science.pptxGeometry, Data, and One Path Into Data Science.pptx
Geometry, Data, and One Path Into Data Science.pptxColleen Farrelly
 
Topological Data Analysis.pptx
Topological Data Analysis.pptxTopological Data Analysis.pptx
Topological Data Analysis.pptxColleen Farrelly
 
Transforming Text Data to Matrix Data via Embeddings.pptx
Transforming Text Data to Matrix Data via Embeddings.pptxTransforming Text Data to Matrix Data via Embeddings.pptx
Transforming Text Data to Matrix Data via Embeddings.pptxColleen Farrelly
 
Natural Language Processing in the Wild.pptx
Natural Language Processing in the Wild.pptxNatural Language Processing in the Wild.pptx
Natural Language Processing in the Wild.pptxColleen Farrelly
 
SAS Global 2021 Introduction to Natural Language Processing
SAS Global 2021 Introduction to Natural Language Processing SAS Global 2021 Introduction to Natural Language Processing
SAS Global 2021 Introduction to Natural Language Processing Colleen Farrelly
 
2021 American Mathematical Society Data Science Talk
2021 American Mathematical Society Data Science Talk2021 American Mathematical Society Data Science Talk
2021 American Mathematical Society Data Science TalkColleen Farrelly
 

Hierarchical Clustering and Topology for Psychometric Validation

  • 1. Hierarchical Clustering and Topology for Psychometric Validation By Colleen M. Farrelly
  • 2. Creating a New Survey: Psychometrics
      • Many types of surveys/tests exist for assessing academic achievement, psychological traits, or sociological constructs; the field that studies the construction and functioning of tests is called psychometrics.
      • Sometimes, a new survey must be created to either improve upon a previous/discontinued one or assess a new idea/context for a given behavior or trait.
      • These new surveys pose several statistical challenges:
          • Consistency within a survey (measuring what a survey is thought to measure)
              • Cronbach's alpha, differential item functioning…
          • Validation across samples (measures the same thing across populations/time)
              • Exploratory factor analysis followed by confirmatory factor analysis
          • Subscales for easier computation and interpretation of results (need to figure out which items on the survey function similarly)
      • Statistical frameworks exist for assessing these challenges, but they typically require large sample sizes and assume certain structures underlie the survey design.
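Cronbach's alpha, named on this slide as an internal-consistency measure, can be illustrated with a minimal sketch. It uses the standard formula alpha = k/(k-1) × (1 − Σ item variances / variance of total scores); the function name `cronbach_alpha` and the 0/1 scores are hypothetical and not taken from the slides.

```python
from statistics import variance


def cronbach_alpha(responses):
    """responses: list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical wrong/right (0/1) scores: 5 respondents, 3 items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
    [0, 0, 0],
]
alpha = cronbach_alpha(scores)
```

Values closer to 1 indicate that the items vary together, which is the sense of "consistency within a survey" used above.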
  • 3. Example Survey
      a) Red:Rainbow::July:____ (Month, Year, Hot, Cloud)
      b) Soothing:Anodyne::____:Esoteric (Eccentric, School, Abstruse, Calming)
      c) Pyrrhic:Victory::Potemkin:____ (Village, Battle, Hollow, Achilles)
      d) Stegosaurus:Jurassic::Trilobite:____ (Triassic, Dinosaur, Mesozoic, Cambrian)
      e) Mice:Men::Cabbages:____ (Women, Lettuce, Salad, Kings)
      f) Fill in the following series: 1, 1/8, 1/27, 1/64, ___
      g) Fill in the following series: ___, 25, 168, 1229, 9592
      h) Fill in the following series: 3, ___, 4, 1, 5
  • 4. Factor Analysis
      • Creation of new surveys requires internal and external validation, typically done through factor analysis.
          • Exploratory factor analysis is used to cluster items measuring similar underlying processes.
          • Confirmatory factor analysis can then be applied to validate those clusters, or subscales, that were found in the exploratory analysis.
          • Cronbach's alpha establishes internal consistency.
      • [Figure: factor diagram with two factors, Verbal and Math, over items a-h]
  • 5. Potential Pitfalls in Psychometric Validation with Factor Analysis
      • Two major problems challenge the assumptions of these methods and necessitate the development of a new way to analyze and validate the measure.
          • Time-wise or context-wise measurement can introduce non-independent, non-hierarchical components into the model.
              • Study habits across terms (longitudinal effects on measurement), identity across social spheres (student perception of intellectual ability when with friends, at work, and at school)
              • Factor analysis can be broadened to Bayesian networks and structural equation models, but these methods come with their own assumptions on the underlying geometry and sample size.
          • Small sample size can create numerical instability in traditional algorithms for both factor analysis and structural equation models (guidelines suggest 5-10 participants per item).
              • If there are 90 items, at least 450 students would be needed to discover subscales, and another 450 would be needed to validate these findings.
              • Cost and population size can be prohibitive to the study.
      • Ex. Bridging constructs, or loosely connected concepts without a defined hierarchy, typically run into both limitations and require a new method to validate their surveys.
      • Many of these issues arise from the dependence on linear mapping from the survey response space to a lower-dimensional space.
  • 6. Moving from Euclidean-Based Statistics to Topologically-Based Statistics
      • Loss of information with each projection to a lower-dimensional space (errors)
      • Topological methods work by partitioning existing space into homogeneous components (no maps, no error)
      • [Figure: 2D example]
  • 7. Algebraic Topology and Topological Spaces
      • Spaces, such as the one formed by survey response data, can be defined topologically and decomposed using algebraic topology/geometry.
          • Data follows discrete versions of many theoretical results in this area of math.
          • Topology is rubber sheet geometry, with areas analogous to gluing together children's building blocks, examining connections on shapes, or hunting for mountain/valley water flows.
      • Examining how the pieces fit together in a given space allows one to study the topological space's defining characteristics and the behavior of functions in that space.
      • One can define connections between pieces of this space via algebra and examine structural properties computationally:
          • Homotopy (shrinking connected paths to a point)
          • Homology (hole-counting to define topological classification of structure)
      • [Figure: three panels: 1) Homotopy/Homology, 2) Basins of Attraction (Morse Theory), 3) Hodge Theory]
  • 8. Applied Homology: Filtrations and Persistence
      • Filtration
          • This is an iterative changing of the lens with which to examine data (height, neighbors…).
          • Topological features appear and disappear as the lens changes.
          • This creates a nested sequence of features with underlying algebraic objects, called a homology sequence:
              • Hom1 ⊂ Hom2 ⊂ Hom3 ⊂ Hom4
      • Persistence is the length of feature existence in a homology sequence, which can be visualized.
          • This information maps back to the data space's topology (shape).
          • The first level of algebraic objects corresponds to connectedness of the space (0th Betti numbers), and this is directly related to a type of clustering analysis.
      • [Figure: feature persistence over time (axis 0 to 10); labels: "Connected space", "Vertices", "Hole in middle"]
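To make the link between a distance filtration and 0th Betti numbers concrete, here is a minimal sketch that tracks connected components as a distance threshold grows. The function names `h0_persistence` and `betti0`, the union-find approach, and the four sample points are illustrative assumptions, not material from the slides.

```python
from itertools import combinations
from math import dist, inf


def h0_persistence(points):
    """Return (birth, death) pairs for connected components of a point
    cloud as the distance threshold of the filtration increases."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # All pairwise distances, processed from smallest to largest.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    bars = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj        # two components merge at distance d,
            bars.append((0.0, d))  # so one of them "dies" here
    bars.append((0.0, inf))        # the last component never dies
    return bars


def betti0(bars, threshold):
    """Number of connected components alive at the given threshold."""
    return sum(1 for birth, death in bars if birth <= threshold < death)


# Hypothetical data: two well-separated pairs of points on a line.
points = [(0, 0), (1, 0), (5, 0), (6, 0)]
bars = h0_persistence(points)
```

With these points, `betti0(bars, 0.5)` counts four isolated vertices, `betti0(bars, 2)` counts two clusters, and `betti0(bars, 4.5)` counts one connected space. The long-lived bar that ends at distance 4 is the persistent feature separating the two groups.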
  • 9. Solution: Use Machine Learning to Exploit Underlying Topology of Survey Data
      • Single-linkage hierarchical clustering partitions data space according to connected components (0th Betti numbers) across filtration levels (i.e. a series of distance filtrations).
      • This method has been successfully applied to neuroimaging studies focused on patterns of brain activity across diseases, neuropsychological tests, and drug states.
          • This provides a nuanced scanning of topologically-based features within the datasets at different correlation/similarity thresholds.
          • These can be summarized in feature plots, called persistence diagrams, that track the birth and death of a given feature across thresholds, and can be compared through existing statistical tests, such as a nonparametric Wasserstein metric test.
      • It has also been used to track gene expression pattern changes across time and/or disease states in microarray studies.
          • These studies particularly emphasize the visualization of hierarchical clustering through dendrograms (tree diagrams of relationships at different filtration levels) and heat maps (color-coded expression-similarity plots among genes in the microarray).
          • These visualizations provide a user-friendly way to understand and communicate key findings of this statistical method.
      • This method can handle data with fewer observations than predictors (p>>n), and, thus, does not require large sample sizes.
      • Internal correlations do not pose issues; in fact, the method excels at separating data within and across dependencies.
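As a rough illustration of single-linkage clustering of survey items, the sketch below merges items in order of increasing distance, which is the same connected-component tracking described above. The distance 1 − Pearson correlation, the function names, and the item scores are assumptions chosen for the example; the slides do not state which distance the author used.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def single_linkage(items):
    """items: dict mapping item name -> list of scores per respondent.
    Returns the merges as (height, cluster_1, cluster_2), lowest first."""
    names = list(items)
    cluster = {name: frozenset([name]) for name in names}
    # Assumed distance between items: 1 - correlation.
    edges = sorted(
        (1 - pearson(items[a], items[b]), a, b)
        for a, b in combinations(names, 2)
    )
    merges = []
    for height, a, b in edges:
        ca, cb = cluster[a], cluster[b]
        if ca != cb:  # the closest pair across two clusters joins them
            merged = ca | cb
            for name in merged:
                cluster[name] = merged
            merges.append((height, ca, cb))
    return merges


# Hypothetical 0/1 scores for six respondents on four items.
items = {
    "a": [1, 1, 1, 0, 0, 0],
    "b": [1, 1, 0, 0, 0, 0],
    "f": [1, 0, 1, 0, 1, 0],
    "g": [1, 0, 1, 0, 1, 1],
}
merges = single_linkage(items)
```

The merge heights are what a dendrogram draws as bar heights: here items a and b join first, f and g join at the same low height, and the two pairs only join each other at a much larger distance.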
  • 10. Hierarchical Clustering: Example Survey
      • [Figure: heatmap with dendrogram; clusters labeled Math and Verbal]
      • Very distinct separation of items (noted by sharp color contrast of heatmap and long height bars on dendrogram)
  • 11. Validation: Dendrograms and Topology
      • Dendrograms are a special type of graph, called a tree.
      • Because graphs have a defined topological space and dendrograms are a type of graph, they can be studied or measured through the tools of topology and metric geometry.
      • Hausdorff distance allows two objects of the same dimension to be compared by a defined metric.
          • This examines the greatest distance between close points, allowing for a nearness-of-match type of metric on two objects (top left).
          • Within a graph framework, it allows one to calculate the worst best match between two graphs (as shown at bottom left).
      • This allows for the development of a distance-based nonparametric test for dendrogram structural differences in a statistical framework.
      • [Figure: Hausdorff Distance]
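The "worst best match" idea can be sketched for two finite point sets. This is only the textbook Hausdorff distance on points; how the author embeds dendrograms before comparing them is not stated in the slides, so the function names and sample sets below are purely illustrative.

```python
from math import dist


def directed_hausdorff(A, B):
    """For each point of A, find its best (nearest) match in B,
    then report the worst of those best matches."""
    return max(min(dist(a, b) for b in B) for a in A)


def hausdorff(A, B):
    """Symmetric Hausdorff distance between finite point sets A and B."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))


# Hypothetical sets: B contains A plus one far-away point.
A = [(0, 0), (1, 0)]
B = [(0, 0), (1, 0), (4, 0)]
d = hausdorff(A, B)
```

Every point of A has an exact match in B, but the point (4, 0) in B is 3 units from its nearest match in A, so the distance is 3. A single badly matched element is enough to make two objects far apart under this metric.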
  • 12. Steps in Exploration and Validation of Surveys with Hierarchical Clustering
      1) Partition the sample into training and validation sets, or draw a small number of bootstrap samples from the original dataset.
      2) Calculate distance metrics in each sample.
      3) Run a single-linkage hierarchical clustering algorithm on the training set to obtain exploratory clusters of similar survey items (the pvclust R package statistically tests internal survey structure, much like the Cronbach's alpha metric). Create a heat map and dendrogram.
      4) Repeat (3) on the validation sets to obtain a set of dendrograms.
      5) Calculate the Hausdorff distance (a topological metric) between dendrograms to estimate differences in results (validation step).
      6) Obtain a p-value by permuting the extant dendrograms or generating random dendrograms.
      7) If the p-value is larger than 0.05/n (Bonferroni correction) for the dendrograms in (5), no statistically significant differences exist in dendrogram structure, meaning that the survey clusters are consistent and valid.
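Steps 6 and 7 can be sketched as a generic permutation test with a Bonferroni threshold. The slides do not give the exact procedure, so the add-one p-value estimator, the function names, and the numbers below are assumptions for illustration only; the null distances stand in for Hausdorff distances between permuted or random dendrograms.

```python
def permutation_p_value(observed, null_distances):
    """Share of null distances at least as large as the observed one,
    with the common add-one correction so that p is never exactly 0."""
    exceed = sum(1 for d in null_distances if d >= observed)
    return (exceed + 1) / (len(null_distances) + 1)


def consistent_after_bonferroni(p_values, alpha=0.05):
    """True when every comparison has p > alpha / n, i.e. no dendrogram
    pair differs significantly after Bonferroni correction."""
    threshold = alpha / len(p_values)
    return all(p > threshold for p in p_values)


# Hypothetical observed distance and nine null distances.
observed = 2.0
null_distances = [1.0, 2.5, 3.0, 0.5, 4.0, 2.0, 1.5, 3.5, 2.2]
p = permutation_p_value(observed, null_distances)

# Hypothetical p-values from three dendrogram comparisons.
ok = consistent_after_bonferroni([p, 0.4, 0.02])
```

With three comparisons the threshold is 0.05/3, so all three hypothetical p-values pass and the clusters would be judged consistent under step 7.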
  • 13. Example Measure: Bridging Constructs
      • Identity expression across life contexts (ILLCQ Survey):
          • There are many components to identity in leading theories of identity.
          • Example: religious identity in school, family, and friends contexts
          • It was unknown whether identity type or social context plays a greater role in the expression of identity within an individual.
              • Identity type as more influential would suggest that identity is a fairly static trait.
              • Context as more influential would suggest that identity is fluid.
      • Sample size and survey size
          • 406 participants (FIU students) and 91 distinct survey items.
          • 5 draws of 130 participants each for validation and consistency checks.
      • Results suggest certain aspects of identity are fluid and others are fixed.
          • Political and racial/ethnic identity are fairly fixed.
          • Other types, such as athletic or gender, are fairly fluid.
      • Bootstrapped samples suggest consistency of measure and validate findings.
          • Subscales hold over different samples (tests of difference, all p>0.05).
          • This validates the measure and allows for inference into the psychology of identity.
  • 14. Identity by Context Survey Heatmap
      • [Figure: heatmap of the ILLCQ survey items, labeled in the form ILLCa_<identity>_<context> (for example ILLCa_politics_school, ILLCa_religion_family); color scale from -0.2 to 1]
  • 15. Conclusion
      • This method offers a robust way to create survey subscales and validate measures without needing a large sample or a pre-defined measure structure.
          • Flexible
          • Deeply rooted in mathematics
          • Statistically testable
              • Internal validity by pvclust's statistical test of cluster hierarchy for cut-points
              • External validity by Hausdorff nonparametric test on bootstrapped samples
      • It has been successfully applied to a bridging concept survey (factorial design), as well as more traditional survey designs.
      • This offers a general way to extend traditional areas of statistics to a more general framework through the use of topological theory and tools.
          • Likely to be useful as data becomes more complex in industry and academia.
          • May be able to circumvent other problems in modern statistics.
              • Item response theory (how people in different groups perform on test items)
              • Network comparison (social networks, covariance networks…) between groups or over time
              • Structural equation modeling when data does not meet method assumptions
  • 16. Co-authors
      • The Analysis of bridging constructs with hierarchical clustering methods: An application to identity (under review, Journal of Research in Personality)
          • Seth Schwartz, University of Miami
          • Anna Lisa Amodeo, University of Naples
          • Daniel Feaster, University of Miami
          • Douglas Steinley, University of Missouri
          • Alan Meca, University of Miami
          • Simona Picariello, University of Naples

Editor's Notes

  1. (Answers: Year, Abstruse, Village, Cambrian, Kings, 1/125, 4, 1) Bonus: Laplacian:Heat::Ricci:___ (Water, Cold, Curvature, Valley)--Curvature
  2. Santos, J. R. A. (1999). Cronbach's alpha: A tool for assessing the reliability of scales. Journal of Extension, 37(2), 1-5. Thompson, B. (2004). Exploratory and confirmatory factor analysis: Understanding concepts and applications. American Psychological Association.
  3. Costello, A. B. (2009). Getting the most from your analysis. Pan, 12(2), 131-146. Rouquette, A., & Falissard, B. (2011). Sample size requirements for the internal validation of psychiatric scales. International Journal of Methods in Psychiatric Research, 20(4), 235-249. DeCoster, J. (1998). Overview of factor analysis.
  4. Zomorodian, A., & Carlsson, G. (2005). Computing persistent homology. Discrete & Computational Geometry, 33(2), 249-274. Lee, H., Kang, H., Chung, M. K., Kim, B. N., & Lee, D. S. (2012). Persistent brain network homology from the perspective of dendrogram. IEEE Transactions on Medical Imaging, 31(12), 2267-2277.
  5. Revelle, W. (1979). Hierarchical cluster analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1), 57-74. Lee, H., Kang, H., Chung, M. K., Kim, B. N., & Lee, D. S. (2012). Persistent brain network homology from the perspective of dendrogram. IEEE Transactions on Medical Imaging, 31(12), 2267-2277. Suzuki, R., & Shimodaira, H. (2006). Pvclust: an R package for assessing the uncertainty in hierarchical clustering. Bioinformatics, 22(12), 1540-1542. Chipman, H., & Tibshirani, R. (2006). Hybrid hierarchical clustering with applications to microarray data. Biostatistics, 7(2), 286-301.
  6. Gross, J. L., & Tucker, T. W. (1987). Topological graph theory. Courier Corporation.