There is a great deal of recent excitement around the idea of finding shape in data. The relatively young field of topological data analysis (TDA) provides tools that can quantify, investigate, and utilize shape in data to understand something about the domain from which the data was obtained. These methods have been used successfully in many fields, including atmospheric science, time series analysis, and genetics. However, what does it really mean for data to have shape? In this talk, we will look at some common tools used in TDA, such as persistence diagrams, Reeb graphs, and Mapper, and at ideas for how different kinds of data can fit into the TDA pipeline.
1. What does it mean for data to have shape?
Elizabeth Munch
University at Albany – SUNY, Dept. of Mathematics & Statistics
Apr 7, 2016
Liz Munch (UAlbany) TDA Apr 7, 2016 1 / 24
7. Large Data Sets
Main goal of Topological Data Analysis (TDA)
Find and quantify structure in big data.
Goals of this talk
What tools are available?
How do we fit educational data into this pipeline?
Spoiler alert: I don’t know how to do this....
10. What does it mean for data to have shape?
Topology ≠ Topography
Topology: mathematical study of spaces preserved under continuous deformations (stretching and bending, not tearing or gluing)
Topography: study of the shape and features of the surface of the Earth
14. History Pt 2
Esoteric field of study 1700-2000
Algebraic topology
Applications/intersections with dynamical systems
Would never be considered “applied” in traditional sense.
Topology, the pinnacle of human thought.
In four centuries it may be useful.
- Aleksandr Solzhenitsyn, “The First Circle”, 1968
Things change ca. 2000
Introduction of Persistent Homology
15. Main questions
How do we quantify the structure we see?
Can we calculate something to represent the structure?
21. Very small radius is just dots.
Very large radius is just a blob.
Some range of radii lets us see the big circle.
Some small circles appear and disappear quickly... maybe we get to just call these noise!
How do we quantify this?
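A minimal sketch of the "dots versus blob" picture, assuming the data is a set of evenly spaced points on a unit circle (a stand-in, not the data from the slides). It counts connected pieces in the union of balls at a few radii. The helper names `circle_points` and `count_components` are made up for this illustration.

```python
import math
from itertools import combinations

def circle_points(n):
    """n evenly spaced points on the unit circle (illustrative stand-in data)."""
    return [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
            for i in range(n)]

def count_components(points, radius):
    """Number of connected pieces in the union of balls of the given radius."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        # Two balls of radius r overlap when their centers are within 2r.
        if math.dist(points[i], points[j]) <= 2 * radius:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})

points = circle_points(12)
for r in (0.1, 0.3, 1.2):
    print(r, count_components(points, r))
```

At the smallest radius every point is its own piece; at the two larger radii there is a single piece. Counting pieces alone cannot tell the ring (middle radius) from the blob (large radius), which is why a tool that also detects loops is needed.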
24. Homology & Persistent Homology
What is Homology?
A topological invariant which assigns a sequence of vector spaces, H_k(X), to a given topological space X.
What is Persistent Homology?
A way to watch how the homology of a filtration (sequence) of topological spaces changes so that we can understand something about the space.
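As a hedged illustration of the simplest case, the sketch below tracks only dimension 0 (connected components). It assumes the filtration is indexed by a threshold on pairwise distance, which is twice the ball radius. Every point is born at threshold 0, and a component dies when an edge first merges it into another one. The function name `h0_persistence` and the toy points are assumptions made for this example, not part of any of the packages named in the talk.

```python
import math
from itertools import combinations

def h0_persistence(points):
    """(birth, death) pairs for connected components as the threshold grows."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process edges from shortest to longest, as in Kruskal's algorithm.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj              # two components merge ...
            pairs.append((0.0, length))  # ... so one of them dies here
    pairs.append((0.0, math.inf))        # the last component never dies
    return pairs

# Two well-separated clusters on a line (toy data).
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
diagram = h0_persistence(points)
print(diagram)
```

In this toy example three components die at threshold 1, one dies at threshold 8, and one lives forever. The long-lived pair reflects the gap between the two clusters, while the short-lived pairs play the role of the features that "appear and disappear quickly". Loops (dimension 1) need the full machinery implemented in the persistence packages listed in the conclusions.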
41. Mapper
Breast cancer gene expression data
Determine a good filter function
Run mapper
Found new type of breast cancer (c-MYB+) with high survival rate
Image: Nicolau, Levine, Carlsson, PNAS 2011
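The "choose a filter function, run mapper" steps can be sketched on toy data. This is a simplified illustration, not the Python Mapper API and not the procedure used in the cited study. It assumes points on a circle, the height (y-coordinate) as the filter, a cover by overlapping intervals, and single-linkage clustering at a fixed threshold. The names `mapper` and `single_linkage` and all parameters are hypothetical.

```python
import math
from itertools import combinations

def single_linkage(indices, points, eps):
    """Split indices into clusters by linking points within distance eps."""
    clusters = []
    unseen = set(indices)
    while unseen:
        stack = [unseen.pop()]
        cluster = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in unseen if math.dist(points[i], points[j]) <= eps}
            unseen -= near
            cluster |= near
            stack.extend(near)
        clusters.append(frozenset(cluster))
    return clusters

def mapper(points, filter_values, n_intervals, overlap, eps):
    """Nodes are clusters inside each interval's preimage; edges join clusters sharing points."""
    lo, hi = min(filter_values), max(filter_values)
    width = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Widen each interval on both sides so that neighbors overlap.
        a = lo + k * width - overlap * width
        b = lo + (k + 1) * width + overlap * width
        members = [i for i, f in enumerate(filter_values) if a <= f <= b]
        nodes.extend(single_linkage(members, points, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges

# Toy data: 24 points on the unit circle, filtered by height.
angles = [2 * math.pi * (i + 0.5) / 24 for i in range(24)]
points = [(math.cos(t), math.sin(t)) for t in angles]
heights = [p[1] for p in points]
nodes, edges = mapper(points, heights, n_intervals=3, overlap=0.25, eps=0.3)
print(len(nodes), len(edges))
```

For this toy input the output graph has four nodes (bottom arc, left side, right side, top arc) joined in a single cycle, so the graph recovers the loop in the data. With real data, the choice of filter, cover, and clustering all affect the result.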
45. Conclusions
Topology can help find structure in data that is not obvious by other means.
Lots of tools available, lots of open-source code for computation!
Mapper, Reeb graph, Contour Tree, Merge tree
Python mapper - danifold.net/mapper/
Persistence
Perseus - sas.upenn.edu/~vnanda/perseus/
Dionysus - mrzv.org/software/dionysus/
R TDA - cran.r-project.org/web/packages/TDA/
PHAT - bitbucket.org/phat-code/phat
Input from domain scientists is imperative!
What is the right question?
What is the right tool?
How do we interpret the output?