Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma

However, graph theory jargon can make graph analytics seem more intimidating for self-study than it needs to be. In this talk, the audience will be introduced to some of the basic concepts of graph theory (no prerequisite math knowledge needed!) and a few of the Python tools available for graph analysis.

1. 1. This is a novice-track talk, so all concepts and examples are kept simple: 1. Basic graph theory concepts and definitions 2. A few real-world scenarios framed as graph data 3. Working with graphs in Python. The overall goal of this talk is to spark your interest and show you what’s out there, as a jumping-off point for you to go deeper
2. 2. Graph: “A structure amounting to a set of objects in which some pairs of the objects are in some sense ‘related’. The objects correspond to mathematical abstractions called vertices (also called nodes or points) and each of the related pairs of vertices is called an edge (also called an arc or line)” – Richard Trudeau, Introduction to Graph Theory (1st edition, 1993) Graph Analytics: “Analysis of data structured as a graph (sometimes also part of network analysis or link analysis depending on scope and context)” – Me, talking to a stress ball as I made these slides
3. 3. • We see two vertices joined by a single edge • Vertex 1 is adjacent to vertex 2 • The neighborhood of vertex 1 is all adjacent vertices (vertex 2 in this case)
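Adjacency and neighborhoods can be sketched in plain Python with no graph library at all. The `adjacency` mapping and the `neighborhood` helper below are illustrative names, not part of any package mentioned in the talk.

```python
# The graph from this slide: two vertices joined by a single edge.
edges = [(1, 2)]

# Map each vertex to the set of vertices it is adjacent to.
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

def neighborhood(vertex):
    """Return the set of vertices adjacent to `vertex`."""
    return adjacency.get(vertex, set())

print(neighborhood(1))       # {2}
print(2 in neighborhood(1))  # True: vertex 1 is adjacent to vertex 2
```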
4. 4. • We see that there is a loop on vertex a • Vertices a and b have multiple edges between them • Vertex c has a degree of 3 • There exists a path from vertex a to vertex e • Vertices f, g, and h form a 3-cycle
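As a rough illustration of these terms, the sketch below counts degrees and finds loops and multiple edges from an edge list. The edge list is an assumption: the slide image is not reproduced here, so the edges were chosen only to match the properties listed above.

```python
from collections import Counter

# Hypothetical edge list (assumed, not read off the slide image).
edges = [("a", "a"), ("a", "b"), ("a", "b"), ("b", "c"), ("c", "d"),
         ("c", "e"), ("f", "g"), ("g", "h"), ("h", "f")]

# Each edge adds 1 to the degree of both endpoints.
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1  # for a loop u == v, so the vertex gains 2

# A loop is an edge whose two endpoints are the same vertex.
loops = [edge for edge in edges if edge[0] == edge[1]]

# Multiple edges: the same unordered pair appears more than once.
edge_counts = Counter(frozenset(edge) for edge in edges)
parallel = {tuple(sorted(pair)): n for pair, n in edge_counts.items() if n > 1}

print(degree["c"])  # 3
print(degree["a"])  # 4 (the loop counts twice, plus two edges to b)
print(loops)        # [('a', 'a')]
print(parallel)     # {('a', 'b'): 2}
```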
5. 5. • We have no single cut vertex or cut edge (one that would create more disjoint vertex/edge sets if removed) • We can separate this graph into two disconnected sets: 1) Vertex Set 1 = {a, b, c, d, e} 2) Vertex Set 2 = {f, g, h}
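One way to recover the two vertex sets programmatically is a breadth-first search from each unvisited vertex. This is a minimal sketch; the edges are assumed (a 5-cycle and a 3-cycle, so that no single vertex or edge is a cut), since the slide image is not available.

```python
from collections import deque

# Hypothetical edges: a cycle on a..e and a separate cycle on f, g, h.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "a"),
         ("f", "g"), ("g", "h"), ("h", "f")]

adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

def connected_components(adjacency):
    """Return a list of vertex sets, one per connected component."""
    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbor in adjacency[current]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        components.append(component)
    return components

components = connected_components(adjacency)
print([sorted(c) for c in components])
# [['a', 'b', 'c', 'd', 'e'], ['f', 'g', 'h']]
```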
6. 6. • Imagine the same vertex labels running along the top and the left-hand side of the matrix • A 1 in a particular cell tells us that the two corresponding vertices are adjacent
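A small sketch of building such an adjacency matrix from an edge list, using nested lists. The vertices and edges here are made up for illustration and are not the ones on the slide.

```python
# Hypothetical undirected graph.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]

# Row/column position of each vertex label.
index = {label: i for i, label in enumerate(vertices)}

n = len(vertices)
matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    # Undirected edge: set both (u, v) and (v, u), so the matrix is symmetric.
    matrix[index[u]][index[v]] = 1
    matrix[index[v]][index[u]] = 1

for label, row in zip(vertices, matrix):
    print(label, row)
# a [0, 1, 1, 0]
# b [1, 0, 1, 0]
# c [1, 1, 0, 1]
# d [0, 0, 1, 0]
```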
7. 7. • In this graph two vertices are joined by a single directed edge • There is a dipath from vertex 1 to vertex 2 but not from vertex 2 to vertex 1
8. 8. • Every vertex has ‘played’ every other vertex • We can see that there is no clear winner (every vertex has indegree and outdegree of 2)
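To make the "no clear winner" observation concrete, here is a sketch of a five-vertex tournament in which every vertex has indegree and outdegree 2. The construction (each vertex beats the next two around a circle) and the integer labels are assumptions; the slide may use a different arrangement.

```python
n = 5

# Directed edge (winner, loser): vertex i beats the next two vertices
# going around the circle 0, 1, 2, 3, 4.
edges = [(i, (i + k) % n) for i in range(n) for k in (1, 2)]

outdegree = {v: 0 for v in range(n)}
indegree = {v: 0 for v in range(n)}
for winner, loser in edges:
    outdegree[winner] += 1
    indegree[loser] += 1

print(outdegree)  # {0: 2, 1: 2, 2: 2, 3: 2, 4: 2}
print(indegree)   # {0: 2, 1: 2, 2: 2, 3: 2, 4: 2}
```

Every pair of vertices meets exactly once, so there are 10 edges in total, and the equal win/loss counts are what leave the tournament without a clear winner.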
9. 9. • Vertices from Set 1 = {a, b, c, d} are only adjacent to vertices from Set 2 = {e, f, g, h} • This can be extended to tripartite graphs (3 sets) or as many sets as we like (n-partite graphs) • Can we pair vertices from each set together?
10. 10. We can pair every vertex from one set to a vertex from the other using only existing edges
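A pairing like this can be searched for with the classic augmenting-path approach. The sketch below is a plain-Python illustration under assumed edges between Set 1 and Set 2; it is not taken from the slides and is not how any particular library implements matching.

```python
# Hypothetical bipartite graph: Set 1 vertex -> adjacent Set 2 vertices.
adjacent = {
    "a": ["e", "f"],
    "b": ["e"],
    "c": ["f", "g"],
    "d": ["g", "h"],
}

def try_to_match(u, visited, partner_of):
    """Try to pair u, re-pairing earlier vertices if that frees a slot."""
    for v in adjacent[u]:
        if v in visited:
            continue
        visited.add(v)
        # v is free, or its current partner can move to another vertex.
        if v not in partner_of or try_to_match(partner_of[v], visited, partner_of):
            partner_of[v] = u
            return True
    return False

partner_of = {}  # Set 2 vertex -> Set 1 vertex it is paired with
for u in adjacent:
    try_to_match(u, set(), partner_of)

matching = sorted((u, v) for v, u in partner_of.items())
print(matching)  # [('a', 'f'), ('b', 'e'), ('c', 'g'), ('d', 'h')]
```

Note that vertex a is first paired with e and later moved to f so that b, which can only pair with e, is not left out.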
11. 11. • We can assign weights to edges of a graph • As we follow a path through the graph, these weights accumulate • For example, the path a -> b -> c has an associated weight of 0.5 + 0.4 = 0.9
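The accumulation of weights along a path can be written as a short sum over consecutive vertex pairs. The `weights` dictionary holds only the two edge weights quoted above; `path_weight` is an illustrative helper name.

```python
# Edge weights from the example: a -> b is 0.5 and b -> c is 0.4.
weights = {("a", "b"): 0.5, ("b", "c"): 0.4}

def path_weight(path):
    """Sum the weights of the consecutive edges along `path`."""
    return sum(weights[(u, v)] for u, v in zip(path, path[1:]))

print(path_weight(["a", "b", "c"]))  # 0.9
```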
12. 12. • We can assign colors to vertices • The graph we see here has a proper coloring (no two vertices of the same color are adjacent) • We can also color edges!
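Checking whether a vertex coloring is proper only requires looking at each edge once. The graph and the colors below are invented for illustration.

```python
# Hypothetical graph and vertex coloring.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
coloring = {"a": "red", "b": "green", "c": "blue", "d": "red"}

def is_proper(edges, coloring):
    """A coloring is proper if no edge joins two vertices of the same color."""
    return all(coloring[u] != coloring[v] for u, v in edges)

print(is_proper(edges, coloring))  # True

# Recoloring d to match its neighbor c breaks the coloring.
bad_coloring = dict(coloring, d="blue")
print(is_proper(edges, bad_coloring))  # False
```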
13. 13. • Are we focused more on objects or the relationships/interactions between them? • Are we looking at transition states? • Is orientation important? If you can imagine a graph to represent it, it’s probably worth giving it a shot, if only for your own learning and exploration!
14. 14. • If the lines represent connections, what can we say about the people highlighted in red? • What kinds of questions might a graph be able to answer?
15. 15. • e and d have the highest degree • What might the c-d-e cycle tell us? • What can we say about cut vertices?
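These questions can be explored with a brute-force sketch: remove each vertex in turn and see whether the number of connected components goes up. The edges are assumed (chosen so that d and e have the highest degree and c, d, e form a cycle), because the slide image is not available.

```python
# Hypothetical social graph (assumed edges).
edges = [("a", "c"), ("c", "d"), ("c", "e"), ("d", "e"),
         ("d", "f"), ("d", "g"), ("e", "h"), ("e", "i")]

adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

def count_components(adjacency, removed=None):
    """Count connected components, optionally ignoring one vertex."""
    seen = set() if removed is None else {removed}
    count = 0
    for start in adjacency:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            current = stack.pop()
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return count

baseline = count_components(adjacency)
cut_vertices = sorted(v for v in adjacency
                      if count_components(adjacency, removed=v) > baseline)
highest_degree = max(len(neighbors) for neighbors in adjacency.values())
hubs = sorted(v for v in adjacency if len(adjacency[v]) == highest_degree)

print(cut_vertices)  # ['c', 'd', 'e']
print(hubs)          # ['d', 'e']
```

In a social setting, a cut vertex would be a person whose removal disconnects some people from the rest of the group.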
16. 16. If we have page view data with timestamps how might we represent this as a graph?
17. 17. • What might loops or multiple edges between vertices represent? • What types of data might we want to use as values on the edges? • What might comparing indegrees and outdegrees on different vertices represent?
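One possible representation, sketched with invented data: treat each page as a vertex and each consecutive pair of views by the same visitor as a directed edge, with the number of times that transition occurs as the edge weight. The record layout (visitor, timestamp, page) is an assumption.

```python
from collections import Counter, defaultdict

# Hypothetical page view records: (visitor, timestamp, page).
page_views = [
    ("visitor1", 1, "home"),
    ("visitor1", 2, "products"),
    ("visitor1", 3, "cart"),
    ("visitor2", 1, "home"),
    ("visitor2", 2, "products"),
    ("visitor2", 3, "products"),  # a reload shows up as a loop
    ("visitor2", 4, "home"),
]

# Order each visitor's pages by timestamp.
sessions = defaultdict(list)
for visitor, timestamp, page in sorted(page_views):
    sessions[visitor].append(page)

# Count directed transitions; the count acts as the edge weight.
transitions = Counter()
for pages in sessions.values():
    for source, target in zip(pages, pages[1:]):
        transitions[(source, target)] += 1

# Weighted in/out degree of each page.
outdegree = Counter()
indegree = Counter()
for (source, target), count in transitions.items():
    outdegree[source] += count
    indegree[target] += count

print(transitions[("home", "products")])             # 2
print(transitions[("products", "products")])         # 1
print(indegree["products"], outdegree["products"])   # 3 3
```

Under this reading, a page with high indegree but low outdegree would be one where visitors tend to end their session.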
18. 18. If we have to regularly pick up a load at the train station, make deliveries to every factory and then return to the garage how can a graph help us find an optimal route?
19. 19. • We can assign weights to each edge to represent distance, travel time, gas cost for the distance, etc. • The path with the lowest total weight is the shortest/cheapest/fastest route • Note that edge weights are only displayed for f-e and f-a
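For a handful of stops, the lowest-weight route can be found by simply trying every visiting order. The sketch below assumes the route starts at the garage, and the location names and distances are invented, since the slide only shows two of its edge weights. Brute force grows factorially with the number of factories, so it is only reasonable for very small graphs.

```python
from itertools import permutations

# Hypothetical symmetric distances between locations.
distance = {
    frozenset(("garage", "station")): 4,
    frozenset(("garage", "f1")): 7,
    frozenset(("garage", "f2")): 5,
    frozenset(("garage", "f3")): 2,
    frozenset(("station", "f1")): 3,
    frozenset(("station", "f2")): 6,
    frozenset(("station", "f3")): 5,
    frozenset(("f1", "f2")): 2,
    frozenset(("f1", "f3")): 4,
    frozenset(("f2", "f3")): 3,
}

def route_length(route):
    """Total weight of the edges between consecutive stops."""
    return sum(distance[frozenset(pair)] for pair in zip(route, route[1:]))

factories = ["f1", "f2", "f3"]

# Garage -> station -> every factory in some order -> garage.
candidates = [("garage", "station", *order, "garage")
              for order in permutations(factories)]
best = min(candidates, key=route_length)

print(best)                # ('garage', 'station', 'f1', 'f2', 'f3', 'garage')
print(route_length(best))  # 14
```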
20. 20. If the following people want to attend the following talks (a-h), what’s the minimum number of sessions we need to satisfy everyone?
21. 21. • We can use the talks as vertices and add edges between talks that have the same person interested • The minimum number of colors needed for a proper coloring shows us the minimum number of sessions we need to satisfy everyone
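The scheduling idea can be sketched as follows, with an invented list of attendees and interests. The coloring is a simple greedy pass in alphabetical order, which in general only gives an upper bound on the number of sessions; here it happens to be optimal because talks a, b, and c all conflict with each other.

```python
from itertools import combinations

# Hypothetical attendees and the talks each wants to see.
interests = {
    "Ana": ["a", "b", "c"],
    "Ben": ["c", "d"],
    "Cleo": ["d", "e", "f"],
    "Dev": ["f", "g"],
    "Eli": ["g", "h", "a"],
}

# Talks are vertices; an edge joins two talks wanted by the same person.
conflicts = {talk: set() for talks in interests.values() for talk in talks}
for talks in interests.values():
    for first, second in combinations(talks, 2):
        conflicts[first].add(second)
        conflicts[second].add(first)

# Greedy coloring: give each talk the lowest session number that none
# of its already-scheduled conflicting talks uses.
session_of = {}
for talk in sorted(conflicts):
    used = {session_of[other] for other in conflicts[talk] if other in session_of}
    session = 0
    while session in used:
        session += 1
    session_of[talk] = session

number_of_sessions = max(session_of.values()) + 1
print(session_of)
# {'a': 0, 'b': 1, 'c': 2, 'd': 0, 'e': 1, 'f': 2, 'g': 1, 'h': 2}
print(number_of_sessions)  # 3
```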
22. 22. https://github.com/igraph/python-igraph https://github.com/networkx
23. 23. https://graph-tool.skewed.de
24. 24. • GraphML (XML-based) • GML (ASCII-based) • NetworkX has built-in functions to work with a pandas DataFrame or a NumPy array/matrix
25. 25.
    import networkx as nx
    import matplotlib.pyplot as plt

    G = nx.Graph()

    # Add vertices 1 through 5
    vertices = []
    for x in range(1, 6):
        vertices.append(x)
    G.add_nodes_from(vertices)

    G.add_edges_from([(1, 2), (2, 3), (5, 4), (4, 2),
                      (1, 3), (5, 1), (5, 2), (3, 4)])

    # Compute a layout once and reuse it for every drawing call
    pos = nx.spring_layout(G)
    nx.draw_networkx_nodes(G, pos, node_size=20)
    nx.draw_networkx_edges(G, pos, width=5)
    nx.draw_networkx_labels(G, pos, font_size=14)
    nx.draw(G, pos)
    plt.show()
26. 26.
    import networkx as nx
    import matplotlib.pyplot as plt

    G = nx.Graph()
    G.add_nodes_from(['a', 'b', 'c'])

    # Each edge carries a 'weight' attribute
    G.add_edge('a', 'b', weight=0.5)
    G.add_edge('b', 'c', weight=0.2)
    G.add_edge('c', 'a', weight=0.7)

    pos = nx.spring_layout(G)
    nx.draw_networkx_nodes(G, pos, node_size=500)
    nx.draw_networkx_edges(G, pos, width=6)
    nx.draw_networkx_labels(G, pos, font_size=14)
    # Edge labels show each edge's attribute dictionary by default
    nx.draw_networkx_edge_labels(G, pos, font_size=14)
    nx.draw(G, pos)
    plt.show()
27. 27.
    >>> list(G.nodes())
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
    >>> nx.shortest_path(G, 1, 18)
    [1, 3, 18]
    >>> dict(G.degree())
    {1: 4, 2: 3, 3: 4, 4: 4, 5: 4, 6: 3, 7: 3, 8: 3, 9: 4, 10: 3, 11: 2, 12: 2, 13: 2, 14: 4, 15: 3, 16: 3, 17: 2, 18: 3, 19: 3, 20: 3}
28. 28.
    >>> nx.greedy_color(G)
    {'d': 0, 'a': 0, 'e': 1, 'b': 1, 'c': 1, 'f': 2, 'h': 1, 'g': 0}
    >>> temp = nx.greedy_color(G)
    >>> len(set(temp.values()))
    3
29. 29.
    import networkx as nx
    import matplotlib.pyplot as plt

    G = nx.DiGraph([(1, 2), (1, 3), (4, 1), (1, 5), (2, 3),
                    (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)])
    pos = nx.circular_layout(G)
    nx.draw_networkx_nodes(G, pos, node_size=200)
    nx.draw_networkx_edges(G, pos)
    nx.draw_networkx_labels(G, pos, font_size=14)  # keyword is font_size, not fontsize
    plt.show()

    >>> nx.has_path(G, 1, 5)
    True
    >>> nx.has_path(G, 5, 1)
    False
    >>> nx.shortest_path(G, 1, 4)
    [1, 2, 4]
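To see what these path queries are doing, the same edge list can be searched with a hand-written breadth-first search. This is an illustrative plain-Python sketch, not NetworkX's implementation; the function name mirrors the library call only for readability.

```python
from collections import deque

# The directed edges from the slide.
edges = [(1, 2), (1, 3), (4, 1), (1, 5), (2, 3),
         (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]

# Map each vertex to the vertices its outgoing edges point to.
successors = {}
for u, v in edges:
    successors.setdefault(u, []).append(v)
    successors.setdefault(v, [])

def shortest_path(source, target):
    """Return a shortest dipath as a list of vertices, or None if there is none."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            # Walk back through the recorded predecessors.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for following in successors[current]:
            if following not in previous:
                previous[following] = current
                queue.append(following)
    return None

print(shortest_path(1, 5))  # [1, 5]
print(shortest_path(5, 1))  # None: vertex 5 has no outgoing edges
print(shortest_path(1, 4))  # [1, 2, 4]
```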
30. 30.
    >>> nx.maximal_matching(G)
    {(1, 4), (5, 2), (6, 3)}
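A maximal matching is one that cannot be extended by adding another edge, which is not necessarily the largest possible matching. The greedy sketch below illustrates the difference on a four-vertex path; it is a hand-written illustration and makes no claim about how NetworkX computes its result.

```python
def greedy_maximal_matching(edges):
    """Take each edge in order if neither endpoint is already matched."""
    matched = set()
    matching = []
    for u, v in edges:
        if u != v and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

# The path 1 - 2 - 3 - 4, with its edges listed in two different orders.
print(greedy_maximal_matching([(2, 3), (1, 2), (3, 4)]))  # [(2, 3)]
print(greedy_maximal_matching([(1, 2), (2, 3), (3, 4)]))  # [(1, 2), (3, 4)]
```

Both results are maximal, but only the second pairs up every vertex.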
31. 31. • There’s a NetworkX tutorial tomorrow! • In-browser Graphviz: webgraphviz.com • Free graph theory textbook: An Introduction to Combinatorics and Graph Theory, David Guichard • Open problems in graph theory: openproblemgarden.org • Graph databases • Association for Computational Linguistics (ACL) 2010 Workshop on Graph-based Methods for Natural Language Processing • Free papers: researchgate.net