
# Social network analysis

A high-level overview of social network analysis using Gephi with your exported Facebook friends network. See more network analysis at http://allthingsgraphed.com.

Published in: Technology

### Social network analysis

1. SOCIAL NETWORK ANALYSIS Caleb Jones { “email” : “calebjones@gmail.com”, “website” : “http://calebjones.info”, “twitter” : “@JonesWCaleb” }
2. Overview
   - Network Analysis – Crash Course
     - Degree
     - Components
     - Modularity
     - Ranking
     - Resiliency
   - Gephi – Intro
     - Loading data (Facebook)
     - Navigation
     - Statistics
     - Exporting
     - Filtering
     - Resiliency
3. Resources
   - SNA Coursera Course (next being taught October 2013)
   - Linked by Albert-László Barabási
4. Network Analysis – Crash Course
   - Degree (n): The number of connections a node has.
     - Node A has in-degree 3 and out-degree 1
     - Node B has degree 4
   - (Figure: example graph with nodes A and B)
5. Network Analysis – Crash Course
   - Component (n): A maximal connected subgraph (undirected).
     - The giant component is the largest component.
   - (Figure: graph with nodes { A, B, C, X, Y, Z } split into a giant component and a smaller component)
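A minimal Python sketch of finding components by breadth-first search. The slide names the nodes but not the links, so the edge list below is hypothetical and chosen only to produce one larger and one smaller component.

```python
from collections import deque

def connected_components(nodes, edges):
    """Return the components of an undirected graph as sets, largest first."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        components.append(comp)
    return sorted(components, key=len, reverse=True)

# Hypothetical links: not taken from the slide.
nodes = ["A", "B", "C", "X", "Y", "Z"]
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "X"), ("Y", "Z")]
components = connected_components(nodes, edges)
giant = components[0]  # the giant component is simply the largest one
```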
6. Network Analysis – Crash Course
   - Modularity (n) ~ Division of a graph into communities (modules/classes/cliques) that are densely connected internally, while connections between communities remain relatively sparse.
   - (Figure: graph with nodes { A, B, C, X, Y, Z } divided into Community 1 and Community 2)
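To make "dense inside, sparse between" concrete, here is a sketch that scores a given partition with the standard modularity formula Q = Σ_c [ L_c/m − (d_c/2m)² ], where L_c is the number of edges inside community c, d_c is the total degree of its nodes, and m is the number of edges. The edge list (two triangles joined by one link) is an assumption, since the slide does not list the links. This only scores a partition; it does not search for one the way a community-detection tool does.

```python
def modularity(edges, community_of):
    """Modularity Q of an undirected, unweighted graph for a given partition."""
    m = len(edges)
    internal = {}      # edges with both endpoints in the same community
    total_degree = {}  # sum of node degrees per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        total_degree[cu] = total_degree.get(cu, 0) + 1
        total_degree[cv] = total_degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )

# Hypothetical links: two triangles bridged by C-X.
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("X", "Y"), ("Y", "Z"), ("X", "Z"),
         ("C", "X")]
two_communities = {"A": 1, "B": 1, "C": 1, "X": 2, "Y": 2, "Z": 2}
one_community = {n: 1 for n in two_communities}

q_split = modularity(edges, two_communities)
q_single = modularity(edges, one_community)
```

Putting every node in one community scores zero, while the two-triangle split scores higher, which is the sense in which the split is the "better" division.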
7. Network Analysis – Crash Course
   - Ranking: A measure of a node’s “importance”
   - Many different methods for determining “importance”
     - Degree, Centrality, Closeness, Betweenness, Eigenvector, HITS, PageRank, Erdös Number
   - Which one to consider depends on the question being asked
   - Precursor to identifying network resilience, diffusion, and vulnerability
8. Network Analysis – Crash Course
   - Degree ranking: Quantity over quality

   | Node | Score |
   |------|-------|
   | A    | 3     |
   | B    | 3     |
   | C    | 1     |
   | D    | 1     |
   | X    | 1     |
   | Y    | 1     |
   | Z    | 3     |
   | Q    | 1     |
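The degree scores on this slide can be reproduced with a few lines of Python. The slide's picture is not part of the transcript, so the edge list here is inferred rather than quoted: the tree A-B, A-Z, A-Q, B-C, B-D, Z-X, Z-Y is an assumption that happens to match the score tables.

```python
from collections import Counter

# Assumed edge list (inferred from the score tables, not given on the slide).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]

degree = Counter()
for u, v in EDGES:
    degree[u] += 1  # each undirected edge adds one to both endpoints
    degree[v] += 1

# Highest degree first, ties broken alphabetically.
ranking = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
```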
9. Network Analysis – Crash Course
   - Betweenness Ranking: How frequently a node appears on shortest paths.

   | Node | Score |
   |------|-------|
   | A    | 15    |
   | B    | 11    |
   | C    | 0     |
   | D    | 0     |
   | X    | 0     |
   | Y    | 0     |
   | Z    | 11    |
   | Q    | 0     |
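One common way to count shortest-path appearances is Brandes' algorithm, sketched below for unweighted, undirected graphs. The edge list is an assumption inferred from the score tables (the slide's diagram is not in the transcript), and the scores are left unnormalized to match the slide.

```python
from collections import deque

# Assumed edge list (inferred from the score tables, not given on the slide).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]
ADJ = {}
for u, v in EDGES:
    ADJ.setdefault(u, set()).add(v)
    ADJ.setdefault(v, set()).add(u)

def betweenness(adj):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                        # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}      # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}       # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):         # accumulate dependencies back toward s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was visited from both endpoints, so halve the totals.
    return {v: b / 2 for v, b in score.items()}

scores = betweenness(ADJ)
```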
10. Network Analysis – Crash Course
    - Closeness Ranking: Average number of hops from a node to rest of network.

    | Node | Score |
    |------|-------|
    | A    | 1.571 |
    | B    | 1.857 |
    | C    | 2.714 |
    | D    | 2.714 |
    | X    | 2.714 |
    | Y    | 2.714 |
    | Z    | 1.857 |
    | Q    | 2.429 |

    Note: Smaller is (usually) better
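A sketch of closeness as defined on this slide: the average hop count from a node to every other node, found by breadth-first search. The edge list is again an inferred assumption, and the sketch assumes a connected graph so that every node is reachable.

```python
from collections import deque

# Assumed edge list (inferred from the score tables, not given on the slide).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]
ADJ = {}
for u, v in EDGES:
    ADJ.setdefault(u, set()).add(v)
    ADJ.setdefault(v, set()).add(u)

def hops_from(adj, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Average hops to the other n - 1 nodes (assumes the graph is connected).
closeness = {
    n: sum(hops_from(ADJ, n).values()) / (len(ADJ) - 1)
    for n in ADJ
}
```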
11. Network Analysis – Crash Course
    - Eigenvector Ranking: A node’s “influence” on the network (accounts for who you know)

    | Node | Score |
    |------|-------|
    | A    | 1     |
    | B    | 0.836 |
    | C    | 0.392 |
    | D    | 0.392 |
    | X    | 0.392 |
    | Y    | 0.392 |
    | Z    | 0.836 |
    | Q    | 0.465 |

    - Google’s PageRank is a variant of this
    - Based on eigenvector of adjacency matrix
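A rough sketch of eigenvector centrality by power iteration, scaled so the top node scores 1. Two assumptions are made here: the edge list is inferred from the score tables, and the iteration multiplies by (adjacency + identity) rather than the plain adjacency matrix. The shift keeps the same leading eigenvector but avoids the oscillation that plain power iteration shows on bipartite graphs such as this tree. The results land close to, but not exactly on, the slide's figures (B comes out near 0.834 rather than 0.836); the slide does not say how its tool iterates or normalizes.

```python
# Assumed edge list (inferred from the score tables, not given on the slide).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]
ADJ = {}
for u, v in EDGES:
    ADJ.setdefault(u, set()).add(v)
    ADJ.setdefault(v, set()).add(u)

def eigenvector_centrality(adj, iterations=200):
    """Power iteration on (A + I), rescaled so the largest score is 1."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # A node's new score is its own score plus the scores of its neighbours.
        nxt = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        top = max(nxt.values())
        x = {v: val / top for v, val in nxt.items()}
    return x

ev = eigenvector_centrality(ADJ)
```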
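12. Network Analysis – Crash Course
    - Erdös Ranking: Number of hops to specific node (degrees of separation).

    | Node | Score |
    |------|-------|
    | A    | 0     |
    | B    | 1     |
    | C    | 2     |
    | D    | 2     |
    | X    | 2     |
    | Y    | 2     |
    | Z    | 1     |
    | Q    | 1     |

    Note: Smaller is (usually) better
    - What if “Erdös” is an influential CEO?
    - What if “Erdös” has bird flu?
    - (Figure: the node labeled “Erdös” is the one with score 0, here A)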
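13. Network Analysis – Crash Course
    - Erdös Ranking: Number of hops to specific node (degrees of separation).

    | Node | Score |
    |------|-------|
    | A    | 2     |
    | B    | 1     |
    | C    | 2     |
    | D    | 0     |
    | X    | 4     |
    | Y    | 4     |
    | Z    | 3     |
    | Q    | 3     |

    Note: Smaller is (usually) better
    - What if “Erdös” is an influential CEO?
    - What if “Erdös” has bird flu?
    - (Figure: the node labeled “Erdös” is the one with score 0, here D)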
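Both Erdös tables come from the same computation with a different reference node: a breadth-first search that records hop counts. A minimal sketch, assuming the inferred edge list below (the slide's diagram is not in the transcript):

```python
from collections import deque

# Assumed edge list (inferred from the score tables, not given on the slide).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]
ADJ = {}
for u, v in EDGES:
    ADJ.setdefault(u, set()).add(v)
    ADJ.setdefault(v, set()).add(u)

def erdos_numbers(adj, reference):
    """Hop count from the reference ("Erdös") node to every reachable node."""
    dist = {reference: 0}
    queue = deque([reference])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

from_a = erdos_numbers(ADJ, "A")  # central node as the reference
from_d = erdos_numbers(ADJ, "D")  # peripheral node as the reference
```

Choosing a peripheral reference node stretches the maximum distance from 2 hops to 4, which is the contrast the two slides draw.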
14. Network Analysis – Crash Course
    - Limitations:
      - Only considered undirected networks (directed is more complicated)
      - Treated all edges as equal. Many networks have a weight or cost associated with edges (e.g. distance)
      - Treated all nodes as equal. A node’s importance may be inherent, based on attributes separate from its position in the network (e.g. dating sites)
15. Network Analysis – Crash Course
    - Resiliency (removing nodes/links):
      - Target nodes based on their “importance”
      - High degree nodes more likely to affect local communities
      - High betweenness/Eigenvector nodes more likely to fragment communities
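As an illustration of targeted removal, the sketch below deletes one node at a time and reports the size of the largest piece left. It uses the edge list inferred from the earlier score tables (an assumption). In that small tree, A and B have the same degree, but A has the higher betweenness and leaves the network in smaller pieces.

```python
# Assumed edge list (inferred from the score tables, not given on the slides).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]
ADJ = {}
for u, v in EDGES:
    ADJ.setdefault(u, set()).add(v)
    ADJ.setdefault(v, set()).add(u)

def largest_piece_after_removal(adj, removed):
    """Size of the largest connected piece once the `removed` nodes are deleted."""
    seen = set(removed)  # treating removed nodes as visited blocks paths through them
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

impact = {n: largest_piece_after_removal(ADJ, {n}) for n in ADJ}
```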
16. Gephi Introduction
    - Platform for visualizing and analyzing networks
    - https://gephi.org/
    - Cross-platform
    - Plugin model
18. Layout
    - Layout -> Fruchterman Reingold
19. Partitioning Communities
    1. Statistics -> Modularity -> Run (use defaults)
    2. Partition -> Nodes (refresh) -> Modularity class -> Apply
20. Degree Distribution
    1. Statistics -> Average Degree -> Run
    2. Partition -> Nodes (refresh) -> Modularity class -> Apply
    - Lots of nodes with few connections
    - Only a few with a large number of connections
    - Power law distribution?
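Outside Gephi, a degree distribution is just a count of how many nodes have each degree. A small sketch on made-up data (the node names and links below are hypothetical, not from an exported Facebook network):

```python
from collections import Counter

def degree_distribution(edges):
    """Map each degree value to the number of nodes that have it."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Nodes with no links never appear in an edge list, so degree 0 is not counted.
    return Counter(degree.values())

# Hypothetical sample: one well-connected hub and several lightly connected friends.
edges = [("hub", f"friend{i}") for i in range(6)] + [("friend0", "friend1")]
dist = degree_distribution(edges)
```

Here four nodes have degree 1, two have degree 2, and one has degree 6, a toy version of the "many small, few large" shape the slide describes.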
21. Node Ranking by Degree
    1. Ranking -> Nodes (refresh) -> Degree -> Apply (try tweaking min/max size and Spline for desired emphasis)
22. Filtering Isolated Nodes (“noise”)
    1. Statistics -> Connected Components -> Run
    2. Filters -> Attributes -> Partition Count -> Component ID
    3. Drag “Component ID” down into the “Queries” section
    4. Click on “Partition Count”, slide the settings bar, and click “Filter”; adjust to remove isolated nodes
    - This can be an important step when dealing with very large data sets. Depending on the degree distribution, the filter can be set quite high.
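The same idea outside Gephi: label each node with its component, count component sizes, and keep only nodes in sufficiently large components. This sketch uses a union-find structure; the function name, the `min_size` parameter, and the sample data are all illustrative assumptions rather than anything Gephi exposes.

```python
from collections import Counter

def filter_small_components(nodes, edges, min_size):
    """Keep only nodes (and their edges) in components of at least min_size nodes."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps the trees shallow
            n = parent[n]
        return n

    for u, v in edges:
        parent[find(u)] = find(v)          # merge the two components

    size = Counter(find(n) for n in nodes)
    kept = [n for n in nodes if size[find(n)] >= min_size]
    kept_set = set(kept)
    kept_edges = [(u, v) for u, v in edges if u in kept_set and v in kept_set]
    return kept, kept_edges

# Hypothetical data: one 4-node component, one pair, and one isolated node.
nodes = ["a", "b", "c", "d", "e", "f", "g"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "f")]
kept_nodes, kept_edges = filter_small_components(nodes, edges, min_size=3)
```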
23. Re-adjust after Filtering
    - Need to re-run previous steps to refresh calculated values now that filtering has been done.
      - Statistics -> Average degree, modularity, connected components
      - How did these numbers change?
    - Re-partition node color by modularity class now that modularity has been recalculated
    - Run Fruchterman Reingold layout again to fill space left over from filtered nodes
24. Have you saved yet!?
25. Node Ranking by Centrality
    1. Statistics -> Network Diameter -> Run
    2. Ranking -> Betweenness Centrality -> Apply
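For reference, network diameter (the longest shortest path) and average path length can both be derived from all-pairs hop counts. A sketch using the small tree inferred from the crash-course score tables (the edge list is an assumption, and the graph is assumed connected):

```python
from collections import deque

# Assumed edge list (inferred from the earlier score tables).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]
ADJ = {}
for u, v in EDGES:
    ADJ.setdefault(u, set()).add(v)
    ADJ.setdefault(v, set()).add(u)

def hops_from(adj, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Hop counts for every ordered pair of distinct nodes.
all_hops = [d for s in ADJ for t, d in hops_from(ADJ, s).items() if t != s]
diameter = max(all_hops)
average_path_length = sum(all_hops) / len(all_hops)
```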
26. Erdös Number
    - You may have noticed a key node that has both the highest degree and the highest betweenness ranking.
    - Click on the “Edit” button and select that node (note the name)
    - Statistics -> Erdös Number -> Select that name -> OK
    - What will happen if you select a less conspicuous node?
27. Data Lab
    - Go to “Data Laboratory”
    - All node information as well as calculated statistics appear here in a spreadsheet.
    - Sort by “Erdös Number” (descending)
      - What is the largest Erdös Number? N degrees of ________ .
    - Try sorting by other values (degree, closeness, betweenness)
    - (Slide annotation: Max is 7 degrees of separation)
28. Node Ranking by Eigenvector Centrality
    1. Statistics -> Eigenvector Centrality -> Run
    2. Ranking -> Eigenvector Centrality -> Apply
29. Node Ranking by PageRank
    1. Statistics -> PageRank -> Run
    2. Ranking -> PageRank -> Apply
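A bare-bones sketch of PageRank by power iteration, treating each undirected link as a two-way link. The damping factor of 0.85 and the fixed iteration count are conventional choices assumed here, not settings taken from the slides, and the edge list is the small tree inferred from the crash-course tables.

```python
# Assumed edge list (inferred from the earlier score tables).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]
ADJ = {}
for u, v in EDGES:
    ADJ.setdefault(u, set()).add(v)
    ADJ.setdefault(v, set()).add(u)

def pagerank(adj, damping=0.85, iterations=200):
    """Power-iteration PageRank; assumes every node has at least one neighbour."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iterations):
        nxt = {v: (1.0 - damping) / n for v in adj}  # random-jump share
        for v, neighbours in adj.items():
            share = damping * rank[v] / len(neighbours)
            for w in neighbours:
                nxt[w] += share  # v passes an equal share to each neighbour
        rank = nxt
    return rank

pr = pagerank(ADJ)
top_five = sorted(pr, key=pr.get, reverse=True)[:5]
```

Sorting the scores in descending order, as in `top_five`, mirrors sorting the PageRank column in the Data Laboratory.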
30. Export to Image
    - Go to “Preview” mode
    - Click “Refresh” to see what you have now
    - Add node labels
      - “Node Labels” -> “Show Labels”
      - Adjust font size to avoid label overlapping
    - If Node Labels are overlapping, try expanding layout
      - Back to “Overview” -> Layout -> Fruchterman Reingold
      - Increase the “Area” parameter and re-run the layout
      - Then go back to “Preview” mode and click “Refresh”
      - May need to re-adjust Node Label text size
    - Experiment with “Curved” edges
31. (Labels omitted in slide deck for privacy)
32. Before we attack the network, save!
33. Network Resiliency
    - How can we fragment the network or increase the separation between nodes?
    - Which nodes, if removed/influenced, would most greatly impact the network?
    - What information have we learned already that could be used?
34. Network Resiliency
    - Go to “Data Laboratory” -> sort by “PageRank” descending
    - Select the top 5 rows and delete them (did you save first!!!)
    - Note their names. Are these people influential in your life?
    - (Figure: Data Laboratory table with the sort column and the top 5 rows highlighted)
35. Network Resiliency
    - Go back to statistics and note the following:
      - Average Degree, Network Diameter, Modularity, Connected Components, Average Path Length
    - Also note how the network has changed visually
    - Re-run the statistics above and note how the numbers changed
      - Did you successfully fragment the network (did # of connected components increase)? (disrupting communications)
      - How many nodes do you think you’d have to remove if you removed by lowest PageRank scores first? (robustness of network)
      - What if links represented load distributed across the network? How would the network load change after removing these key nodes? (cascading failure)
36. Review
    - Network Analysis – Crash Course
      - Degree
      - Components
      - Modularity
      - Ranking
      - Resiliency
    - Gephi – Intro
      - Loading data (Facebook)
      - Navigation
      - Statistics
      - Exporting
      - Filtering
      - Resiliency
37. Questions?