SOCIAL NETWORK ANALYSIS
Caleb Jones
{
“email” : “calebjones@gmail.com”,
“website” : “http://calebjones.info”,
“twitter” : “@JonesWCaleb”
}
Overview
•  Network Analysis – Crash Course
•  Degree
•  Components
•  Modularity
•  Ranking
•  Resiliency
•  Gephi – Intro
•  Loading data (Facebook)
•  Navigation
•  Statistics
•  Exporting
•  Filtering
•  Resiliency
Resources
SNA Coursera Course
(next being taught October 2013)
Linked by
Albert-László Barabási
Network Analysis – Crash Course
•  Degree (n): The number of connections a node has.
•  Node A has in-degree 3 and out-degree 1
•  Node B has degree 4
A
B
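A minimal sketch of how in-degree, out-degree, and total degree can be counted from a directed edge list. The edge list is an assumption chosen so that node A ends up with in-degree 3 and out-degree 1; it is not the graph drawn on the slide.

```python
from collections import Counter

# Hypothetical directed edges as (source, target) pairs.
edges = [("B", "A"), ("C", "A"), ("D", "A"), ("A", "B")]

# Out-degree counts edges leaving a node; in-degree counts edges arriving.
out_degree = Counter(src for src, _ in edges)
in_degree = Counter(dst for _, dst in edges)


def degree(node):
    """Total degree: incoming plus outgoing connections."""
    return in_degree[node] + out_degree[node]


print("A: in =", in_degree["A"], "out =", out_degree["A"], "total =", degree("A"))
```

In an undirected network there is no in/out split, so each edge simply adds one to the degree of both endpoints.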
Network Analysis – Crash Course
•  Component (n): A maximal connected subgraph
(undirected).
•  The giant component is the largest component
component (giant) component
Graph with nodes { A, B, C, X, Y, Z }
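The idea can be sketched with a breadth-first search that groups nodes into components. The node names match the slide, but the edges are an assumption (the figure is not reproduced here).

```python
from collections import deque


def connected_components(nodes, edges):
    """Return a list of node sets, one per connected component."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


nodes = ["A", "B", "C", "X", "Y", "Z"]
# Hypothetical undirected edges for illustration only.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "X"), ("Y", "Z")]

components = connected_components(nodes, edges)
giant = max(components, key=len)  # the giant component is the largest one
print(len(components), "components; giant component:", sorted(giant))
```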
Network Analysis – Crash Course
•  Modularity (n) ~ Division of a graph into communities
(modules/classes/cliques) with dense interconnection within
each community and relatively sparse interconnection
between communities.
Community 1 Community 2
Graph with nodes { A, B, C, X, Y, Z }
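A rough sketch of how a modularity score can be computed for a given division into communities, using the common definition Q = sum over communities of (internal edges / m) - (community degree sum / 2m)^2, where m is the number of edges. The edge list and both partitions are assumptions for illustration. The sketch only scores a partition that is handed to it; it does not search for communities.

```python
def modularity(edges, communities):
    """Modularity of an undirected, unweighted graph for a given partition."""
    m = len(edges)
    community_of = {
        node: index
        for index, members in enumerate(communities)
        for node in members
    }
    internal = [0] * len(communities)    # edges with both ends in the community
    degree_sum = [0] * len(communities)  # total degree of the community's nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[i] / m - (degree_sum[i] / (2 * m)) ** 2
        for i in range(len(communities))
    )


# Hypothetical graph: two triangles joined by a single edge C-X.
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("X", "Y"), ("Y", "Z"), ("X", "Z"),
         ("C", "X")]

good = modularity(edges, [{"A", "B", "C"}, {"X", "Y", "Z"}])
poor = modularity(edges, [{"A", "X"}, {"B", "Y"}, {"C", "Z"}])
print("two triangles:", round(good, 3), "mixed pairs:", round(poor, 3))
```

A partition that follows the dense groups scores well above zero, while one that cuts across them scores below zero.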
Network Analysis – Crash Course
• Ranking: A measure of a node’s
“importance”
• Many different methods for determining
“importance”
• Degree, Centrality, Closeness, Betweenness,
Eigenvector, HITS, PageRank, Erdös Number
• Which one to consider depends on the
question being asked
• Precursor to identifying network resilience,
diffusion, and vulnerability
Network Analysis – Crash Course
• Degree ranking: Quantity over quality
Node Score
A 3
B 3
C 1
D 1
X 1
Y 1
Z 3
Q 1
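The table above can be reproduced with a short sketch. The edge list is an assumption: it is one undirected graph that is consistent with the score tables in these slides, since the original figure is not reproduced here.

```python
from collections import Counter

# Assumed edge list, inferred from the score tables in the slides.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

# Each undirected edge adds one to the degree of both endpoints.
degree = Counter()
for u, v in EDGES:
    degree[u] += 1
    degree[v] += 1

# Rank from highest to lowest degree, breaking ties by node name.
ranking = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
for node, score in ranking:
    print(node, score)
```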
Network Analysis – Crash Course
• Betweenness Ranking: How frequently a
node appears on shortest paths.
Node Score
A 15
B 11
C 0
D 0
X 0
Y 0
Z 11
Q 0
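A minimal sketch of one way to compute these scores: for every pair of other nodes, add the fraction of their shortest paths that pass through the node. The edge list is an assumption (one graph consistent with the slide's tables), and the pairwise approach is chosen for readability rather than speed.

```python
from collections import deque
from itertools import combinations

# Assumed edge list, inferred from the score tables in the slides.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_counts(adj, source):
    """Hop distance and number of shortest paths from source to each node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                sigma[neighbor] = 0
                queue.append(neighbor)
            if dist[neighbor] == dist[node] + 1:
                sigma[neighbor] += sigma[node]
    return dist, sigma


def betweenness(edges):
    adj = build_adjacency(edges)
    info = {node: bfs_counts(adj, node) for node in adj}
    score = {node: 0.0 for node in adj}
    for s, t in combinations(sorted(adj), 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s:
            continue  # s and t are in different components
        for v in adj:
            if v in (s, t) or v not in dist_s:
                continue
            # v lies on a shortest s-t path when the two legs add up exactly.
            if dist_s[v] + dist_t[v] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    return score


scores = betweenness(EDGES)
for node in sorted(scores, key=lambda n: (-scores[n], n)):
    print(node, scores[node])
```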
Network Analysis – Crash Course
• Closeness Ranking: Average number of
hops from a node to the rest of the network.
Node Score
A 1.571
B 1.857
C 2.714
D 2.714
X 2.714
Y 2.714
Z 1.857
Q 2.429
Note: Smaller is (usually) better
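The scores above are average hop counts, which can be sketched with one breadth-first search per node. The edge list is an assumption (one graph consistent with the slide's tables), and the sketch assumes the graph is connected. Some tools report the reciprocal of this average instead, in which case larger is better.

```python
from collections import deque

# Assumed edge list, inferred from the score tables in the slides.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def hop_distances(adj, source):
    """Breadth-first search: number of hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def average_hops(edges):
    adj = build_adjacency(edges)
    result = {}
    for node in adj:
        dist = hop_distances(adj, node)
        others = [d for other, d in dist.items() if other != node]
        result[node] = sum(others) / len(others)
    return result


avg_hops = average_hops(EDGES)
for node in sorted(avg_hops, key=lambda n: (avg_hops[n], n)):
    print(node, round(avg_hops[node], 3))
```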
Network Analysis – Crash Course
• Eigenvector Ranking: A node’s “influence”
on the network (accounts for who you know)
Node Score
A 1
B 0.836
C 0.392
D 0.392
X 0.392
Y 0.392
Z 0.836
Q 0.465
Google’s PageRank is a variant of this
Based on eigenvector of adjacency matrix
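A rough sketch of the eigenvector idea using power iteration: each node's score is repeatedly replaced by the sum of its neighbors' scores and then rescaled so the top score is 1. The edge list is an assumption (one graph consistent with the slide's tables). Adding the node's own score at each step is a choice made here so the iteration settles on tree-like graphs; it is not necessarily how any particular tool computes it. The results agree with the slide's table to about two decimal places.

```python
# Assumed edge list, inferred from the score tables in the slides.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def eigenvector_scores(edges, iterations=200):
    """Approximate the leading eigenvector of the adjacency matrix."""
    adj = build_adjacency(edges)
    x = {node: 1.0 for node in adj}
    for _ in range(iterations):
        # x + A x has the same leading eigenvector as A x, but the shift
        # keeps the iteration from oscillating on bipartite graphs.
        new = {node: x[node] + sum(x[nb] for nb in adj[node]) for node in adj}
        top = max(new.values())
        x = {node: value / top for node, value in new.items()}
    return x


eig = eigenvector_scores(EDGES)
for node in sorted(eig, key=lambda n: (-eig[n], n)):
    print(node, round(eig[node], 3))
```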
Network Analysis – Crash Course
• Erdös Ranking: Number of hops to
specific node (degrees of separation).
Node Score
A 0
B 1
C 2
D 2
X 2
Y 2
Z 1
Q 1
Note: Smaller is (usually) better
What if “Erdös” is an influential CEO?
What if “Erdös” has bird flu?
Erdös
Network Analysis – Crash Course
• Erdös Ranking: Number of hops to
specific node (degrees of separation).
Node Score
A 2
B 1
C 2
D 0
X 4
Y 4
Z 3
Q 3
Note: Smaller is (usually) better
What if “Erdös” is an influential CEO?
What if “Erdös” has bird flu?
Erdös
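Both tables above can be reproduced with a breadth-first search from the chosen "Erdös" node. The edge list is an assumption (one graph consistent with the slide's tables); the function name is hypothetical.

```python
from collections import deque

# Assumed edge list, inferred from the score tables in the slides.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]


def erdos_numbers(edges, source):
    """Number of hops from source to every reachable node."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


from_a = erdos_numbers(EDGES, "A")  # "Erdös" is the central node A
from_d = erdos_numbers(EDGES, "D")  # "Erdös" is the peripheral node D
print("max hops from A:", max(from_a.values()))
print("max hops from D:", max(from_d.values()))
```

Choosing a peripheral node as "Erdös" doubles the maximum separation in this small example, from 2 hops to 4.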
Network Analysis – Crash Course
• Limitations:
• Only considered undirected networks (directed
is more complicated)
• Treated all edges as equal. Many networks
have a weight or cost associated with edges (e.g.
distance)
• Treated all nodes as equal. A node’s importance
may be inherent based on attributes separate
from its position in network (e.g. dating sites)
Network Analysis – Crash Course
• Resiliency (removing nodes/links):
• Target nodes based on their “importance”
• High degree nodes more likely to affect
local communities
• High betweenness/Eigenvector nodes
more likely to fragment communities
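A minimal sketch of a targeted removal: delete one node and count how many connected components remain. The edge list is an assumption (one graph consistent with the score tables earlier in the deck), and the function name is hypothetical.

```python
# Assumed edge list, inferred from the score tables in the slides.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]


def components_after_removal(edges, removed):
    """Count connected components once the given node is deleted."""
    nodes = {n for edge in edges for n in edge} - {removed}
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if u != removed and v != removed:
            adj[u].add(v)
            adj[v].add(u)

    seen = set()
    count = 0
    for start in nodes:
        if start in seen:
            continue
        count += 1
        # Depth-first walk marks everything reachable from start.
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return count


after_a = components_after_removal(EDGES, "A")  # highest betweenness
after_c = components_after_removal(EDGES, "C")  # betweenness 0
print("remove A:", after_a, "components; remove C:", after_c, "component(s)")
```

Removing the high-betweenness node A splits this example into three pieces, while removing the leaf C leaves the rest connected.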
Gephi Introduction
•  Platform for visualizing and analyzing networks
•  https://gephi.org/
•  Cross-platform
•  Plugin model
Facebook Dataset
•  Download your data (gml)
•  http://snacourse.com/getnet/
•  Import into Gephi
•  File -> Open -> Select downloaded
.gml file
•  Choose “undirected”
for “Graph Type”
Layout
Layout -> Fruchterman Reingold
Partitioning Communities
1.  Statistics -> Modularity -> Run (use defaults)
2.  Partition -> Nodes (refresh) -> Modularity class -> Apply
Degree Distribution
1.  Statistics -> Average Degree -> Run
2.  Partition -> Nodes (refresh) -> Modularity class -> Apply
Lots of nodes with
few connections
Only a few with a large
number of connections
Power law distribution?
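A degree distribution is just a count of how many nodes have each degree, which can be sketched in a few lines. The edge list is a small illustrative assumption, not Facebook data; with real data the same tally would show many low-degree nodes and a few high-degree ones if the distribution is heavy-tailed.

```python
from collections import Counter

# Hypothetical undirected edges for illustration only.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

# Degree of each node.
degree = Counter()
for u, v in EDGES:
    degree[u] += 1
    degree[v] += 1

# Degree distribution: degree value -> number of nodes with that degree.
distribution = Counter(degree.values())
average_degree = sum(degree.values()) / len(degree)

for value in sorted(distribution):
    print("degree", value, ":", distribution[value], "nodes")
print("average degree:", average_degree)
```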
Node Ranking by Degree
1.  Ranking -> Nodes (refresh) -> Degree -> Apply
(try tweaking min/max size and Spline for desired emphasis)
Filtering Isolated Nodes (“noise”)
1.  Statistics -> Connected
Components -> Run
2.  Filters -> Attributes -> Partition
Count -> Component ID
3.  Drag “Component ID” down into
“Queries” section
4.  Click on “Partition Count”, slide the
settings bar, and click “Filter” –
adjust to remove isolated nodes
This can be an important step when dealing with very
large data sets. Depending on the degree
distribution, the filter can be set quite high.
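The filter above can be sketched outside Gephi as: find the connected components, then keep only nodes whose component has at least a minimum size. The node list, edge list, function name, and threshold are assumptions for illustration.

```python
def filter_small_components(nodes, edges, min_size):
    """Keep only nodes (and edges) in components with at least min_size nodes."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    kept = set()
    seen = set()
    for start in nodes:
        if start in seen:
            continue
        # Collect the component containing start.
        seen.add(start)
        component = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    stack.append(neighbor)
        if len(component) >= min_size:
            kept |= component

    kept_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    return kept, kept_edges


# Hypothetical data: F is an isolated node, D-E is a small island.
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("B", "C"), ("D", "E")]

no_isolates, _ = filter_small_components(nodes, edges, min_size=2)
core_nodes, core_edges = filter_small_components(nodes, edges, min_size=3)
print(sorted(no_isolates), sorted(core_nodes))
```

A threshold of 2 drops only isolated nodes; raising it also drops small islands.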
Re-adjust after Filtering
• Need to re-run previous steps to refresh
calculated values now that filtering has been
done.
• Statistics -> Average degree, modularity,
connected components
•  How did these numbers change?
• Re-partition node color by modularity class now
that modularity has been recalculated
• Run Fruchterman Reingold layout again to fill
space left over from filtered nodes
Have you saved yet!?
Node Ranking by Centrality
1.  Statistics -> Network Diameter -> Run
2.  Ranking -> Betweenness Centrality -> Apply
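For reference, the diameter is the longest shortest path in the network, and the average path length is the mean hop count over all pairs of nodes. A minimal sketch of both, assuming a connected undirected graph and an illustrative edge list (one graph consistent with the score tables earlier in the deck):

```python
from collections import deque

# Assumed edge list, inferred from the score tables in the slides.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]


def all_pairs_hops(edges):
    """Hop distances between every ordered pair of distinct, connected nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    distances = []
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        distances.extend(d for node, d in dist.items() if node != source)
    return distances


hops = all_pairs_hops(EDGES)
diameter = max(hops)
average_path_length = sum(hops) / len(hops)
print("diameter:", diameter, "average path length:", round(average_path_length, 3))
```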
Erdös Number
•  You may have noticed a key node that has both the
highest degree and the highest betweenness ranking.
•  Click on the “Edit” button and select that node
(note the name)
•  Statistics -> Erdös Number -> Select that name -> OK
•  What will happen if you select a less conspicuous node?
Data Lab
•  Go to “Data Laboratory”
•  All node information as well as calculated statistics appear
here in a spreadsheet.
•  Sort by “Erdös Number” (descending)
•  What is the largest Erdös Number? N degrees of ________ .
•  Try sorting by other values (degree, closeness, betweenness)
Max is 7 degrees
of separation
Node Ranking by Eigenvector Centrality
1.  Statistics -> Eigenvector Centrality -> Run
2.  Ranking -> Eigenvector Centrality -> Apply
Node Ranking by PageRank
1.  Statistics -> PageRank -> Run
2.  Ranking -> PageRank -> Apply
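A rough sketch of the PageRank idea on an undirected graph, where each edge is treated as a link in both directions and every node repeatedly shares its score equally among its neighbors. The edge list is an assumption (one graph consistent with the score tables earlier in the deck); the damping factor of 0.85 and the fixed iteration count are conventional choices made here, not settings taken from the slides.

```python
# Assumed edge list, inferred from the score tables in the slides.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]


def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank for an undirected graph without isolated nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    n = len(adj)
    rank = {node: 1.0 / n for node in adj}
    for _ in range(iterations):
        rank = {
            node: (1 - damping) / n
            + damping * sum(rank[nb] / len(adj[nb]) for nb in adj[node])
            for node in adj
        }
    return rank


rank = pagerank(EDGES)
for node in sorted(rank, key=lambda name: (-rank[name], name)):
    print(node, round(rank[node], 4))
```

On this small graph B and Z score slightly above A, even though A leads on betweenness and closeness, which illustrates that the ranking to use depends on the question being asked.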
Export to Image
•  Go to “Preview” mode
•  Click “Refresh” to see what you have now
•  Add node labels
•  “Node Labels” -> “Show Labels”
•  Adjust font size to avoid label overlapping
•  If Node Labels are overlapping, try expanding layout
•  Back to “Overview” -> Layout -> Fruchterman Reingold
•  Increase the “Area” parameter and re-run the layout
•  Then go back to “Preview” mode and click “Refresh”
•  May need to re-adjust Node Label text size
•  Experiment with “Curved” edges
labels omitted in slidedeck for privacy
Before we attack the network, save!
Network Resiliency
•  How can we fragment the network or increase the
separation between nodes?
•  Which nodes, if removed/influenced, would most greatly
impact the network?
•  What information have we learned already that could be
used?
Network Resiliency
•  Go to “Data Laboratory” -> sort by “PageRank” descending
•  Select top 5 rows and delete them (did you save first!!!)
•  Note their names – Are these people influential in your life?
Network Resiliency
•  Go back to statistics and note the following:
•  Average Degree, Network Diameter, Modularity, Connected
Components, Average Path Length
•  Also note how the network visually has changed
•  Re-run the statistics above and note how the numbers
changed
•  Did you successfully fragment the network (did # of connected
components increase)? (disrupting communications)
•  How many nodes do you think you’d have to remove if you
removed by lowest PageRank scores first? (robustness of network)
•  What if links represented load distributed across network? How
would the network load change after removing these key nodes?
(cascading failure)
Review
•  Network Analysis – Crash Course
•  Degree
•  Components
•  Modularity
•  Ranking
•  Resiliency
•  Gephi – Intro
•  Loading data (Facebook)
•  Navigation
•  Statistics
•  Exporting
•  Filtering
•  Resiliency
Questions?
