Exploring the Networks in Open Public Data
1. Exploring the Networks
in Open Public Data
Uldis Bojārs
Institute of Mathematics and Computer Science
University of Latvia
Using Open Data Workshop
Brussels, 20-Jun-2012
2. About us
• Institute of Mathematics and Computer
Science, University of Latvia
– http://www.lumii.lv/resource/show/170
– Uldis Bojārs @CaptSolo
– Valdis Krebs http://orgnet.com
– Pēteris Ručevskis
3. Network visualisation and analysis
Applications:
• discover interesting patterns
• explore data in [more] detail
Work from the Open Data Hackathon in Riga
• analysis of Saeima voting patterns
• http://opendata.lv
4. Overview
• Data needs to be Open
• Pre-processing and filtering the data
– selecting what to show
• Data visualization
– iterative process (visualize, refine, repeat)
• What’s next?
5. Open Data needed first (!)
“Open data is data that can be
freely used, reused and redistributed by anyone …”
http://opendefinition.org/
Data needs to be:
• open
• easy to use
Still a problem in Latvia:
• only a few datasets are open in
an easy-to-consume form (PDF does not count :)
7. Pre-processing
• Input:
– raw vote data (scraped from the website)
published at http://data.opendata.lv/
• Output:
– nodes (MPs)
– edges (connections between them)
• What is a connection?
8. Defining graph connections
• Connect MPs if they have voted similarly
– disagreed on at most n% of decisions
• Filter out cases where almost all
MPs voted the same
• Filter out trivial decisions
• Filter out noise
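The connection rule on this slide can be illustrated with a minimal sketch. It assumes vote records are available as a mapping from MP to decision to vote; the MP names, the data layout and the helper names `disagreement_rate` and `build_edges` are hypothetical, not taken from the original analysis.

```python
from itertools import combinations

# Hypothetical toy data: votes[mp][decision_id] = vote cast by that MP.
votes = {
    "MP_A": {1: "for", 2: "for", 3: "against", 4: "for"},
    "MP_B": {1: "for", 2: "for", 3: "against", 4: "against"},
    "MP_C": {1: "against", 2: "against", 3: "for", 4: "against"},
}


def disagreement_rate(record_a, record_b):
    """Share of decisions both MPs voted on where their votes differ."""
    shared = set(record_a) & set(record_b)
    if not shared:
        return None  # no common decisions, so no basis for comparison
    differing = sum(1 for d in shared if record_a[d] != record_b[d])
    return differing / len(shared)


def build_edges(votes, max_disagreement):
    """Connect two MPs if they disagreed on at most `max_disagreement` of shared decisions."""
    edges = []
    for mp1, mp2 in combinations(sorted(votes), 2):
        rate = disagreement_rate(votes[mp1], votes[mp2])
        if rate is not None and rate <= max_disagreement:
            edges.append((mp1, mp2, rate))
    return edges


edges = build_edges(votes, max_disagreement=0.25)
print(edges)
```

Here `max_disagreement` plays the role of n% expressed as a fraction; raising it adds edges, lowering it removes them.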
9. Node colour legend
• Ruling coalition:
– Zatlers’ Reform Party
– Unity
– the National Alliance
• Opposition:
– Harmony Centre
– Greens / Farmers Party
• a few non-party MPs
10. MPs who always vote the same (n = 0%)
Connection criteria too narrow
11. MPs who disagree in less than 35% of cases
Connection criteria too broad
(everyone agrees, really?)
12. Refining the visualisation
• Need to find the right cut-off values (n%)
– where patterns [start to] appear
– and the visualisation makes sense
• Show the results to domain experts
– MPs, journalists, political researchers, …
• Experts:
– help improve visualisations
– can discover new things for themselves
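One way to look for a useful cut-off value is to sweep several values of n and watch how the graph changes. The sketch below is an assumption about how that could be done, not the authors' tool: the pairwise rates, the node names and the helpers `count_components` and `sweep` are hypothetical.

```python
# Hypothetical pairwise disagreement rates between four MPs (as fractions).
rates = {
    ("A", "B"): 0.05,
    ("A", "C"): 0.20,
    ("B", "C"): 0.22,
    ("C", "D"): 0.30,
    ("A", "D"): 0.60,
    ("B", "D"): 0.55,
}
nodes = ["A", "B", "C", "D"]


def count_components(nodes, edges):
    """Count connected components with a simple depth-first traversal."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = set()
    components = 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:
            current = stack.pop()
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return components


def sweep(nodes, rates, cutoffs):
    """For each cut-off, report (cutoff, number of edges, number of components)."""
    summary = []
    for cutoff in cutoffs:
        edges = [pair for pair, rate in rates.items() if rate <= cutoff]
        summary.append((cutoff, len(edges), count_components(nodes, edges)))
    return summary


summary = sweep(nodes, rates, [0.0, 0.11, 0.25, 0.35])
for row in summary:
    print(row)
```

A cut-off where the graph is neither fully fragmented nor a single dense blob is a candidate to show to domain experts.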
13. MPs who disagree in less than 11% of cases
Opposition parties [sometimes] vote the same
14. MPs who disagree in less than 25% of cases
Bridges appear between coalition and opposition parties
(see slides 21, 22 re the bridging role of yellow nodes)
15. What next?
• Improve our understanding of data
• Enhance visualisations
– add clusters, etc.
• Create multiple visualisations
– different topics, changes in time, etc.
• Bring in more data
– explain nodes & edges
16. network
visualisation
example #1
Donations to political parties
http://www.thenetworkthinkers.com/2011/12/innovation-happens-at-intersections.html
17. network
visualisation
example #2
Intra-company communication patterns
18. Conclusion
• Need more, useful Open Data
• Discovering patterns, making sense of data
– helping make sense = purpose of visualisations
• Looking forward to collaboration re:
– Using Open Data
– Data Visualisation and Analysis
19. More info
• Uldis Bojārs
uldis.bojars@gmail.com
• Social Network Analysis talk / Valdis Krebs
http://www.slideshare.net/DERIGalway/valdis-krebs-social-network-analysis-19872007
• Smart Network Analyzer tool
http://sna.lumii.lv/
in development at IMCS, University of Latvia
Editor's Notes
raw data is not always immediately useful to the wider public
- using open data
- discovering patterns
- making sense of it
It’s worthwhile to explore networks that emerge from the data you’re looking at
Various kinds of networks:
- people in companies (who communicates with whom)
- MPs, based on co-voting patterns
- companies (networks of)
Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and share alike.
- http://opendefinition.org/
- http://opendatahandbook.org/en/what-is-open-data/index.html
- scrape the data
- make it open
- clean up the data
- transform the data
- make it usable [for the purpose]
how do we define an edge?
We want to choose those parts of the data from which we can deduce something
- simple procedural decisions are out
- chose voting instances where there were notable differences of opinion
- noise = MPs who voted only a few times (throws off the percentages)
---
Some votes are more important than others
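The filtering described in this note could be sketched as follows. The thresholds (`max_majority_share`, `min_votes_per_mp`), the data layout and the function names are assumptions chosen for illustration; the original slides do not state the values that were used.

```python
from collections import Counter


def is_contested(ballots, max_majority_share=0.9):
    """True if the most common vote covers at most `max_majority_share` of ballots."""
    if not ballots:
        return False
    top_count = Counter(ballots).most_common(1)[0][1]
    return top_count / len(ballots) <= max_majority_share


def filter_votes(votes, max_majority_share=0.9, min_votes_per_mp=2):
    """Drop near-unanimous decisions, then drop MPs with too few remaining votes."""
    by_decision = {}
    for record in votes.values():
        for decision, vote in record.items():
            by_decision.setdefault(decision, []).append(vote)
    contested = {
        decision
        for decision, ballots in by_decision.items()
        if is_contested(ballots, max_majority_share)
    }
    filtered = {}
    for mp, record in votes.items():
        kept = {d: v for d, v in record.items() if d in contested}
        if len(kept) >= min_votes_per_mp:  # rarely voting MPs distort percentages
            filtered[mp] = kept
    return filtered


# Hypothetical toy data: decision 1 is unanimous, MP_D and MP_E voted rarely.
votes = {
    "MP_A": {1: "for", 2: "for", 3: "against"},
    "MP_B": {1: "for", 2: "against", 3: "against"},
    "MP_C": {1: "for", 2: "against", 3: "for"},
    "MP_D": {1: "for", 2: "for"},
    "MP_E": {3: "for"},
}

filtered = filter_votes(votes)
print(filtered)
```

This sketch treats all remaining decisions as equally important; weighting votes by importance, as the note suggests, would need extra data.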
Harmony Centre
Greens / Farmers – choice: (a) join one of the two clusters; (b) isolation; (c) bridge between them
strong voting discipline in the Harmony Centre. the majority of the rest do not vote the same (at this value of n%)
far opposition / near opposition / coalition
looks pretty
does not give much useful information - almost a full graph
does it look right at first sight? (the “sniff test”)
show to domain experts
people can make pretty graphs - but what do they mean? - what can we explain or show via them?
the Greens / Farmers party is bridging between the strong opposition party Harmony Centre and the ruling coalition - sometimes agree with the opposition, sometimes with the coalition
see slides 21, 22 re “live animation” showing what happens if you take them off the graph
learned from experts: not everything appears as a vote; some votes are more important than others
- more insights -> better visualisations (more truthful, etc.)
some advanced visualisations will need more information
- e.g., to define what laws are on what topics
bringing in more data
- annotate nodes & edges with additional data / explanations of why this edge appears here
- profiles for members of parliament (e.g., the TheyWorkForYou site in the UK)
- linked data
another example of an open data graph visualisation
another view of this data: http://www.slideshare.net/DERIGalway/valdis-krebs-social-network-analysis-19872007/15
The central red cluster corresponds to the company headquarters. Each vertex in the network represents an employee, colored according to the location they work at. Graph edges denote frequent, confirmed, work-related communications between employees. Cluster overlaps reveal which employees frequently interact with other locations, serving as boundary-spanners. This visualization helps to identify key connectors in the company [0].
what do we do with these visualisations next? = how do we use them (to have impact, explain data, …)
social network visualisation & analysis allow us to see what was previously invisible
“Social Network Analysis” talk by Valdis Krebs - for more info re SNA and network visualization
demo how the Greens / Farmers party is bridging between the strong opposition Harmony Centre and the ruling coalition - sometimes agree with the opposition, sometimes with the coalition - (edge connection criteria n = 25%)
demo how the Greens / Farmers party is bridging between the strong opposition Harmony Centre and the ruling coalition
when the Greens / Farmers party nodes are hidden from the graph, there is no connection. - the coalition and the Harmony Centre do not vote the same
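The effect of hiding bridging nodes can be checked programmatically. The sketch below uses a hypothetical five-node graph and a hypothetical helper `reachable`; it only illustrates the idea of the demo and does not reproduce the real Saeima data.

```python
from collections import deque

# Hypothetical co-voting edges; node names are illustrative only.
edges = [
    ("coalition_1", "coalition_2"),
    ("coalition_2", "greens_farmers_1"),
    ("greens_farmers_1", "harmony_1"),
    ("harmony_1", "harmony_2"),
]


def reachable(edges, source, target, hidden=frozenset()):
    """Breadth-first search for a path from source to target, skipping hidden nodes."""
    if source in hidden or target in hidden:
        return False
    adjacency = {}
    for a, b in edges:
        if a in hidden or b in hidden:
            continue  # edges touching a hidden node disappear from the graph
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    queue = deque([source])
    seen = {source}
    while queue:
        current = queue.popleft()
        if current == target:
            return True
        for neighbour in adjacency.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


with_bridge = reachable(edges, "coalition_1", "harmony_2")
without_bridge = reachable(
    edges, "coalition_1", "harmony_2", hidden={"greens_farmers_1"}
)
print(with_bridge, without_bridge)
```

In this toy graph the coalition and Harmony Centre nodes are connected only through the Greens / Farmers node, so hiding it separates them.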