This document provides an overview of a post-academic course on Big Data taught by Joris Klerkx. It discusses the Augment group's mission to augment human intellect through tools and technologies. Their research focuses on capturing physiological and behavioral signals through sensors to create meaningful feedback loops. Application domains discussed include technology-enhanced learning, media consumption, science, and health. Guidelines for visualizing big data emphasize using interactive visual encodings to promote recognition over recall by humans. Interactivity, overview first approaches, and checking data quality are advised.
1. Post-academic course Big Data
Joris Klerkx
Research Manager, PhD.
joris.klerkx@cs.kuleuven.be
Visualisation
Big Data
IVPV - Instituut voor Permanente Vorming
28-05-2015
2. Augment group - HCI research lab
Dept. Computerwetenschappen
KU Leuven
https://augmenthuman.wordpress.com
6. What are relevant user actions?
How can we capture signals?
How can we store them?
How can we create a meaningful feedback loop?
Our Research
Physiological, behavioural signals
Sensors, (self-)trackers
Information visualization
Scalable infrastructure
20. “The idea that business is strictly a numbers affair has always struck me as preposterous.
For one thing, I’ve never been particularly good at numbers, but I think I’ve done a
reasonable job with feelings. And I’m convinced that it is feelings — and feelings alone —
that account for the success of the Virgin brand in all of its myriad forms.” -- Richard
Branson
34. Scientific visualisation
Specifically concerned with data that has a well-defined representation in 2D or 3D space (e.g., from a simulation mesh or scanner).
Slide source: Robert Putman
41. The Role of visualisation
Brehmer, M.; Munzner, T., "A Multi-Level Typology of Abstract Visualization Tasks," Visualization and Computer Graphics, IEEE Transactions on, vol. 19, no. 12, pp. 2376-2385, Dec. 2013
53. Cluttered displays
Binned density scatterplot
Hexagonal instead of rectangular
Heer, J. & Kandel, S. (2012), Interactive Analysis of Big Data, XRDS, 19 (1)
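A minimal sketch of the hexagonal binning idea named on this slide: instead of drawing every point, count how many points fall into each hexagonal cell and shade the cell by its count. The function names and the axial-coordinate layout (pointy-top hexagons) are choices made for this illustration, not taken from the cited article.

```python
import math
from collections import Counter

def _round_axial(q, r):
    """Round fractional axial coordinates to the nearest hexagon cell."""
    x, z = q, r
    y = -x - z
    rx, ry, rz = round(x), round(y), round(z)
    dx, dy, dz = abs(rx - x), abs(ry - y), abs(rz - z)
    # Recompute the coordinate with the largest rounding error
    # so that the three cube coordinates still sum to zero.
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return (rx, rz)

def hex_bin(points, size):
    """Count points per hexagonal cell.

    `size` is the distance from a hexagon's centre to one of its corners.
    Returns a Counter mapping (q, r) cell coordinates to point counts.
    """
    counts = Counter()
    for x, y in points:
        q = (math.sqrt(3) / 3 * x - y / 3) / size
        r = (2 / 3 * y) / size
        counts[_round_axial(q, r)] += 1
    return counts

# Made-up points: three near the origin, one further to the right.
density = hex_bin([(0.0, 0.0), (0.1, 0.1), (-0.1, 0.05), (1.7, 0.0)], size=1.0)
```

The resulting counts can then be mapped to color or opacity per cell, which keeps the display readable when individual points would overplot each other.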
55. http://www.perceptualedge.com/blog/?p=2046
In this day of so-called Big Data, organizations are scrambling to implement new software and hardware to increase the amount of data that they collect and store. In so doing they are unwittingly making it harder to find the needles of useful information in the rapidly growing mounds of hay.
If you don’t know how to differentiate signals from noise, adding more noise only makes matters worse.
57. Visualizations might help reveal multidimensional patterns
Use the power of the machine to find a proxy in the data that predicts the selected variables
Depending on their specific questions, domain experts might select a subset of variables they are interested in
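One simple way to look for a proxy variable, sketched here under the assumption that a linear relationship is good enough: rank every other column by its absolute Pearson correlation with the variable the expert selected. The column names and values below are made up, and correlation is only a stand-in for whatever predictive method is actually used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long number lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def rank_proxies(table, target):
    """Rank all columns except `target` by |correlation| with `target`."""
    scores = {
        name: pearson(values, table[target])
        for name, values in table.items()
        if name != target
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical data: which column best tracks the selected variable?
table = {
    "exam_score": [52, 61, 70, 78, 90],
    "hours_online": [2, 4, 6, 8, 10],
    "shoe_size": [43, 38, 41, 44, 39],
}
ranking = rank_proxies(table, "exam_score")
```

The top-ranked candidates are the ones worth plotting against the selected variable, so the expert can judge whether the relationship is meaningful.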
66. World Population Growth
A tremendous change occurred with the industrial revolution: whereas it had taken all of human history until around 1800 for world population to reach one billion, the second billion was achieved in only 130 years (1930), the third billion in less than 30 years (1959), the fourth billion in 15 years (1974), and the fifth billion in only 13 years (1987). During the 20th century alone, the population in the world has grown from 1.65 billion to 6 billion.
Seeing is understanding
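The arithmetic behind the paragraph above, as a small sketch: compute how many years each additional billion took and print a crude text bar for each. The years are the ones quoted in the text (1800 is approximate); the bar scale of one character per five years is arbitrary.

```python
# (population in billions, year reached), as quoted in the text above
milestones = [(1, 1800), (2, 1930), (3, 1959), (4, 1974), (5, 1987)]

def years_per_billion(milestones):
    """Return (billion reached, years since the previous billion) pairs."""
    return [
        (later_pop, later_year - earlier_year)
        for (_, earlier_year), (later_pop, later_year)
        in zip(milestones, milestones[1:])
    ]

for billions, years in years_per_billion(milestones):
    # One '#' per five years: the shrinking bars show the acceleration.
    print(f"billion {billions}: {years:>3} years {'#' * (years // 5)}")
```

Even this rough bar display makes the acceleration visible at a glance, which is harder to pick up from the sentence alone.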
70. T. Nagel, M. Maitan, E. Duval, A. Vande Moere, J. Klerkx, K. Kloeckl, and C. Ratti. Touching transport - a case study on visualizing metropolitan public transit on interactive tabletops. In AVI2014: 12th ACM International Working Conference on Advanced Visual Interfaces, pages 281–288, 2014.
http://www.youtube.com/watch?v=wQpTM7ASc-w
Facilitates human interaction for exploration and understanding
71. Will there be enough food?
http://www.footprintnetwork.org/en/index.php/gfn/page/earth_overshoot_day/
Communicates insights easily
Triggers impact
79. Visualizing Reader Activity
Each square is a ‘slide’
Each row represents a navigation pattern through the slides
Column 1 shows the absolute number of readers
Column 2 shows the percentage of readers
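A minimal sketch of how such a table could be derived from raw reading sessions: group identical slide sequences into navigation patterns, then report the absolute number of readers and the percentage per pattern. The session data and the function name are hypothetical; the actual pipeline behind the slide is not described in the source.

```python
from collections import Counter

def pattern_table(sessions):
    """Return (pattern, reader_count, percentage) rows, most common first."""
    counts = Counter(tuple(session) for session in sessions)
    total = len(sessions)
    return [
        (pattern, n, round(100 * n / total, 1))
        for pattern, n in counts.most_common()
    ]

# Hypothetical sessions: the sequence of slide numbers each reader visited.
sessions = [[1, 2, 3], [1, 2, 3], [1], [1, 2, 3, 1], [1]]
rows = pattern_table(sessions)
```

Each returned row corresponds to one row of the visualization: the pattern supplies the squares, and the two numbers fill the two columns.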
80. 262 readers (2.7%) go through all slides completely, then quickly return to the first slide to take another brief look at it.
Reading time per slide
Readers spend about 75 seconds (on average) on the first slide to study what information is available.
Shows patterns
81. Sentiment analysis in an enterprise social network (Slack)
Triggers questions & creates awareness
Disclaimer: Should we trust NLP algorithms?
82. Empowers users to make informed decisions
Positive Badges
Negative Badges
83. Show errors in the data
http://woutervds.github.io/InfoVisPostgraduwhat/
94. Humans have advanced perceptual abilities
Humans have limited short-term memory
Our brains make us extremely good at recognizing visual patterns
Our brains remember relatively little of what we perceive
Externalize data by using interactive, visual encodings
Promote recognition rather than recall
99. “It’s not a magical algorithm that finds the insight for you”
“You have to look at the overview, you have to decide what you zoom in to, what you filter out. And then you click to get the details”
Ben Shneiderman, 2011
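The three steps in the quote (overview, zoom and filter, details) can be sketched as three small operations on a dataset. This is only an illustration of the interaction sequence; the records, field names, and function names are invented for the example.

```python
from collections import Counter

# Hypothetical activity log; field names are invented for this sketch.
records = [
    {"student": "Student 1", "activity": "blogpost", "week": 1},
    {"student": "Student 1", "activity": "tweet", "week": 1},
    {"student": "Student 2", "activity": "tweet", "week": 2},
    {"student": "Student 2", "activity": "blogpost", "week": 2},
    {"student": "Student 2", "activity": "tweet", "week": 3},
]

def overview(rows, field):
    """Overview first: how many records fall in each category of `field`."""
    return Counter(row[field] for row in rows)

def zoom_and_filter(rows, **criteria):
    """Zoom and filter: keep only the rows that match every criterion."""
    return [row for row in rows if all(row[k] == v for k, v in criteria.items())]

def details(rows, index):
    """Details on demand: the full record behind one selected item."""
    return rows[index]

summary = overview(records, "activity")
subset = zoom_and_filter(records, student="Student 2", activity="tweet")
picked = details(subset, 0)
```

In an interactive visualization the person, not the code, decides which category to zoom into and which item to click; the functions only supply the data for each step.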
104. Real data is ugly and needs to be cleaned
http://hcil2.cs.umd.edu/trs/2011-34/2011-34.pdf
http://www.netmagazine.com/features/seven-dirty-secrets-data-visualisation
https://code.google.com/p/google-refine/
http://vis.stanford.edu/wrangler/
Pre-process your data
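A small sketch of the kind of pre-processing meant here, assuming a typical set of problems in hand-entered data: stray whitespace, inconsistent casing, decimal commas, and placeholder strings where a number should be. The input rows and cleaning rules are invented for illustration; the tools linked above handle far more cases.

```python
def clean_rows(rows):
    """Normalise raw (name, value) string pairs.

    Returns (clean, rejected): clean rows have a tidy name and a float
    value; rejected rows are kept aside for inspection, not silently dropped.
    """
    clean, rejected = [], []
    for name, value in rows:
        name = " ".join(name.split()).title()  # collapse whitespace, unify casing
        try:
            number = float(value.strip().replace(",", "."))  # accept decimal commas
        except ValueError:
            rejected.append((name, value))
            continue
        clean.append((name, number))
    return clean, rejected

# Hypothetical messy input
raw = [("  peter ", "12"), ("ANNA", "7,5"), ("Tom", "n/a"), ("anna", " 3 ")]
clean, rejected = clean_rows(raw)
```

Keeping the rejected rows visible matters: they are often the errors in the data that a visualization should expose rather than hide.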
108. Use small coordinated graphs to add variables
Forget about 3D graphs
Source: Stephen Few
109. Which student has more blogposts?
• Size & angle are difficult to compare
• Without labels & legends, impossible to show exact quantitative differences
• Limited short-term (visual) memory
110. Source: Stephen Few
Save the pies for dessert (S. Few)
Try using either of the pies to put the slices in order by size
114. [Figure: two bar charts comparing Student 1 to Student 4 on blogposts, tweets, comments on blogs, and reports submitted; one with an absolute axis (0 to 60), the other with a percentage axis (0% to 100%)]
Use Common Sense
What are you comparing?
What story do you get from it?
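The difference between an absolute and a percentage axis can be made concrete with a short sketch. The counts below are made up (the values behind the original charts are not recoverable); the point is only that the two views can tell different stories about the same students.

```python
def to_shares(counts):
    """Convert absolute counts per category into percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in counts}
    return {category: 100 * n / total for category, n in counts.items()}

# Hypothetical activity counts per student
activity = {
    "Student 1": {"blogposts": 10, "tweets": 30,
                  "comments on blogs": 5, "reports submitted": 5},
    "Student 2": {"blogposts": 4, "tweets": 2,
                  "comments on blogs": 2, "reports submitted": 2},
}
shares = {student: to_shares(counts) for student, counts in activity.items()}
```

With these numbers, Student 1 wrote more blogposts in absolute terms (10 versus 4), yet blogposts make up a larger share of Student 2's activity (40% versus 20%). Which chart is right depends on what is being compared.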
115. Which graph makes it easier to focus on the pattern of change through time, instead of the individual values?
Choose the graph that answers your questions about your data
Source: Stephen Few
124. How much better are the drinking water conditions in Willowtown as compared to Silvatown?
http://fellinlovewithdata.com/research/deceptive-visualizations
131. A limited set of visual properties that are detected by the low-level visual system
- very rapidly (< 200 to 250 ms),
- accurately,
- with little effort,
- before focused attention.
Healey, C., & Enns, J. (2012). Attention and Visual Memory in Visualization and Computer Graphics. IEEE Transactions on Visualization and Computer Graphics, 18(7), 1170-1188.
Pre-attentive characteristics
Note that eye movements take at least 200 ms to initiate.
132. Pre-attentive characteristics
Find the red dot
<> Hue
Find the dot
<> shape
Find the red dot
conjunction
not pre-attentive
http://www.csc.ncsu.edu/faculty/healey/PP/
Helps to spot differences in multi-element display
133. Pre-attentive characteristics
Line orientation Length, width Closure Size
Curvature Density, contrast Intersection 3D depth
Not all of them allow showing exact quantitative differences
Helps to spot differences in multi-element display
http://www.csc.ncsu.edu/faculty/healey/PP/
138. Common Fate
Objects with a common movement, that move in the same direction, at the same pace, and at the same time, are organised as a group (Ehrenstein, 2004).
139. Law of Isomorphism
Similarity that can be behavioural or perceptual, and can be a response based on the viewer's previous experiences (Luchins & Luchins, 1999; Chang, 2002). This law is the basis for symbolism (Schamber, 1986).
142. B. McDonnel and N. Elmqvist. Towards utilizing GPUs in information visualization: A model and implementation of image-space operations. Visualization and Computer Graphics, IEEE Transactions on, 15(6):1105–1112, 2009.
http://www.infovis-wiki.net/index.php/Visualization_Pipeline
144. Data
- structure
time, hierarchy, network, 1D, 2D, nD, …
- questions
where, when, how often, …
- audience
domain & visualisation expertise, …
145. S. Stevens. On the theory of scales of measurement. Science, 103(2684), 1946.
Structure
Time? hierarchical? 1D? 2D? nD? network? …
146. Questions (to get things going)
What is the average number of students who bought the course book?
What? When? How much? How often?
When did students start looking at the course material?
How many hours did Peter work on this assignment?
(Why did Peter have to redo his assignment?)
How often did Peter retake the course before he passed?
(Why?)
147. Visual mapping
Encode data characteristics into visual form
Each mark (point, line, area,…) represents a data element
Think about relationships between elements (position)
“Simplicity is the ultimate sophistication.”
Leonardo da Vinci
153. Which one looks more accurate?
Slide adapted from Michael Porath
Compensating magnitude to match perception
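The figure for this slide is not available, so the following is a sketch under the assumption that it compares circles (or similar area marks) of different sizes. When a value is encoded by the size of a circle, the radius has to grow with the square root of the value for the area to stay proportional; scaling the radius linearly exaggerates large values. The `exponent` parameter is an assumption of this example: 0.5 gives true area scaling, and slightly larger exponents are sometimes used to compensate for the tendency to underestimate larger areas.

```python
def circle_radius(value, max_value, max_radius, exponent=0.5):
    """Radius of a proportional circle for `value`.

    exponent=0.5 makes the circle's area proportional to the value;
    exponent=1.0 scales the radius linearly (area then grows quadratically).
    """
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    return max_radius * (value / max_value) ** exponent

# A value that is a quarter of the maximum:
area_scaled = circle_radius(25, 100, max_radius=10)                   # half the radius
radius_scaled = circle_radius(25, 100, max_radius=10, exponent=1.0)   # a quarter of the radius
```

With linear radius scaling the smaller circle covers only one sixteenth of the larger circle's area, although the value is one quarter of the maximum.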
154. Color
Color Principles - Hue, Saturation, and Value
https://www.youtube.com/watch?v=l8_fZPHasdo
Use at most about 5 colors (for categories, ...) because short-term memory is limited
http://en.wikipedia.org/wiki/HSL_and_HSV
155. How to choose colors
• hue: categorical
• saturation: ordinal and quantitative
• luminance/brightness: ordinal and quantitative
Source: Katrien Verbert
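The rule of thumb above (hue for categories, saturation or luminance for ordered and quantitative data) can be sketched with Python's standard `colorsys` module. The palette functions and their default values are invented for this illustration and are not a recommendation for specific colors.

```python
import colorsys

def _hex(rgb):
    """Format an (r, g, b) tuple of floats in [0, 1] as '#rrggbb'."""
    return "#" + "".join(f"{round(channel * 255):02x}" for channel in rgb)

def categorical_palette(n, lightness=0.5, saturation=0.65):
    """n colors that differ only in hue: suited to unordered categories."""
    return [_hex(colorsys.hls_to_rgb(i / n, lightness, saturation))
            for i in range(n)]

def sequential_palette(n, hue=0.6, saturation=0.65, lightest=0.9, darkest=0.3):
    """n colors of one hue, from light (low) to dark (high): suited to ordered data."""
    if n == 1:
        return [_hex(colorsys.hls_to_rgb(hue, darkest, saturation))]
    step = (lightest - darkest) / (n - 1)
    return [_hex(colorsys.hls_to_rgb(hue, lightest - i * step, saturation))
            for i in range(n)]

categories = categorical_palette(5)   # e.g. one color per activity type
levels = sequential_palette(4)        # e.g. low to high activity
```

Keeping the categorical palette to about five hues matches the short-term memory limit mentioned earlier in the deck.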
165. Offer precise controls for sharing on the Internet...
Users should navigate through 50 settings with more than 170 options
Example
Facebook privacy statement
Questions?
How did its complexity change over time?
How does its length compare to privacy statements of other tools?
166. How did its complexity change over time?
http://www.nytimes.com/interactive/2010/05/12/business/facebook-privacy.html
167. How does its length compare to privacy statements of other tools?
http://www.nytimes.com/interactive/2010/05/12/business/facebook-privacy.html