Rasagy Sharma, Gramener's Principal Information Designer, created this PPT to introduce data storytelling to the students at Symbiosis Institute of Business Management (SIBM).
The slide talks about how to create data stories and the impact of data stories on businesses.
Rasagy Sharma also teaches data storytelling to analysts and data scientists. Check out more about Gramener's data storytelling workshop at https://gramener.com/data-storytelling-workshop
Check out more solutions on data storytelling at https://gramener.com
Connect with Rasagy on:
LinkedIn: https://in.linkedin.com/in/rasagy
Twitter: @rasagy
11. Visually representing data helps us to see patterns in the data quickly
āThe greatest value of a picture is
when it forces us to notice
what we never expected to see.ā
ā John Tukey
11Datasauras dataset, animated by Autodesk Research
12. Stories have a huge impact on humans
13
Storytelling has a 30X Return on Investment
Rob Walker and Joshua Glenn auctioned common items
like mugs, golf balls, toys, etc. The item descriptions were
stories purpose-written by 200+ contributing writers.
Items that were bought for $250 sold for over $8,000 ā a
return of over 3,000% for storytelling!
Stories are memorable and viral
People remember stories. Theyāll act on them.
People share stories. That enables collective action.
We analyze data to improve peopleās decision making.
For this to be effective, data stories are needed more than
ever before.
13. Visual data storytelling is a critical skill for data scientists, analysts & managers
14
Share your data & analysis as data stories
Whenever you share inferences from data ā whether
itās as a presentation, or an email or document with
your analysis, or as a dashboard ā craft it as a story.
This session will give you a glimpse of some of the
data stories weāve created at Gramener, and how you
can make these yourself.
But analysts present their work, not their message
Data scientists present their analysis ā what they did,
and what they found. Thatās not what the audience
needs.
Audiences need a message that tells them what to do,
and why. Told in an engaging way. As a story.
15. ā¦but the overload of data in todayās age makes this critical
āEvery second of every
day, our senses bring in
way too much data than
we can possibly process
in our brains.ā
ā Peter Diamandis
Data Storytelling helps make sense of this data
16Data never sleeps Infographic by Domo
16. This is a dataset (1975 ā 1990) that
has been around for several years and
has been studied extensively. Yet, a
visualization can reveal patterns that
are neither obvious nor well known.
For example,
ā¢ Are birthdays uniformly distributed?
ā¢ Do doctors or parents exercise the C-section option to move dates?
ā¢ Is there any day of the month that has unusually high or low births?
ā¢ Are there any months with relatively high or low births?
Very high births in September.
But this is fairly well known. Most
conceptions happen during the
winter holiday season
Relatively few births during the
Christmas and Thanksgiving
holidays, as well as New Year
and Independence Day.
Most people prefer not
to have children on the
13th
of any month, given
that itās an unlucky day
Some special days like April
Foolās day are avoided, but
Valentineās Day is quite
popular
More births Fewer births ā¦ on average, for each day of the year (from 1975 to 1990)
Letās look at 15 years of US Birth Data
https://gramener.com/posters/Birthdays.pdf
17. The pattern in India is quite different
https://gramener.com/posters/Birthdays.pdf
This is a birth date dataset thatās
obtained from school admission data
for over 10 million children. When we
compare this with births in the US, we
see none of the same patterns.
For example,
ā¢ Is there an aversion to the 13th
or is there a local cultural nuance?
ā¢ Are holidays avoided for births?
ā¢ Which months have a higher propensity for births, and why?
ā¢ Are there any patterns not found in the US data?
Very few children are born in the
month of August, and thereafter.
Most births are concentrated in
the first half of the year
We see a large number of
children born on the 5th, 10th,
15th
, 20th
and 25th
of each month
ā that is, round numbered dates
Such round numbered patterns a
typical indication of fraud. Here,
birthdates are brought forward to
aid early school admission
More births Fewer births ā¦ on average, for each day of the year (from 2007 to 2013)
18. This adversely impacts childrenās marks
https://gramener.com/posters/Birthdays.pdf
Itās a well-established fact that older
children tend to do better at school in
most activities. Since many children
have had their birth dates brought
forward, these younger children suffer.
The average marks of children ābornā on the 1st
, 5th
, 10th
, 15th
etc.. of
the month tend to score lower marks.
ā¢ Are holidays avoided for births?
ā¢ Which months have a higher propensity for births, and why?
ā¢ Are there any patterns not found in the US data?
Higher marks Lower marks ā¦ on average, for children born on a given day of the year (from 2007 to 2013)
Children ābornā on round numbered days score lower marks on average,
due to a higher proportion of younger children
20. Seeing the best & the worst of the best
21
Sachin
R Tendulkar
Sourav
C Ganguly
Rahul
Dravid
Mohammad
Azharuddin
Yuvraj
Singh
Virender
Sehwag
Mahendra
S Dhoni
Alaysinhji
D Jadeja
Navjot
S Sidhu
Gautam
Gambhir
Krishnamachari
Srikkanth
Kapil
Dev
Dillip
B Vengsarkar
Suresh
K Raina
Ravishankar
J Shastri
Sunil
M Gavaskar
Mohammad
Kaif
Virat
Kohli
Vinod
G Kambli
Vangipurappu V
S Laxman
Rabindra
R Singh
Sanjay
V Manjrekar
Mohinder
Amarnath
Manoj
M Prabhakar
Rohit
G Sharma
Irfan
K Pathan
Nayan
R Mongia
Ajit
B Agarkar
Dinesh
Mongia
Harbhajan
Singh
Krishna K
D Karthik
Sandeep
M Patil
Anil
Kumble
Yashpal
Sharma
Javagal
Srinath
Hemang
K Badani
Yusuf
K Pathan
Robin
V Uthappa
Raman
Lamba
Zaheer
Khan
Ravindra
A Jadeja
Pathiv
A Patel
Sadagopan
Ramesh
Roger M
H Binny
Woorkeri
V Raman
Sunil
B Joshi
Kiran
S More
Praveen
K Amre
Ashok
Malhotra
Chetan
Sharma
21. Data stories can help communicate the data science process
https://gramener.com/cluster/cluster-census-2011-district
Poor
Rural, uneducated agri
workers. Young population
with low income and asset
ownership. Mostly in Bihar,
Jharkhand, UP, MP.
Breakout
Rural, educated agri workers
poised for skilled labor.
Higher asset ownership.
Parts of UP, Bihar, MP.
Aspirant
Regions with skilled labor
pools but low purchasing
power. Cusp of economic
development. Mostly WB,
Odisha, parts of UP
Owner
Regions with unskilled labor
but high economic prosperity
(landlords, etc..) Mostly AP,
TN, parts of Karnataka,
Gujarat
Business
Lower education but working
in skilled jobs, and
prosperous. Typical of
business communities. Parts
of Gujarat, TN, Urban UP,
Punjab, etc.
Rich
Urban educated
population
working in skilled
jobs. All metros,
large cities, parts
of Kerala, TN
Skilled
Poorer Richer
Unskilled Skilled
Uneducated Educated Uneducated Educated
Unskilled
Purchasing power
Skilled jobs
Education
Poor Breakout Aspirant Owner Business RichThe 6 clusters are
Previously, the client was treating contiguous regions as a
homogenous entity, from a channel content perspective.
To deliver targeted content, we divided India into 6 clusters based
on their demographic behavior. Specifically, three composite
indices were created based on the economic development
lifecycle:
ā¢ Education (literacy, higher education) that leads to...
ā¢ Skilled jobs (in mfg. or services) that leads to...
ā¢ Purchasing power (higher income, asset ownership)
Districts were divided (at the average cut-off) by:
22. Data stories can bring a new perspective to solve a problem
IT & Services team of a large
multinational wanted to
understand the status & driver
of delays.
Gramener visualized various
stages of service requests &
represented how long they are
stuck at each stage.
The problem emerged to be
rework, not efficiency.
23https://gramener.com/servicerequests/
Navigation filters
Process flow diagram
indicating bottlenecks &
volume of requests
Automated analysis to
identify areas which
need work and which
can create maximum
impact
23. Data stories can lead to higher engagement & answer new questions
A large FMCG organization wanted to create a
visualization to review sales performance across
geographies, channels and products for their
board meetings.
Gramener built an interactive slide deck that
allowed users to drill-down within powerpoint.
Dynamic presentations led to a complete revamp
of the entire structure of board presentations.
24https://gramener.com/fmcg/
Worldwide:
$288 mn UK: 87.0
Stores: 34.4
Product 9: 6.2
Product 10: 5.4
Product 7: 5.1
Product 15: 4.8
Product 8: 3.1
Product 14: 2.1
Partners: 29.2
Product 15: 6.7Product 17: 4.1Product 6: 3.4Product 1: 3.2Product 7: 2.9Product 11: 2.4
Direct: 23.5 Product 17: 5.2
Product 8: 4.4
Product 16: 4.0
Product 14: 2.5
Product 1: 2.5
Japan: 71.9
Stores: 25.9
Product 14: 6.0
Product 7: 5.4
Product 11: 4.0
Product 17: 2.8
Partners: 25.5
Product 8: 8.2
Product 11: 3.6
Product 16: 3.3
Product 1: 3.1
Product 9: 2.0
Direct:20.5
Product 11: 5.2
Product15:4.5
P
roduct14:2.8
Product9:2.3
China:65.6
Partners:27.3
Product10:8.0
Product3:7.1
Product15:3.0
Product2:2.1
Pro
duct8:2.0
Direct:19.6
Product3:5.5
Product2:4.7
Product8:2.6
Product17:2.1
Stores:18.7
Product10:5.4
Product14:2.2
Product7:2.1
Product15:2.0
India:46.6
Stores:17.5
Product16:6.8
Direct:15.6
Product10:3.4
Product16:2.9
Product17:2.5
Product7:2.4
Partners:13.4
Product8:2.5
Product7:2.3
US:17.0
Partners:6.0
Product10:4.4
Direct:5.8Product11:3.9
Stores:5.3Product11:3.8
Worldwide$288.0mn
A: Accelerate$68.9mn
B: Build$77.2mn
C: Cut down$141.9mn
24. Interactive data stories with comics can turn your analysis into a fun quiz
https://blog.gramener.com/data-comics-storytelling-for-business-decisions/
25. Data stories can show you what you canāt see
26http://rasagy.in/500days/
26. Data stories can show you what you canāt see
27http://rasagy.in/500days/
27. Data stories can spark questions that you may have never asked
28http://rasagy.in/VisualizingTrains/
29. You have data.
You have analysis.
Now narrate your story.
Understand the audience & intent
Find insights
Storyline
Design your data stories
30. How the data should be interpreted decides the type of chart to be used
31
Deviation
Change-
over-Time
Spatial Ranking
Correlation
Part-to-
Whole
Flow
Magnitude
Distribution
31. How the data should be interpreted decides the type of chart to be used
32
Deviation
Emphasise variations (+/-) from a fixed
reference point. Typically the reference point
is zero but it can also be a target or a long-
term average.
Change-over-Time
Give emphasis to changing trends. These can
be short (intra-day) movements or extended
series traversing decades or centuries.
Spatial
Used only when precise locations or
geographical patterns in data are more
important to the reader than anything else.
Ranking
Use where an item's position in an ordered list
is more important than its absolute or relative
value.
Correlation
Show the relationship between two or more
variables.
Part-to-Whole
Show how a single entity can be broken down
into its component elements.
Flow
Show the reader volumes or intensity of
movement between two or more states or
conditions. These might be logical sequences
or geographical locations
Magnitude
Show size comparisons. These can be
relative (just being able to see larger/bigger)
or absolute (need to see fine differences).
Distribution
Show values in a dataset and how often they
occur. The shape (or skew) of a distribution
can be a memorable way of highlighting the
lack of uniformity or equality in the data.
Help people explore data Showcase high/low performance Explain drivers of performance
32. How the data should be interpreted decides the type of chart to be used
33
Help people explore data Showcase high/low performance Explain drivers of performance
https://gramener.github.io/visual-vocabulary-vega/
Deviation
Emphasise variations (+/-) from a fixed
reference point. Typically the reference point
is zero but it can also be a target or a long-
term average.
Change-over-Time
Give emphasis to changing trends. These can
be short (intra-day) movements or extended
series traversing decades or centuries.
Spatial
Used only when precise locations or
geographical patterns in data are more
important to the reader than anything else.
Ranking
Use where an item's position in an ordered list
is more important than its absolute or relative
value.
Correlation
Show the relationship between two or more
variables.
Part-to-Whole
Show how a single entity can be broken down
into its component elements.
Flow
Show the reader volumes or intensity of
movement between two or more states or
conditions. These might be logical sequences
or geographical locations
Magnitude
Show size comparisons. These can be
relative (just being able to see larger/bigger)
or absolute (need to see fine differences).
Distribution
Show values in a dataset and how often they
occur. The shape (or skew) of a distribution
can be a memorable way of highlighting the
lack of uniformity or equality in the data.
34. Recommended Resources
Books to read
ā¢ Resonate - Nancy Duarte
ā¢ Storytelling with Data - Cole
Nussbaumer Knaflic
ā¢ Truthful Art - Alberto Cairo
ā¢ Design of everyday things - Don
Norman
ā¢ Back of the napkin - Dan Roam
Data Storytelling at Gramener
1. Solutions on Gramener site:
https://gramener.com/solutions/
2. Gramenerās Blog
https://blog.gramener.com/
3. Gramenerās upcoming webinars:
https://linkedin.com/company/
gramener/
Tools to learn
ā¢ Paper & Pen (Collaborative)
ā¢ Excel
ā¢ Tableau & PowerBI
ā¢ JS (D3, Vega, Plot.ly)
ā¢ Python (Bokeh/Matplotlib)
ā¢ R (ggplot)
ā¢ Raw graphs
ā¢ Illustrator / Sketch / Figma
You can find me (Rasagy) on Twitter/LinkedIn/Instagram
35
35. āMost of us need to listen to the music
to understand how beautiful it is.
But often thatās how we present statistics:
we just show the notes,
we donāt play the music.ā
ā Hans Rosling
Introduction to Data Storytelling by Rasagy Sharma