Storyfying your Data: How to go from Data to Insights to Stories

Gramener's Director - Client success, Shravan Kumar A, delivered an online session to the students of Praxis Business School.

In his session he talked about how converting data into stories can benefit businesses and enable quick decision making. Furthermore, he shared approaches to create data stories along with some use cases and case studies we solved at Gramener to benefit our clients.

Check out our initiative to teach data storytelling to data scientists and analysts so that they can think out of the box and create wonderful data stories for their stakeholders:

  1. 1. “Story”fying your Data: How to go from Data to Insights to Stories. Shravan Kumar, DSS, Sept 14th, 2020. How to make yourself Indispensable in your Career with Data
  2. 2. How a nurse changed the course of a war using data storytelling
  3. 3. Nightingale helped curtail the death rate from a whopping 40% to a mere 2%. Created by Florence Nightingale for Queen Victoria during the Crimean War, in which Britain and France fought Russia. Visualizes deaths due to: Red: War wounds; Black: Other war-related causes; Blue: Avoidable hospital diseases
  4. 4. INTRODUCTION: Shravan Kumar A, Director, Client Success. “Simplify Data Science for all”. 100+ Clients. Insights as Stories. Help start, apply and adopt Data Science. @sh_ra_van /shravankumara
  5. 5. Introduction to Data Portraits
  6. 6. How to Create a Data Portrait
  7. 7. COVID-19 Impact on Industries – A Perspective. Source: McKinsey – COVID-19 Briefing materials
  8. 8. COVID-19 Impact on Industries – A Perspective. Source: McKinsey – COVID-19 Briefing materials
  9. 9. Companies are working to minimize COVID-19 impact and build resilience. COVID-19 has disrupted every industry. All sectors display an element of fragility and are susceptible to shock [2]. Industries at the forefront of the crisis are relying on data to inform their response and rebound strategies. McKinsey [1] suggests three waves of data-driven actions that organizations can take: (1) Ensure data teams, and the whole organization, remain operational. (2) Lead solutions to prepare for the crisis-triggered challenges. (3) Prepare for the next normal and get ready to execute the plans. The effects of the outbreak aren’t going away quickly. This realization has settled in. Sources: [1] BCG Covid-19 report, Apr 2, 2020. [2] McKinsey - How CDOs can navigate COVID-19 response, Apr 2020.
  11. 11. FEELING LUCKY? HERE’S A DATA SCIENCE TITLE GENERATOR! Examples: Senior Data Scientist, Principal AI Storyteller, Chief Data Wizard. Vanity keywords: Chief, Principal, Senior, Junior, Associate, Deputy, Assistant. Areas: Data, Statistical, ML, AI. Activities: Scientist, Engineer, Analyst, Designer, Developer, Storyteller, Ninja, Chef, Wrangler, Evangelist, Rock Star, Wizard, Alchemist.
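The title generator on this slide can be sketched in a few lines. This is a minimal illustration, assuming a title is built as one vanity keyword, then one area, then one activity, which is the pattern the slide's three examples follow. The function name and the constant names are hypothetical.

```python
import random

# Word lists as they appear on the slide.
VANITY_KEYWORDS = ["Chief", "Principal", "Senior", "Junior",
                   "Associate", "Deputy", "Assistant"]
AREAS = ["Data", "Statistical", "ML", "AI"]
ACTIVITIES = ["Scientist", "Engineer", "Analyst", "Designer", "Developer",
              "Storyteller", "Ninja", "Chef", "Wrangler", "Evangelist",
              "Rock Star", "Wizard", "Alchemist"]


def generate_title(rng=None):
    """Return a random title: vanity keyword + area + activity."""
    rng = rng or random.Random()
    return " ".join([rng.choice(VANITY_KEYWORDS),
                     rng.choice(AREAS),
                     rng.choice(ACTIVITIES)])


if __name__ == "__main__":
    for _ in range(3):
        print(generate_title())
```

Passing a seeded `random.Random` makes the output repeatable, which is handy for demos.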
  13. 13. THE JOURNEY FROM DATA TO DECISIONS. Phases: Data Collection, Data Storage, Data Transformation, Reporting, Insights, Consumption, Decisions. Maturity: Data Engineering, Data Science, Data as ‘Culture’. Source: Article – When and how to build out your data science team
  14. 14. THE JOURNEY FROM DATA TO DECISIONS. Phases: Data Collection, Data Storage, Data Transformation, Reporting, Insights, Consumption, Decisions. Maturity: Data Engineering, Data Science, Data as ‘Culture’. Source: Article – When and how to build out your data science team
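  15. 15. REPORTING: DESCRIPTIVE SUMMARIES. Revenue numbers from four cities, 2019:
      | Month    | Boston Price | Boston Sales | Chicago Price | Chicago Sales | Detroit Price | Detroit Sales | New York Price | New York Sales |
      |----------|--------------|--------------|---------------|---------------|---------------|---------------|----------------|----------------|
      | Jan      | 10.0         | 8.04         | 10.0          | 9.14          | 10.0          | 7.46          | 8.0            | 6.58           |
      | Feb      | 8.0          | 6.95         | 8.0           | 8.14          | 8.0           | 6.77          | 8.0            | 5.76           |
      | Mar      | 13.0         | 7.58         | 13.0          | 8.74          | 13.0          | 12.74         | 8.0            | 7.71           |
      | Apr      | 9.0          | 8.81         | 9.0           | 8.77          | 9.0           | 7.11          | 8.0            | 8.84           |
      | May      | 11.0         | 8.33         | 11.0          | 9.26          | 11.0          | 7.81          | 8.0            | 8.47           |
      | Jun      | 14.0         | 9.96         | 14.0          | 8.10          | 14.0          | 8.84          | 8.0            | 7.04           |
      | Jul      | 6.0          | 7.24         | 6.0           | 6.13          | 6.0           | 6.08          | 8.0            | 5.25           |
      | Aug      | 4.0          | 4.26         | 4.0           | 3.10          | 4.0           | 5.39          | 19.0           | 12.50          |
      | Sep      | 12.0         | 10.84        | 12.0          | 9.13          | 12.0          | 8.15          | 8.0            | 5.56           |
      | Oct      | 7.0          | 4.82         | 7.0           | 7.26          | 7.0           | 6.42          | 8.0            | 7.91           |
      | Nov      | 5.0          | 5.68         | 5.0           | 4.74          | 5.0           | 5.73          | 8.0            | 6.89           |
      | Average  | 9.0          | 7.50         | 9.0           | 7.50          | 9.0           | 7.50          | 9.0            | 7.50           |
      | Variance | 10.0         | 3.75         | 10.0          | 3.75          | 10.0          | 3.75          | 10.0           | 3.75           |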
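The summary rows of this table can be recomputed from the monthly figures. The sketch below assumes the slide's "Variance" row is the population variance (dividing by the 11 months), since that is what matches the printed 10.0 and 3.75; the `summarize` helper and the `DATA` layout are illustrative names, not part of the talk. It also reports each city's best month to hint at what the slide seems to be driving at: identical averages and variances can hide quite different data.

```python
from statistics import mean, pvariance

# (price, sales) per month, Jan to Nov, copied from the slide's table.
DATA = {
    "Boston": [(10.0, 8.04), (8.0, 6.95), (13.0, 7.58), (9.0, 8.81),
               (11.0, 8.33), (14.0, 9.96), (6.0, 7.24), (4.0, 4.26),
               (12.0, 10.84), (7.0, 4.82), (5.0, 5.68)],
    "Chicago": [(10.0, 9.14), (8.0, 8.14), (13.0, 8.74), (9.0, 8.77),
                (11.0, 9.26), (14.0, 8.10), (6.0, 6.13), (4.0, 3.10),
                (12.0, 9.13), (7.0, 7.26), (5.0, 4.74)],
    "Detroit": [(10.0, 7.46), (8.0, 6.77), (13.0, 12.74), (9.0, 7.11),
                (11.0, 7.81), (14.0, 8.84), (6.0, 6.08), (4.0, 5.39),
                (12.0, 8.15), (7.0, 6.42), (5.0, 5.73)],
    "New York": [(8.0, 6.58), (8.0, 5.76), (8.0, 7.71), (8.0, 8.84),
                 (8.0, 8.47), (8.0, 7.04), (8.0, 5.25), (19.0, 12.50),
                 (8.0, 5.56), (8.0, 7.91), (8.0, 6.89)],
}


def summarize(pairs):
    """Return the descriptive summary for one city's (price, sales) pairs."""
    prices = [price for price, _ in pairs]
    sales = [sale for _, sale in pairs]
    return {
        "avg_price": mean(prices),
        "avg_sales": mean(sales),
        "var_price": pvariance(prices),  # population variance (assumption)
        "var_sales": pvariance(sales),
        "max_sales": max(sales),         # not on the slide; shows a difference
    }


if __name__ == "__main__":
    for city, pairs in DATA.items():
        s = summarize(pairs)
        print(f"{city:9s} avg price {s['avg_price']:.1f}  "
              f"avg sales {s['avg_sales']:.2f}  "
              f"var price {s['var_price']:.1f}  "
              f"var sales {s['var_sales']:.2f}  "
              f"max sales {s['max_sales']:.2f}")
```

Plotting sales against price for each city would show the differences more directly than any single extra statistic.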
  16. 16. INSIGHT: PREDICTING TELCO CUSTOMER CHURN. Decision-tree splits: Tenure (months): 0-12, 12-36, 36+; Data Usage > 1.5 GB (Y/N); Bill > $65 (Y/N). Leaves: 0 = Low Risk, 1 = High Risk. • Simple decision-tree model offered ~30% reduction in churn • Advanced black-box models offered ~50%, but with low explainability. Source: Gramener
  17. 17. CONSUMPTION: WHEN ARE PEOPLE BORN IN THE US? Chart annotations: conceptions might happen here; Very high births..; Love the Valentine’s?; Too busy holidaying?; Avoid April Fool’s Day?; Unlucky 13th? Legend: More births / Fewer births.
  18. 18. CONSUMPTION: WHAT’S THE BIRTH PATTERN IN INDIA? Most births in the first half. Very low births Aug onwards. A striking birth pattern seen on the 5th, 10th, 15th, 20th and 25th of each month… Why? Birthdates are ‘changed’ to aid early school admissions .. this is a typical indication of fraud! Legend: More births / Fewer births.
  19. 19. This adversely impacts children’s marks. It’s a well-established fact that older children tend to do better at school in most activities. Since many children have had their birth dates brought forward, these younger children suffer. Children “born” on the 1st, 5th, 10th, 15th etc. of the month (round-numbered days) tend to score lower marks on average, due to a higher proportion of younger children. Chart: average marks for children born on a given day of the year (from 2007 to 2013); legend: Higher marks / Lower marks. Questions to explore: • Are holidays avoided for births? • Which months have a higher propensity for births, and why? • Are there any patterns not found in the US data?
  20. 20. Class Xth English Marks Distribution (histogram; X-axis: marks, 0 to 100; Y-axis: number of students, 0 to 20,000)
  21. 21. Stories have four types of narratives to explain visualizations. Remember “SEAR”: Summarize, Explain, Annotate, Recommend. The example chart plots Marks (X-axis, 0 to 100) against # students (Y-axis, 0 to 20,000).
      • Summarize the visual in its title. Don’t describe the chart. Don’t write the user’s question. Write the answer itself. Like a headline. Example: “Teachers add marks to stop some students from failing”
      • Explain & interpret the visual. How should the user read it? What do you say when you talk through it? Explain what the visual is. Then the axes. Then its contents. Then the inference. Example: “This chart shows Class 10 students’ English marks in Tamil Nadu, India, in 2011. The X-axis has the mark a student has scored. The Y-axis has the # of students who scored that mark. This is a bell curve. But the spike at 35 (the mark at which students pass) is unusual. Teachers must be adding marks to some of the students who are likely to fail by a small margin.”
      • Annotate essential elements. What should the user focus their eyes on? Point it out, or highlight it with colors. Interpret what they’re seeing – in words. Examples: “Large number of students score exactly 35 marks”; “Few (but not 0) students fail at 31-34 marks”; “No one scores 0-4 marks”. What’s unusual: Large number of students score 35 marks. Few (but not 0) students score between 30-35.
      • Recommend an action. How should I act on this? You need to change the audience. (Otherwise, you made no difference.) Example: “Only some students get this benefit. Identify a fair policy that will be applied consistently.”
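The "spike at the pass mark" that this slide annotates can also be found programmatically. The sketch below is one simple way to do it, not the method used in the talk: it flags any mark whose count is several times the median count of its neighbouring marks. The marks data is synthetic and purely illustrative (a bell-shaped curve with most students at 31-34 moved up to 35); the window size, ratio threshold and function names are all assumptions.

```python
import math
from statistics import median


def synthetic_marks():
    """Made-up counts per mark (0-100): a bell curve, then most students
    scoring 31-34 are bumped up to the pass mark of 35."""
    counts = [round(10000 * math.exp(-(((m - 55) / 18) ** 2) / 2))
              for m in range(101)]
    for m in range(31, 35):
        moved = int(counts[m] * 0.8)
        counts[m] -= moved
        counts[35] += moved
    return counts


def find_spikes(counts, window=3, ratio=3.0):
    """Return marks whose count exceeds `ratio` times the median count
    of the marks within `window` on either side."""
    spikes = []
    for mark in range(len(counts)):
        lo = max(0, mark - window)
        hi = min(len(counts), mark + window + 1)
        neighbours = [counts[m] for m in range(lo, hi) if m != mark]
        baseline = median(neighbours)
        if baseline > 0 and counts[mark] > ratio * baseline:
            spikes.append(mark)
    return spikes


if __name__ == "__main__":
    print(find_spikes(synthetic_marks()))
```

Using the median rather than the mean as the baseline keeps the spike itself, and the dip next to it, from distorting the comparison for nearby marks.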
  22. 22. An energy utility detected billing fraud. An energy utility (with over 50 million subscribers) had 10 years’ worth of customer billing data available. Most fraud detection software failed to load the data, and sampled data revealed little or no insight. This plot shows the frequency of all meter readings from Apr-2010 to Mar-2011. It is a simple histogram (or frequency distribution) of usage levels. Each bar represents the number of customers with a specific bill amount (in units, or kWh). An unusually large number of readings are aligned with the slab boundaries. Tariffs are based on the usage slab. Someone with 101 units is billed in full at a higher tariff than someone with 100 units. So people have a strong incentive to stay at or within a slab boundary. This can happen in one of two ways. First, people may be monitoring their usage very carefully, and turn off their lights and fans the instant their usage hits the slab boundary. Or, more realistically, there’s probably some level of corruption involved, where customers pay a small sum to the meter reading staff to ensure that the reading stays exactly at the slab boundary, giving them the advantage of a lower price.
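Both halves of this story, the incentive and the detection, fit in a short sketch. The slab limits and per-unit rates below are hypothetical (the transcript gives only the 100 vs. 101 units example), and `bill` and `boundary_excess` are illustrative names. The first function bills the whole usage at the rate of the slab it lands in; the second compares the number of readings exactly at a boundary with the typical count just around it.

```python
from collections import Counter

# Hypothetical slabs: (upper limit in units, rate per unit). Not from the talk.
SLABS = [(100, 3.0), (200, 4.5), (float("inf"), 6.0)]


def bill(units):
    """Bill the full usage at the rate of the slab it falls into."""
    for limit, rate in SLABS:
        if units <= limit:
            return units * rate
    raise ValueError("usage must be a number")


def boundary_excess(readings, boundary, width=5):
    """Ratio of readings exactly at `boundary` to the average count of
    readings within `width` units on either side of it."""
    freq = Counter(readings)
    nearby = [freq[boundary + d] for d in range(-width, width + 1) if d != 0]
    baseline = sum(nearby) / len(nearby)
    return freq[boundary] / baseline if baseline else float("inf")


if __name__ == "__main__":
    # One extra unit pushes the whole bill onto the higher rate.
    print(bill(100), bill(101))

    # Made-up readings: 10 customers at each level from 90 to 110,
    # plus 40 extra readings sitting exactly on the 100-unit boundary.
    readings = list(range(90, 111)) * 10 + [100] * 40
    print(boundary_excess(readings, boundary=100))
```

A ratio well above 1 at every slab boundary, and near 1 elsewhere, is the kind of pattern the slide describes; counting with `Counter` works on the full data set without sampling.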
  24. 24. INSIGHT + CONSUMPTION: DATA STORIES FROM THE WORLD BANK. Source: World bank storytelling, by Gramener
  25. 25. DATA & AI CAN SAVE LIVES TOO. The Story of Marikina City, Philippines • Highly urbanized city situated on the river basin of Marikina • Faced with huge flood hazard levels. Better & resilient infrastructure planning needed • How can urban planners plan for better emergency evacuation & rescue? • Can AI be applied to solve this problem? If applied, how can the urban planner understand it?
  27. 27. Data stories through Comicgen. An example: COVID-19 Data Explained by Data Comics
  28. 28. Comic character in a data callout:
  29. 29. Samuel L. Jackson, Harrison Ford, Morgan Freeman, Tom Hanks, Tom Cruise
  30. 30. Insights and Storytelling approach
      • Stage 1 - Identify Business Problem. Define the problem statement by understanding: What is the basic need and desired outcome? Who will benefit? What is the impact? What are the success criteria?
      • Stage 2 - Translate to Data Problem. Break down the problem statement into multiple use cases. Connect each use case with a data set. Understand any limitations on data sources, internal and external.
      • Stage 3 - Data Answer. Target each use case with data through: EDA and transformation; Modelling; Generating insights.
      • Stage 4 - Translate to Business Answer. Stitch insights from individual use cases to create a story. Connect the data story to help in better decision making. Measure success.
      • Roles listed on the slide, in order: Sales Rep, Data Consultant, Account Manager, Solution Lead, Analyst Lead, Data Consultant, Account Manager, Solution Architect, Solution Lead, Analyst Lead, Data Consultant, Data Scientist, Solution Architect, Solution Lead, Data Consultant, Account Manager, Solution Lead
  31. 31. In summary, here are the 9 steps to go from data to a data story: (1) Who is your audience? They determine the story. (2) What is their problem? That defines your analysis. (3) Find the right analysis to solve the problem. (4) Filter for big, useful, surprising insights. (5) Start with the takeaway. Summarize your entire story. (6) Add supporting analyses as a tree. (7) Pick a format based on how your audience will consume the story. (8) Pick a visual design based on the takeaway. (9) Annotate to explain & engage. Use four types of narratives.
  33. 33. 1. Most Data Science projects solve the wrong problem. Tip #1: Master the application of knowledge
  34. 34. AI IS COMING FOR THE DATA SCIENCE JOBS. AI and automation will do away with most of the grunt work in the data science workflow today. Applied knowledge will keep you relevant for much longer.
  35. 35. Wolbachia blocks dengue, Zika and chikungunya virus transmission
  36. 36. Wolbachia mosquito releases: Adults, Eggs, Community
  37. 37. Model design: Identify where people live. Detect buildings. Estimate human population density on 100m2 grids, e.g. 20,000 ppl / km2, 15,000 ppl / km2
  38. 38. Site scoping • Set boundary of potential release area • Identify the areas where people live • Map mosquito release points over area with a grid • Organise release area into stages
  39. 39. 2. Data Analytics needs a lot more than Data & Analytics. Tip #2: Learn non-core skills
  40. 40. DATA SCIENCE SOLUTION: LET’S TAKE THIS EXAMPLE.. Source: World bank storytelling, by Gramener
  41. 41. ..AND BREAK IT DOWN INTO THE BUILDING BLOCKS. Domain: Business workflow, Influencing factors. Design: User journey, Visuals & aesthetics. Analytics: Impact analytics, Clustering techniques. Development: Frontend/backend coding, Data transformation. Project Management: Piecing it all together, Change management.
  42. 42. HERE ARE THE 5 ROLES & SKILLS CRITICAL FOR DATA SCIENCE. Data Translator (Domain): Domain expertise, Business analysis, Solutioning. ML Engineer (Development): Software engineering, Front/back-end coding, Data pipelining. Information Designer (Design): Information design, User centered design, Interface/visual design (parts). Data Scientist (Analytics): Stats & ML, Interpret insights, Scripting skills. Data Science Manager (Project Management): Project management, Business analysis/solutioning, Team handling. Comic characters from Gramener Comicgen library.
  43. 43. 3. Data cleaning takes up a majority of time on projects. Tip #3: Sharpen ability to handle data
  44. 44. “In data science, 80% of the time is spent preparing data, and the other 20% on complaining about preparing the data!” - Kirk Borne
  45. 45. 4. Technology goes obsolete faster in Data Science. Tip #4: Learn new tools quickly
  46. 46. WHAT DOES THE DATA TOOLS LANDSCAPE LOOK LIKE? The tool does not matter. A person’s skill with the tool does. Pick up the ability to learn new tools rapidly.
  47. 47. EXAMPLE: WHAT ARE YOUR TOOL OPTIONS TO VISUALIZE DATA? Chart axes: Flexibility, Complexity; spectrum from Plug-n-play to Code-based. Tools shown: Google Data Studio, Excel, Google Sheets, Tableau, Raw, Vismio, Datawrapper, Timeline JS, Polestar, Vega, Vega-lite, d3, matplotlib, C3, Highcharts, NVD3, Gramex, ggplot, bokeh, Plotly. Choose tools based on flexibility, your background and tool availability.
  48. 48. Tip #1: Master the application of knowledge. Tip #2: Learn non-core skills. Tip #3: Sharpen ability to handle data. Tip #4: Learn new tools quickly.
  50. 50. WHAT DOES THE RECESSION MEAN FOR JOBS IN DATA SCIENCE? Data jobs and specialized professions are relatively less impacted. Industries with the lowest wages and lowest educational attainment are hit the hardest. Source: McKinsey report – Lives and Livelihoods
  51. 51. HERE’S WHY DATA IS KEY FOR COVID-19 AND THE RECESSION
      • A. Public Health: Identifying vulnerability and contact-tracing; Tracking the COVID-19 patient lifecycle; Predicting infection rates and spread. Source: Kinsa Health weather map
      • B. Enterprises: Remote workforce & collaboration; Market demand & cash flows; Supply chain & logistics. Source: Gramener – Supply Chain flow
      • C. Community: Understand behavioral shifts; Mapping the effectiveness of shutdown; Address people’s concerns during Covid-19. Source: Gramener – NYC 311 analysis
  52. 52. HOW DO YOU STAY RELEVANT AND GROW IN YOUR CAREER PATH? Do your own data projects. Read/Write on data science. Maintain a public portfolio. Compete, learn & re-apply. Source: Article – How to demonstrate your passion for Data
  53. 53. @sh_ra_van /shravankumara Please help me improve the session by answering the feedback survey that will be sent to your email. THANK YOU! GRACIAS! MERCI!