
Making Big Data relevant: Importance of Data Visualization and Analytics

A talk at Tech for Citizen Engagement, 2014, Delhi.

http://www.tech4ce.org/agenda/



  1. 1. MAKING BIG DATA RELEVANT THE IMPORTANCE OF DATA VISUALIZATION & ANALYTICS @sanand0 S Anand, Chief Data Scientist, Gramener
  2. 2. A DATA VISUALISATION CHALLENGE… You will see 3 questions. You have 30 seconds. Try it! Your timer starts now
  3. 3. QUESTION 1: HOW MANY NUMBERS ARE ABOVE 100? 23 32 71 72 58 87 11 77 70 16 17 21 56 44 68 51 84 20 60 40 37 8 107 14 12 41 69 14 18 71 62 55 59 64 33 55 71 58 103 92 101 56 45 34 43 15 73 78 6 93 39 53 22 26 26 94 60 82 99 74 11 12 36 67 70 71 97 59 73 99 75 74 69 69 51 48 2 66 92 98 15 10 41 58 104 94 92 84 74 82 12 52 10 57 33 77 88 81 81 91 15 56 25 30 21 7 66 66 78 87 29 23 5 34 11 96 74 99 99 88 37 10 43 15 50 71 65 60 101 98 46 34 19 102 57 70 95 84 63 91 3 34 39 37 60 81 65 63 9 71 48 46 25 50 22 64 91 76 71 79
  4. 4. QUESTION 2: HOW MANY NUMBERS ARE BELOW 10? [Same number grid as slide 3]
  5. 5. QUESTION 3: WHICH QUADRANT HAS THE HIGHEST TOTAL? [Same number grid as slide 3]
  6. 6. A DATA VISUALISATION CHALLENGE… We’ll answer the same questions again. But with simple visual cues. See how long it takes. Your timer starts now
  7. 7. QUESTION 1: HOW MANY NUMBERS ARE ABOVE 100? [Same number grid as slide 3, shown with visual cues]
  8. 8. QUESTION 2: HOW MANY NUMBERS ARE BELOW 10? [Same number grid as slide 3, shown with visual cues]
  9. 9. QUESTION 3: WHICH QUADRANT HAS THE HIGHEST TOTAL? [Same number grid as slide 3, shown with visual cues]
  10. 10. WHY VISUALISE?
  11. 11. 100 YEARS OF INDIA’S WEATHER [Figure axes: years 1901 to 2001 (labelled every ten years) and months Jan to Dec]
  12. 12. “Most discussions of decision-making assume that only senior executives make decisions or that only senior executives’ decisions matter. This is a dangerous mistake…” Peter F Drucker. Data generation and analysis are not sufficient. Consuming it as a team and acting in cohesion is.
  13. 13. THERE ARE MANY WAYS TO AID DATA CONSUMPTION SHOW me what is happening with the data Allow me to EXPLORE and figure it out EXPLAIN to me why it’s happening Just EXPOSE the data to me Low effort High effort High effort Low effort Creator Consumer
  14. 14. SHOW me what is happening with the data Allow me to EXPLORE and figure it out EXPLAIN to me why it’s happening Just EXPOSE the data to me
  15. 15. EDUCATION PREDICTING MARKS What determines a child’s marks? Do girls score better than boys? Does the choice of subject matter? Does the medium of instruction matter? Does community or religion matter? Does their birthday matter? Does the first letter of their name matter?
  16. 16. TN CLASS X: ENGLISH [Chart axes: 0 to 100 in steps of 5; 0 to 40,000 in steps of 5,000]
  17. 17. TN CLASS X: SOCIAL SCIENCE [Chart axes: 0 to 100 in steps of 5; 0 to 40,000 in steps of 5,000]
  18. 18. TN CLASS X: MATHEMATICS [Chart axes: 0 to 100 in steps of 5; 0 to 40,000 in steps of 5,000]
  19. 19. ICSE 2013 CLASS XII: TOTAL MARKS
  20. 20. CBSE 2013 CLASS XII: ENGLISH MARKS
  21. 21. DETECTING FRAUD. “We know meter readings are incorrect, for various reasons. We don’t, however, have the concrete proof we need to start the process of meter reading automation. Part of our problem is the volume of data that needs to be analysed. The other is the inexperience in tools or analyses to identify such patterns.” ENERGY UTILITY
  22. 22. This plot shows the frequency of all meter readings from Apr-2010 to Mar-2011. An unusually large number of readings are aligned with the tariff slab boundaries. Why would these happen? This clearly shows collusion of some form with the customers. This happens with specific customers, not randomly. Here are such customers’ meter readings.
      Apr-10 May-10 Jun-10 Jul-10 Aug-10 Sep-10 Oct-10 Nov-10 Dec-10 Jan-11 Feb-11 Mar-11
         217    219    200    200    200    200    200    200    200    350    200    200
         250    200    200    200    201    200    200    200    250    200    200    150
         250    150    150    200    200    200    200    200    200    200    200    150
         150    200    200    200    200    200    200    200    200    200    200     50
         200    200    200    150    180    150     50    100     50     70    100    100
         100    100    100    100    100    100    100    100    100    100    110    100
         100    150    123    123     50    100     50    100    100    100    100    100
           0    111    100    100    100    100    100    100    100    100     50     50
           0    100     27    100     50    100    100    100    100    100     70    100
           1      1      1    100     99     50    100    100    100    100    100    100
      If we define the “extent of fraud” as the percentage excess of the 100 unit meter reading, the value varies considerably across sections, and time … with some explainable anomalies.
      Section    Apr-10 May-10 Jun-10 Jul-10 Aug-10 Sep-10 Oct-10 Nov-10 Dec-10 Jan-11 Feb-11 Mar-11
      Section 1     70%    97%   136%    65%   110%   116%   121%   107%   114%    88%    74%   109%
      Section 2     66%    92%    66%    87%    70%    64%    63%    50%    58%    38%    41%    54%
      Section 3     90%    46%    47%    43%    28%    31%    50%    32%    19%    38%     8%    34%
      Section 4     44%    24%    36%    39%    21%    18%    24%    49%    56%    44%    31%    14%
      Section 5      4%    63%   -27%    20%    41%    82%    26%    34%    43%     2%    37%    15%
      Section 6     18%    23%    30%    21%    28%    33%    39%    41%    39%    18%     0%    33%
      Section 7     36%    51%    33%    33%    27%    35%    10%    39%    12%     5%    15%    14%
      Section 8     22%    21%    28%    12%    24%    27%    10%    31%    13%    11%    22%    17%
      Section 9     19%    35%    14%     9%    16%    32%    37%    12%     9%     5%    -3%    11%
      [Annotation on the Section 2 and Section 3 rows: New section manager arrives … and is transferred out]
  23. 23. PARLIAMENT DECISIONS. UPA's best cabinet performance was last Friday, with a record 23 decisions taken in a single day, including some long-pending key reform measures. The only other such times were Feb 23, 2008 (28 decisions) & Dec 26, 2008 (23 decisions). Nearly two-thirds of decisions are taken on Thursday sessions, which is also visible on the calendar alongside. Decisions by weekday: Mon 63 (5%), Tue 56 (4%), Wed 105 (8%), Thu 854 (65%), Fri 223 (17%), Sat 6 (0%). * CCEA: Cabinet Committee on Economic Affairs. ** CCI: Cabinet Committee on Infrastructure.
  24. 24. RESTAURANT FOUND AN UNUSUAL DIP IN SALES. A restaurant chain had data for every single transaction made over a few years. Plotting this as a time series showed them nothing unusual. However, the same data on a calendar map reveals a very different story. Specifically, at the bottom-left point-of-sale terminal, sales dip every Wednesday. At the bottom-right point-of-sale terminal, sales rise every Wednesday (almost as if to compensate for the loss). It turns out that the manager closes the bottom-left counter every Wednesday afternoon due to a shortage of staff, assuming that it results in no loss of sales. There is, however, a net loss every Wednesday.
  25. 25. BANK FOUND ALL LOANS BEFORE 20TH POOR. The personal finance division of a bank, focusing on retail loans, drove its sales through a branch sales team. A study of the non-performing assets of loans generated over the course of one year shows a strange pattern. Every loan disbursed after the 20th of the month, i.e. from the 21st to the end of the month, shows consistently lower non-performing assets (i.e. better quality) than any loan disbursed prior to the 20th. The bank mapped this back to their incentive scheme. The sales team’s commission is based only on loans disbursed until the 20th. Hence new loans are squeezed into this period without regard for their quality. This representation, known as a calendar map, can show some interesting patterns, particularly weekday-based patterns, as the next example will show. Analytics can detect something that you’re specifically looking for. It takes a visual to detect what we don’t know to look for.
  26. 26. WHEN TO INVEST. This visual shows the returns from buying Cipla’s stock in any given month and selling it in another. The colour of each cell is the return (red is low, green is high) if you had invested in the stock in a given month and sold it in another. For example, this mild red is the slightly negative return if you had bought Cipla stock in Mar 2011 (the row) and sold it in Jun 2011 (the column). [Colour scale: -50% returns to +50%] Profits Made: Over the last 6 years, you would have beaten a 10% inflation about 82% of the time and lost out about 18% of the time. So, mostly, you would have made money on Cipla, with an average return of 14.9%. Highest Returns: An average return of 14.1% has been observed when held for a period of one year, with a maximum of 79.6% if sold in Dec 2009 after being held for a year, and a maximum of 486.9% if sold at the end of Nov 2007 after holding for a month. The highest stock price was Rs 414 in Nov/Dec 2012.
  27. 27. Movies on the IMDb. https://gramener.com/imdb/ [Chart axes: MORE VOTES; BETTER RATED. Legend: Many unwatched movies; Few unwatched movies; Mix of watched & unwatched; Few watched movies; Many watched movies.] Movies labelled: The Shawshank Redemption; The Godfather; The Dark Knight; Titanic; The Phantom Menace; Twilight; New Moon; Wild Wild West; Transformers; The Good, The Bad, The Ugly; 12 Angry Men; 7 Samurai; Rang De Basanti; Taare Zameen Par; Yojimbo; 3 Idiots.
  28. 28. MLA attendance at the Assembly, Karnataka, 2008-2012 [Legend: < 50; < 75; < 95; < 100; = 100]
  29. 29. SHOW me what is happening with the data Allow me to EXPLORE and figure it out EXPLAIN to me why it’s happening Just EXPOSE the data to me … to inform and to entertain
  30. 30. SHOW me what is happening with the data Allow me to EXPLORE and figure it out EXPLAIN to me why it’s happening Just EXPOSE the data to me
  31. 31. PERFORMANCE: GIRLS VS BOYS
      Subject      Girls higher by  Girls  Boys
      Physics                    0    119   119
      Chemistry                  1    123   122
      English                    4    130   126
      Computers                  6    137   131
      Biology                    6    129   123
      Mathematics               11    123   112
      Language                  11    152   141
      Accounting                12    138   126
      Commerce                  13    127   114
      Economics                 16    142   126
  32. 32. Jain Shweta Harini Sneha Pooja Ashwin Shah Deepti Sanjana Varshini Ezhumalai Venkatesan Silambarasan Pandiyan Kumaresan Manikandan Thirupathi Agarwal Kumar Priya
  33. 33. Based on the results of the 20 lakh students taking the Class XII exams in Tamil Nadu over the last 3 years, it appears that the month you were born in can make a difference of as much as 120 marks out of 1,200. [Chart annotations: June borns score the lowest. The marks shoot up for Aug borns … and peak for Sep-borns. 120 marks out of 1200 explainable by month of birth.] An identical pattern was observed in 2009 and 2010… … and across districts, gender, subjects, and class X & XII. “It’s simply that in Canada the eligibility cutoff for age-class hockey is January 1. A boy who turns ten on January 2, then, could be playing alongside someone who doesn’t turn ten until the end of the year— and at that age, in preadolescence, a twelve-month gap in age represents an enormous difference in physical maturity.” -- Malcolm Gladwell, Outliers
  34. 34. Lok Sabha (2004 onwards). Win %: the number of winning candidates as a % of candidates in the age group. Candidates: the number of candidates in each age group. [Chart: age groups 25-30, 30-35, 35-40, 40-45, 45-50, 50-55, 55-60, 60-65, 65-70, 70-75, 75-80, 80-85, 85-90; win % data labels in plotted order: 1%, 2%, 4%, 6%, 9%, 11%, 14%, 11%, 16%, 18%, 22%, 22%, 33%; axes: win % 0% to 40%, candidates 0 to 2500]
  35. 35. Assembly elections (2004 onwards). Win %: the number of winning candidates as a % of candidates in the age group. Candidates: the number of candidates in each age group. [Chart: age groups 25-30, 30-35, 35-40, 40-45, 45-50, 50-55, 55-60, 60-65, 65-70, 70-75, 75-80, 80-85, 85-90; win % data labels in plotted order: 2%, 4%, 6%, 9%, 12%, 15%, 17%, 15%, 16%, 18%, 18%, 20%, 27%; axes: win % 0% to 30%, candidates 0 to 14000]
  36. 36. More contestants did not reduce the winner margin. Karnataka, Assembly Elections 2008. [Chart axes: # contestants, 0 to 18; winner margin, 0% to 60%]
  37. 37. More contestants did reduce the runner-up margin. Karnataka, Assembly Elections 2004. [Chart axes: # contestants, 0 to 18; runner-up margin, 0% to 60%]
  38. 38. VISUALISING THE MAHABHARATA. How does the Mahabharata, one of the largest epics at 1.8 million words, lend itself to text analytics? Can this ‘unstructured data’ be processed to extract analytical insights? What does sentiment analysis of this tome convey? Is there a better way to explore relations between characters? How can closeness of characters be analysed & visualized?
  39. 39. What topics did parties focus on during questions? Karnataka, 2008-2012. [Legend: BJP focus; JD(S) focus; INC focus] Topics shown: Housing; Adult Education; P.W.D.; Administrative Reforms; Minor Irrigation; Small Industries; Social Welfare; Agricultural Marketing; Agriculture; Animal Husbandry; Cooperative; Excise; Finance; Fisheries; Fisheries & Inland water transport; Food & Civil Supplies; Forest; Fuel; Haz & Wakf; Health and family welfare; Higher Education; Home; Horticulture; Information & Technology; Kannada & Culture; Labour; Law & Human Rights; Major & Medium Industries; Medical Education; Medium and Large Industries; Mines & Geology; Muzrai; Parliamentary Affairs and Human Rights; Planning; Planning and Statistics; Primary and Secondary Education; Primary Education; Prison; Public Library; Revenue; Rural Development and Panchayat Raj; Rural Water Supply; Rural Water Supply and Sanitation; Sericulture; Small Scale Industries; Sugar; Textile; Tourism; Transport; Transportation; Urban Development; Water Resources; Woman & Child Development; Youth and Sports; Youth Service & Sports.
  40. 40. What topics did the young & old focus on during questions? Karnataka, 2008-2012. [Legend: Young; Old] Topics shown: Social Welfare; P.W.D.; Health and family welfare; Revenue; Rural Development and Panchayat Raj; Animal Husbandry; Rural Water Supply and Sanitation; Planning and Statistics; Sugar; Urban Development; Water Resources; Minor Irrigation; Fuel; Parliamentary Affairs and Human Rights; Housing; Agriculture; Primary Education; Primary and Secondary Education; Woman & Child Development; Prison; Higher Education; Home; Cooperative; Forest; Administrative Reforms; Labour; Food & Civil Supplies; Tourism; Finance; Transportation; Horticulture; Muzrai; Haz & Wakf; Transport; Medical Education; Medium and Large Industries; Excise; Major & Medium Industries; Kannada & Culture; Textile; Fisheries; Adult Education; Mines & Geology; Small Industries; Youth and Sports; Agricultural Marketing; Rural Water Supply; Fisheries & Inland water transport; Small Scale Industries; Youth Service & Sports; Sericulture; Law & Human Rights; Planning; Information & Technology; Public Library.
  41. 41. PARLIAMENT DECISIONS: PRE-2009 vs 2009 AND AFTER. Decisions to increase the number of lanes on highways grew significantly post-2009, especially as part of the CCI (Cabinet Committee on Infrastructure) decisions. Decisions related to intervention, assistance and relief were almost entirely concentrated in pre-2009. The number of international agreements has declined dramatically between pre-2009 and post-2009. A significant rise in the number of decisions related to the States is seen post-2009, in contrast with the focus on “Central” pre-2009. [Words shown: promotion scheme revised project approved development agreement amendment establishment central act section limited bill laning plan government new ltd approval phase sector state setting investment pradesh policy four year programme amendments fund indian extension institute commission nhdp technology proposal iii implementation equity assistance cooperation transfer infrastructure additional corporation international mou cabinet company public construction services continuation approves education states financial revision sponsored port mission centrally basis signing protection management capital bank two projects research upgradation rural special land delhi employees existing committee relief convention six crore payment power health cost package institutions acquisition control restructuring air grant field university scheduled]
  42. 42. SHOW me what is happening with the data Allow me to EXPLORE and figure it out EXPLAIN to me why it’s happening Just EXPOSE the data to me … to connect the dots for your readers
  43. 43. SHOW me what is happening with the data Allow me to EXPLORE and figure it out EXPLAIN to me why it’s happening Just EXPOSE the data to me
  44. 44. SOCIAL MEDIA IN AUTOMATED RECRUITING. [Legend: Bangalore; Singapore; 1 follower to 100 followers; A follows B (or) B follows A] Most followed in Bangalore: Sudar, Yahoo!; Anand C, Consultant; Kiran, Hasgeek; Anand S, Gramener. Most followed in Singapore: Mugunth, Steinlogic; Honcheng, buUuk; Sau Sheong, HP Labs; Lim Chee Aung.
  45. 45. DIRECTORSHIPS AT THE TATAS. Every person who was a Director at the Tata Group is shown here as an orange circle. The size of the circle is based on the number of directorship positions held over their lifetime. Every company in the Tata Group is shown here as a blue circle. The size of the circle is based on the number of directors the company has had over time. Every directorship relation is shown by a line. If a person has held a directorship position at a company, the two are connected by a line. The group appears to be divided into two clusters based on the network of directorship roles. Prominent leaders bridge the groups. Some directors are mainly associated with the first group of companies. Some directors are mainly associated with the second group of companies. Similar network patterns have helped our clients:
      • locate terrorists (who called each other but no one outside their network)
      • de-duplicate customers (who share the same address and date of birth)
      • analyse competitor strengths (based on the cluster of keywords in their patents)
      [Chart labels: First group of companies; Second group of companies. Companies labelled: Tata Teleservices; Tata Consultancy Services; Tata Business Support Services; Tata Global Beverages; Tata Infotech (merged); Tata Toyo Radiator; Honeywell Automation India; Tata Communications; A G C Networks; Tata Technologies; Tata Projects; Tata Power; Tata Finance; Idea Cellular; Tata Motors; Tata Sons; Tata Steel; Tayo Rolls; Tata Securities; Tata Coffee; Tata Investment Corp. Directors labelled: A J Engineer; H H Malgham; H K Sethna; Keshub Mahindra; Ravi Kant; Russi Mody; Sujit Gupta; A S Bam; Amal Ganguli; D B Engineer; D N Ghosh; M N Bhagwat; N N Kampani; U M Rao; B Muthuraman; Ishaat Hussain; J J Irani; N A Palkhivala; N A Soonawala; R Gopalakrishnan; Ratan Tata; S Ramadorai; S Ramakrishnan.]
  46. 46. SHOW me what is happening with the data Allow me to EXPLORE and figure it out EXPLAIN to me why it’s happening Just EXPOSE the data to me … to allow your users to tell stories
  47. 47. VISUALISATION IS IMPERATIVE FOR DATA → INSIGHTS → ACTION. Spot the unusual. Communicate patterns. Simplify decisions.
  48. 48. A data analytics and visualisation company. See gramener.com for more examples. We handle terabyte-size data via non-traditional analytics and visualise it in real-time.
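
The three challenge questions on slides 3 to 5 can also be checked programmatically. What follows is a minimal Python sketch, not something shown in the talk. It assumes the 160 numbers on slide 3 are laid out as 16 rows of 10 (the transcript does not preserve the slide layout) and that the quadrants split the grid at its row and column midpoints. The function names `parse_grid`, `count_above`, `count_below` and `quadrant_totals` are illustrative only.

```python
# Number grid transcribed from slide 3 of the transcript.
# Assumption: 16 rows of 10 numbers; the real slide layout may differ.
GRID_TEXT = """
23 32 71 72 58 87 11 77 70 16
17 21 56 44 68 51 84 20 60 40
37 8 107 14 12 41 69 14 18 71
62 55 59 64 33 55 71 58 103 92
101 56 45 34 43 15 73 78 6 93
39 53 22 26 26 94 60 82 99 74
11 12 36 67 70 71 97 59 73 99
75 74 69 69 51 48 2 66 92 98
15 10 41 58 104 94 92 84 74 82
12 52 10 57 33 77 88 81 81 91
15 56 25 30 21 7 66 66 78 87
29 23 5 34 11 96 74 99 99 88
37 10 43 15 50 71 65 60 101 98
46 34 19 102 57 70 95 84 63 91
3 34 39 37 60 81 65 63 9 71
48 46 25 50 22 64 91 76 71 79
"""


def parse_grid(text, columns=10):
    """Split whitespace-separated numbers into rows of `columns` values."""
    numbers = [int(token) for token in text.split()]
    if len(numbers) % columns:
        raise ValueError("number count is not a multiple of the column count")
    return [numbers[i:i + columns] for i in range(0, len(numbers), columns)]


def count_above(grid, threshold):
    """Count values strictly greater than `threshold`."""
    return sum(value > threshold for row in grid for value in row)


def count_below(grid, threshold):
    """Count values strictly less than `threshold`."""
    return sum(value < threshold for row in grid for value in row)


def quadrant_totals(grid):
    """Sum each quadrant, splitting at the row and column midpoints."""
    mid_row = len(grid) // 2
    mid_col = len(grid[0]) // 2
    totals = {"top-left": 0, "top-right": 0, "bottom-left": 0, "bottom-right": 0}
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            vertical = "top" if r < mid_row else "bottom"
            horizontal = "left" if c < mid_col else "right"
            totals[f"{vertical}-{horizontal}"] += value
    return totals


grid = parse_grid(GRID_TEXT)
totals = quadrant_totals(grid)
print("Numbers above 100:", count_above(grid, 100))
print("Numbers below 10:", count_below(grid, 10))
print("Quadrant totals:", totals)
print("Highest quadrant:", max(totals, key=totals.get))
```

The first two answers do not depend on the layout, only on the values. The quadrant answer does, so it holds only if the assumed 16-by-10 arrangement matches the slide.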
