4. THERE ARE MANY WAYS TO AID DATA CONSUMPTION
SHOW me what is happening with the data
EXPLAIN to me why it’s happening
Allow me to EXPLORE and figure it out
Just EXPOSE the data to me
[Figure axes: effort for the Creator and effort for the Consumer, each ranging from low to high]
9. SHOW me what is happening with the data
EXPLAIN to me why it’s happening
Allow me to EXPLORE and figure it out
Just EXPOSE the data to me
Simplifying access to data is a big win
11. DETECTING FRAUD
“We know meter readings are incorrect, for various reasons. We don’t, however, have the concrete proof we need to start the process of meter reading automation.
Part of our problem is the volume of data that needs to be analysed. The other is our inexperience with the tools or analyses needed to identify such patterns.”
-- Energy utility
12. AN ENERGY UTILITY DETECTED BILLING FRAUD
An energy utility (with over 50 million subscribers) had 10 years’ worth of customer billing data available. Most fraud detection software failed to load the data, and sampled data revealed little or no insight.
The plot is a simple histogram (or frequency distribution) of usage levels. Each bar represents the number of customers with a specific bill amount (in units, or kWh).
This plot shows the frequency of all meter readings from Apr-2010 to Mar-2011. An unusually large number of readings are aligned with the slab boundaries.
Tariffs are based on the usage slab. Someone with 101 units is billed in full at a higher tariff than someone with 100 units. So people have a strong incentive to stay at or within a slab boundary.
This can happen in one of two ways. First, people may be monitoring their usage very carefully, and turning off their lights and fans the instant their usage hits the slab boundary.
Or, more realistically, there’s probably some level of corruption involved, where customers pay a small sum to the meter reading staff to ensure that the reading stays exactly at the slab boundary, giving them the advantage of a lower price.
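A check like the one described on this slide can be sketched in a few lines. This is a minimal illustration, not the utility's actual analysis: the function name `boundary_spikes`, the `window` parameter and the sample readings are all assumptions made for the example. It compares the number of readings that land exactly on a slab boundary with the average count of the readings just around it.

```python
from collections import Counter


def boundary_spikes(readings, boundaries, window=3):
    """For each slab boundary, return the count of readings exactly at the
    boundary divided by the average count of the readings within `window`
    units on either side of it. A ratio well above 1 suggests a spike."""
    counts = Counter(readings)
    spikes = {}
    for b in boundaries:
        neighbours = [counts[b + d] for d in range(-window, window + 1) if d != 0]
        baseline = sum(neighbours) / len(neighbours)
        if baseline == 0:
            # No neighbouring readings: report infinity only if the boundary
            # itself has readings, otherwise there is nothing to flag.
            spikes[b] = float("inf") if counts[b] else 0.0
        else:
            spikes[b] = counts[b] / baseline
    return spikes


# Hypothetical readings in units (kWh); not the utility's data.
readings = [97, 98, 99, 100, 100, 100, 100, 100, 100, 101, 102, 103]
print(boundary_spikes(readings, boundaries=[100]))
```

A ratio near 1 means the boundary looks like its neighbours. How large a ratio should count as suspicious is a judgment call that depends on the data.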
13. PREDICTING MARKS
“What determines a child’s marks?
Do girls score better than boys?
Does the choice of subject matter?
Does the medium of instruction matter?
Does community or religion matter?
Does their birthday matter?
Does the first letter of their name matter?”
-- Education
18. Based on the results of the 20 lakh students taking the Class XII exams in Tamil Nadu over the last 3 years, it appears that the month you were born in can make a difference of as much as 120 marks out of 1,200.
• June-borns score the lowest
• The marks shoot up for Aug-borns
• … and peak for Sep-borns
• 120 marks out of 1,200 are explainable by month of birth
An identical pattern was observed in 2009 and 2010, and across districts, gender, subjects, and Class X & XII.
“It’s simply that in Canada the eligibility cut-off for age-class hockey is January 1. A boy who turns ten on January 2, then, could be playing alongside someone who doesn’t turn ten until the end of the year—and at that age, in preadolescence, a twelve-month gap in age represents an enormous difference in physical maturity.”
-- Malcolm Gladwell, Outliers
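The month-of-birth comparison on this slide amounts to grouping marks by birth month and comparing the averages. The sketch below assumes each record is a `(birth_month, total_marks)` pair. The function name and the sample records are hypothetical and chosen only to show the mechanics, not to reproduce the Tamil Nadu result.

```python
from collections import defaultdict
from statistics import mean


def mean_marks_by_month(records):
    """records: iterable of (birth_month, total_marks), with month in 1..12.
    Returns {month: average marks}, ordered by month."""
    by_month = defaultdict(list)
    for month, marks in records:
        by_month[month].append(marks)
    return {month: mean(marks) for month, marks in sorted(by_month.items())}


# Hypothetical records (marks out of 1,200); not the exam data.
records = [(6, 700), (6, 720), (8, 790), (9, 800), (9, 820)]
averages = mean_marks_by_month(records)

# Gap between the best and worst month averages.
spread = max(averages.values()) - min(averages.values())
print(averages, spread)
```

On real data it would be worth repeating the grouping per year, district, gender and subject, as the slide describes, to see whether the pattern holds in each slice.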
19. LET’S LOOK AT 15 YEARS OF US BIRTH DATA
This is a dataset (1975 – 1990) that has been around for several years, and has been studied extensively. Yet, a visualization can reveal patterns that are neither obvious nor well known.
For example,
• Are birthdays uniformly distributed?
• Do doctors or parents exercise the C-section option to move dates?
• Is there any day of the month that has unusually high or low births?
• Are there any months with relatively high or low births?
[Figure: More births / Fewer births … on average, for each day of the year (from 1975 to 1990)]
• Very high births in September. But this is fairly well known. Most conceptions happen during the winter holiday season.
• Relatively few births during the Christmas and Thanksgiving holidays, as well as New Year and Independence Day.
• Most people prefer not to have children on the 13th of any month, given that it’s considered an unlucky day.
• Some special days like April Fool’s Day are avoided, but Valentine’s Day is quite popular.
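The "more births / fewer births" shading for each day of the year can be approximated by counting births per calendar day and dividing by the average across days. This is a rough sketch under stated assumptions: the function name and the sample dates are hypothetical, days with no births are simply absent from the result, and no correction is made for 29 February occurring less often.

```python
from collections import Counter
from datetime import date


def relative_birth_frequency(birth_dates):
    """Return {(month, day): ratio}, where ratio is the number of births on
    that calendar day divided by the mean count over all calendar days that
    appear in the data. Ratios above 1 mean more births than average."""
    counts = Counter((d.month, d.day) for d in birth_dates)
    if not counts:
        return {}
    average = sum(counts.values()) / len(counts)
    return {day: n / average for day, n in counts.items()}


# Hypothetical birth dates; not the US dataset.
births = [date(1980, 9, 16), date(1981, 9, 16), date(1982, 9, 16), date(1980, 12, 25)]
ratios = relative_birth_frequency(births)
print(ratios)
```

Each ratio could then be mapped to a colour on a month-by-day grid to produce a calendar heatmap of the kind the slide describes.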
20. THE PATTERN IN INDIA IS QUITE DIFFERENT
This is a birth date dataset obtained from school admission data for over 10 million children. When we compare this with births in the US, we see none of the same patterns.
For example,
• Is there an aversion to the 13th or is there a local cultural nuance?
• Are holidays avoided for births?
• Which months have a higher propensity for births, and why?
• Are there any patterns not found in the US data?
[Figure: More births / Fewer births … on average, for each day of the year (from 2007 to 2013)]
• Very few children are born in August and thereafter. Most births are concentrated in the first half of the year.
• We see a large number of children born on the 5th, 10th, 15th, 20th and 25th of each month – that is, round-numbered dates.
• Such round-numbered patterns are a typical indication of fraud. Here, birthdates are brought forward to aid early school admission.
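One simple way to quantify the round-date pattern is to compare the observed share of birthdays falling on the 5th, 10th, 15th, 20th and 25th with the share expected if births were spread evenly over the year. The sketch below is illustrative only: the function names and sample dates are assumptions, and the even-spread baseline ignores leap years and seasonal variation.

```python
from datetime import date

# Round-numbered days named on the slide.
ROUND_DAYS = frozenset({5, 10, 15, 20, 25})


def round_day_share(birth_dates, round_days=ROUND_DAYS):
    """Fraction of the given birth dates that fall on a round-numbered day."""
    dates = list(birth_dates)
    if not dates:
        return 0.0
    hits = sum(1 for d in dates if d.day in round_days)
    return hits / len(dates)


def expected_share(round_days=ROUND_DAYS, days_in_year=365):
    """Share expected if births were uniform over the year. Every month has
    at least 25 days, so each round day occurs once in each of 12 months."""
    return 12 * len(round_days) / days_in_year


# Hypothetical birth dates; not the school admission data.
sample = [date(2008, 1, 5), date(2008, 1, 6), date(2008, 3, 10), date(2008, 4, 11)]
print(round_day_share(sample), expected_share())
```

An observed share far above the uniform baseline (about 16%) would be consistent with the clustering the slide reports, though it does not by itself say why the dates cluster.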
21. THIS ADVERSELY IMPACTS CHILDREN’S MARKS
It’s a well-established fact that older children tend to do better at school in most activities. Since many children have had their birth dates brought forward, these younger children suffer.
Children “born” on the 1st, 5th, 10th, 15th etc. of the month tend to score lower marks on average, due to a higher proportion of younger children.
[Figure: Higher marks / Lower marks … on average, for children born on a given day of the year (from 2007 to 2013)]
22. SHOW me what is happening with the data
EXPLAIN to me why it’s happening
Allow me to EXPLORE and figure it out
Just EXPOSE the data to me
… to inform and to highlight
24. IMPACT OF THE BUDGET ON STOCK PRICES
https://gramener.com/budget/?Year=2010
25. “Which is the least successful party in Indian election history?”
26. WHICH IS THE LEAST SUCCESSFUL PARTY?
https://gramener.com/election/parliament#story.ddp
27. SHOW me what is happening with the data
EXPLAIN to me why it’s happening
Allow me to EXPLORE and figure it out
Just EXPOSE the data to me
… to connect the dots for your readers
29. THE SOCIAL TALE OF TWO CITIES: BANGALORE & SINGAPORE
https://gramener.com/codersearch/
Recruiting top quality developers is always a problem. We decided to use an algorithmic approach and pulled out the social network of developers on GitHub (a social network for open source code).
In this visualisation, each circle is a person. The size of the circle represents the number of followers. Larger circles have more followers (but not in proportion – it’s a log scale).
The circle’s colour represents the city the programmer lives in. This visual is a slice showing the tale of two cities: Bangalore and Singapore.
Two people are connected if one follows the other. This leads to a clustering of people in the form of a network.
Here, you can see that Bangalore and Singapore are reasonably well-connected cities. Bangalore has more developers, but Singapore has more popular ones (larger circles).
However, interactions between Bangalore and Singapore are few and far between, but for a few people across both cities, like:
• Sudar, Yahoo!
• Anand C, Consultant
• Kiran, Hasgeek
• Anand S, Gramener
• Mugunth, Steinlogic
• Honcheng, buUuk
• Sau Sheong, HP Labs
• Lim Chee Aung
• … etc.
There are, of course, a number of smaller independent circles – people who are not connected to others in the same city. (They may be connected to people in other cities.)
Apart from this, there are a few small networks of connected people – often people within the same company or start-up – who form a community of their own.
[Figure legend: colour = city (Bangalore, Singapore); circle size = followers (1 follower to 100 followers); line = A follows B (or) B follows A. Labels in the figure: Most followed in Bangalore; Most followed in Singapore; Ciju Cherian; Lin Junjie; Amudhi Sebastian]
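The "few and far between" observation can be checked by counting follow links within a city versus across cities. This is a minimal sketch, assuming the follow data is available as `(follower, followed)` pairs plus a user-to-city lookup. The function name, the user names and the tiny sample graph are hypothetical, and no real GitHub API call is shown.

```python
def count_links(follows, city_of):
    """follows: iterable of (follower, followed) pairs.
    city_of: dict mapping each user to a city.
    A-follows-B and B-follows-A count as a single undirected link, matching
    the slide's rule that two people are connected if one follows the other.
    Returns (links within a city, links across cities)."""
    links = {frozenset(pair) for pair in follows if pair[0] != pair[1]}
    within = across = 0
    for link in links:
        a, b = tuple(link)
        if city_of[a] == city_of[b]:
            within += 1
        else:
            across += 1
    return within, across


# Hypothetical users and follow relationships; not the GitHub data.
city_of = {"a": "Bangalore", "b": "Bangalore", "c": "Singapore", "d": "Singapore"}
follows = [("a", "b"), ("b", "a"), ("c", "d"), ("a", "c")]
print(count_links(follows, city_of))
```

The users who appear in the cross-city links are the bridging people the slide lists by name.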
32. Has there ever been an all-woman election?
Only two elections had women candidates exclusively: Karur, TN (1967) and Panskura, WB (1977). Only 8 ever had a woman majority.
Who’s the oldest candidate ever?
Arif Ahmed Shaikh Jafhar (NBNP) contested the 2009 elections from Dhule, MH at age 99, making him the oldest candidate ever in India.
Who won by the lowest margin ever?
Som Marandi (BJP) and Konathala Ramakrishna (INC) won by just 9 votes, in Bihar (1998) and AP (1989) respectively.
Was there ever an uncontested win?
Since 1989, no election has been won uncontested. Srinagar, J&K was the last, where Mohammad Shafi Bhat of JKN won without competition.
35. THERE ARE MANY WAYS TO AID DATA CONSUMPTION
SHOW me what is happening with the data
EXPLAIN to me why it’s happening
Allow me to EXPLORE and figure it out
Just EXPOSE the data to me
[Figure axes: effort for the Creator and effort for the Consumer, each ranging from low to high]
36. More examples at
gramener.com
blog.gramener.com
A data analytics and visualisation company
We handle terabyte-size data via non-traditional analytics and visualise in real-time.
Reach me at
ganes.kesari@gramener.com
@kesaritweets
Editor's Notes
Gramener is a data analytics and visualisation company. We handle large-scale data via non-traditional analytics (by which we mean programmatic analysis) and visualize the results in real-time.
The visualizations are our key differentiator.
We transform your data into concise dashboards that make it easy for you to find the problems as well as the solution.
We help you find these insights quickly, based on our work in cognitive research, and our visualizations guide you towards actionable decisions.
In other words, we make enterprise data consumption very easy.