2. Big Data
Big Data Players in the Market
Hadoop Ecosystems
Analytics
Machine Learning Algorithms
SMAC
3.
4. WHAT IS BIG DATA?
“Big Data” is high-volume, high velocity,
high variety information assets that demand cost effective,
innovative forms of information processing for enhanced
insight and decision making.
5. By 2020, 1.7 MB of new information will be created for each and every human being on the planet – every second every day.
8. HADOOP WAS A KEY PART OF IBMS WATSON
Hadoop analytics and data
discovery abilities were a
big reason that IBM's
Watson computer was able
to win a widely publicized
"Jeopardy“ showdown last
year against a couple of
very successful human
former champions.
15. AN INSURANCE PROBLEM
Product
Car Insurance
Life Insurance
Health Insurance
2-wheeler Insurance
Heavy Vehicle Insurance
Revenues in last quarter in
million
110
180
220
90
100
17. FIRST MODEL . . .
Categorize data as VEHICLE and NON-VEHICLE insurance.
The average of vehicle insurance: 100
The unexplained = (90-100)2+(100-100)2+(110-100)2=200
The average non-vehicle insurance = 200
The unexplained = (180-200)2+(220-200)2= 800
R2
18.
19. Analytics
Machine Learning : Supervised & Un-supervised Learning
Lets get started with two different techniques
(Supervised) - Classification and Regression
(Un-Supervised) - Clustering
20. Machine Learning
- Grew out of work in AI
- New capability for computers
Examples:
- Database mining
Large datasets from growth of automation/web.
E.g., Web click data, medical records, biology, engineering
- Applications can’t program by hand.
E.g., Autonomous helicopter,
handwriting recognition,
most of Natural Language Processing (NLP),
Computer Vision.
- Self-customizing programs
E.g., Amazon, Netflix product recommendations
- Understanding human learning (brain, real AI).
25. "Consumer data will be the biggest
differentiator in the next two to three years.
Whoever unlocks the reams of data and uses
it strategically will win“
-Angela Ahrendts, CEO of Burberry
Big Data is key to any Loyalty scheme
26. The Obama 2012 campaign used data analytics and the experimental method
to assemble a winning coalition vote by vote. The interests of individual voters
were known and addressed.
Online Media and Web Analytics helped Obama beat McCain, changed the
political scene in one of the most powerful nations in the world and how it has
influenced the course of history
- Obama had 2.5 M Facebook friends compared to a paltry 0.5 M Facebook
friends for McCain (seems strange to think of politicians on Facebook though..)
– Obama raised USD 500 M online versus the total amount of USD,
201 M by McCain
Percentage of votes cast for Obama by early voters in Hamilton
Model - 57.68%, Actual 57.16%
Television commercials aired on TV land (National cable level)
Obama campaign - 1,710, Romney campaign - 0
Money spent on online Ads through Mid-October
Romney Campaign - $26 million, Obama Campaign - $52 millions
27.
28.
29.
30. SMAC will be the platform that will enable
organizations to drive consumerization of
technology, including IT. Early adopters of
SMAC stack would have a clear competitive
edge in their line of business
31.
32.
33.
34. cloud computing is a synonym for
distributed computing over a network,
and means the ability to run a program
or application on many connected
computers at the same time