7. Many definitions
Very large volume with low density of information
Three V’s: Volume, Velocity, Variety
Social interactions and web activity data
Amongst many others...
8. Storage Big Data Compute
Unconstrained data growth
(Chart: data volume on a scale of GB, TB, PB, EB, ZB)
95% of the 1.2 zettabytes of data in the digital universe is unstructured
70% of this is user-generated content
Unstructured data growth is explosive, with estimates of compound annual growth (CAGR) at 62% from 2008 to 2012.
Source: IDC
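To make the 62% CAGR figure concrete, the short sketch below compounds it over the quoted period. It assumes four compounding years (2012 minus 2008), which the slide does not state explicitly; the function name is illustrative only.

```python
def compound_growth_factor(rate: float, years: int) -> float:
    """Return the total multiplier after compounding `rate` for `years` periods."""
    return (1.0 + rate) ** years


CAGR = 0.62          # 62% compound annual growth, as quoted from IDC
YEARS = 2012 - 2008  # assumption: four full compounding periods

factor = compound_growth_factor(CAGR, YEARS)
print(f"At {CAGR:.0%} CAGR, data grows about {factor:.1f}x over {YEARS} years")
```

Under that assumption, unstructured data grows to roughly 6.9 times its starting volume over the period.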
9. Storage Big Data Compute
Where does it come from?
Web sites: Blogs/Reviews/Emails/Pictures
Social Graphs: Facebook, Linked-in, Contacts
Application server logs: Web sites, games
Sensor data: Weather, water, smart grids
Images/videos: Traffic, security cameras
Twitter: 50m tweets/day, 1,400% growth per year
11. Storage Big Data Compute
Why now?
12. Storage Big Data Compute
Why now?
Mobile connected world (more people using, easier to collect)
13. Storage Big Data Compute
Why now?
More aspects of data (variety, depth, location, frequency)
14. Storage Big Data Compute
Why now?
Possible to understand (not just answer specific questions)
23. Storage Big Data Compute
Bring compute capacity to the data
Personal: Very large dataset seeks strong & consistent compute for short term relationship, possibly longer. GSOH a plus. aws.amazon.com
45. Lots of actions by John Smith
Very large click log (e.g. TBs)
46. Lots of actions by John Smith
Very large click log (e.g. TBs)
Split the log into many small pieces
47. Lots of actions by John Smith
Very large click log (e.g. TBs)
Split the log into many small pieces
Process in an EMR cluster
48. Lots of actions by John Smith
Very large click log (e.g. TBs)
Split the log into many small pieces
Process in an EMR cluster
Aggregate the results from all the nodes
49. Lots of actions by John Smith
Very large click log (e.g. TBs)
Split the log into many small pieces
Process in an EMR cluster
Aggregate the results from all the nodes
What John Smith did
50. Very large click log (e.g. TBs)
Insight in a fraction of the time
What John Smith did
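The split, process and aggregate flow in the preceding slides can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not how Elastic MapReduce is actually invoked: the tab-separated log format, the sample records and all function names are assumptions made for the example, and the per-piece processing runs sequentially here where a cluster would run it in parallel across nodes.

```python
from collections import Counter
from functools import reduce

# Hypothetical click log: "user<TAB>action", one event per line.
SAMPLE_LOG = [
    "john_smith\tview",
    "jane_doe\tview",
    "john_smith\tclick",
    "john_smith\tview",
    "jane_doe\tpurchase",
    "john_smith\tpurchase",
]


def split_log(lines, chunk_size):
    """Split the log into many small pieces."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def process_chunk(chunk, user):
    """Map step: count one user's actions within a single piece."""
    counts = Counter()
    for line in chunk:
        who, action = line.split("\t")
        if who == user:
            counts[action] += 1
    return counts


def aggregate(partials):
    """Reduce step: merge the partial counts from all pieces."""
    return reduce(lambda a, b: a + b, partials, Counter())


def what_user_did(lines, user, chunk_size=2):
    chunks = split_log(lines, chunk_size)
    # On a cluster, each piece would be handled by a different node.
    partials = [process_chunk(chunk, user) for chunk in chunks]
    return aggregate(partials)


summary = what_user_did(SAMPLE_LOG, "john_smith")
print(dict(summary))
```

Because each piece is processed independently and the partial counts are simply added together, the work can be spread over as many nodes as there are pieces, which is where the "fraction of the time" comes from.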
55. Features powered by Amazon Elastic MapReduce:
People Who Viewed this Also Viewed
Review highlights
Auto complete as you type on search
Search spelling suggestions
Top searches
Ads
200 Elastic MapReduce jobs per day
Processing 3TB of data
58. Data Analytics
3.5 billion records
71 million unique cookies
1.7 million targeted ads required per day
Execute batch processing on data sets ranging in size from dozens of gigabytes to terabytes
Building in-house infrastructure to analyze these click stream datasets requires investment in expensive “headroom” to handle peak demand.
“Our first client campaign experienced a 500% increase in their return on ad spend from a similar campaign a year before”
User recently purchased a sports movie and is searching for video games: Targeted Ad (1.7 million per day)
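As a rough sense of scale for the figures on this slide, the sketch below turns them into averages. It assumes that the record and cookie counts describe the same data set and that ad demand is spread evenly over a 24-hour day; neither is stated on the slide, and real traffic peaks well above the average, which is the "headroom" problem the slide describes.

```python
RECORDS = 3_500_000_000      # 3.5 billion records
UNIQUE_COOKIES = 71_000_000  # 71 million unique cookies
ADS_PER_DAY = 1_700_000      # 1.7 million targeted ads required per day
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: records and cookies refer to the same data set.
records_per_cookie = RECORDS / UNIQUE_COOKIES

# Assumption: demand is uniform across the day (an average, not a peak).
ads_per_second = ADS_PER_DAY / SECONDS_PER_DAY

print(f"About {records_per_cookie:.0f} records per cookie on average")
print(f"About {ads_per_second:.1f} targeted ads per second on average")
```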