AWS Summit 2014 Brisbane - Breakout 5
Most organisations face ever-growing volumes of data that need to be stored and processed and, most importantly, analysed to bring value to the business. Big Data appears to offer solutions to these challenges, but the landscape is littered with acronyms and obscure names such as MPP, NoSQL, Hadoop, Hive and HBase. Attend this session to find out:
- What is the value proposition for each of these technologies?
- How do they fit with more traditional Big Data solutions such as data warehouses?
- How can AWS help organisations get maximum value from their data?
Presenter: Russell Nash, Solutions Architect, APAC, Amazon Web Services
10. Big Data Verticals and Use cases
- Media/Advertising: Targeted Advertising; Image and Video Processing
- Oil & Gas: Seismic Analysis
- Retail: Recommendations; Transactions Analysis
- Life Sciences: Genome Analysis
- Financial Services: Monte Carlo Simulations; Risk Analysis
- Security: Anti-virus; Fraud Detection; Image Recognition
- Social Network/Gaming: User Demographics; Usage analysis; In-game metrics
14. 400 GB of logs per day
~12 Terabytes per month
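The monthly figure follows from the daily one. A minimal check of the arithmetic, assuming a 30-day month and decimal units (1 TB = 1000 GB), neither of which is stated on the slide:

```python
GB_PER_DAY = 400       # from the slide
DAYS_PER_MONTH = 30    # assumption: a 30-day month
GB_PER_TB = 1000       # assumption: decimal units


def monthly_volume_tb(gb_per_day, days=DAYS_PER_MONTH):
    """Convert a daily log volume in GB into a monthly volume in TB."""
    return gb_per_day * days / GB_PER_TB


print(monthly_volume_tb(GB_PER_DAY))  # 12.0
```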
16. Amazon S3
1) Load log file data for six months of user search history into Amazon S3
Search ID    Search Text    Final Selection
12423451     westen         Westin
14235235     wisten         Westin
54332232     westenn        Westin
17. Amazon S3 and Amazon EMR
2) Spin up a 200-node cluster
[Diagram: Log Files in Amazon S3; Hadoop Cluster in Amazon EMR]
18. Amazon S3 and Amazon EMR
3) 200 nodes simultaneously analyze this data looking for common misspellings … this takes a few hours
[Diagram: Amazon S3; Hadoop Cluster in Amazon EMR]
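The analysis in step 3 can be pictured as a map step and a reduce step. The sketch below runs locally on a handful of records and is not the job from the talk: the first three records come from the slide's sample table, the last two are hypothetical, and the function names and the `min_count` threshold are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# (search ID, search text, final selection)
LOG_RECORDS = [
    ("12423451", "westen", "Westin"),
    ("14235235", "wisten", "Westin"),
    ("54332232", "westenn", "Westin"),
    ("12423452", "westen", "Westin"),   # hypothetical extra record
    ("12423453", "Westin", "Westin"),   # hypothetical: typed correctly
]


def map_record(record):
    """Map step: emit ((typed text, selection), 1) when the two differ."""
    _search_id, text, selection = record
    typed = text.strip().lower()
    if typed != selection.strip().lower():
        yield (typed, selection), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts for each (typed text, selection) key."""
    counts = Counter()
    for key, n in pairs:
        counts[key] += n
    return counts


def common_misspellings(records, min_count=2):
    """Return {selection: [(misspelling, count), ...]} for frequent pairs."""
    pairs = (kv for record in records for kv in map_record(record))
    suggestions = defaultdict(list)
    for (typed, selection), n in reduce_counts(pairs).most_common():
        if n >= min_count:
            suggestions[selection].append((typed, n))
    return dict(suggestions)


print(common_misspellings(LOG_RECORDS))  # {'Westin': [('westen', 2)]}
```

On a cluster, the map step would run in parallel over slices of the log files and the reduce step would combine the partial counts, which is what lets 200 nodes share the work.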
19. Amazon S3 and Amazon EMR
4) New common misspellings and suggestions loaded back into S3
[Diagram: Log Files in Amazon S3; Hadoop Cluster in Amazon EMR]
20. Amazon S3 and Amazon EMR
5) When the job is done, the cluster is shut down.
[Diagram: Log Files in Amazon S3]
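A cluster that exists only for the duration of the job, as in steps 2 to 5, could be requested roughly as follows. This is a sketch, not the configuration used in the talk: it only builds a request dictionary whose field names follow the shape of the EMR `RunJobFlow` request (for example boto3's `run_job_flow`) as I understand it, and the bucket name, script names, instance type, release label, and the split into 1 master plus 199 core nodes are all hypothetical.

```python
def transient_cluster_request(node_count=200, bucket="example-bucket"):
    """Build a request for a cluster that terminates once its steps finish."""
    if node_count < 2:
        raise ValueError("need at least one master and one core node")
    base = f"s3://{bucket}"  # hypothetical bucket
    return {
        "Name": "misspelling-analysis",
        "LogUri": f"{base}/emr-logs/",
        "ReleaseLabel": "emr-6.15.0",          # hypothetical choice
        "Instances": {
            "InstanceGroups": [
                {"Name": "master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": node_count - 1},
            ],
            # False means the cluster shuts down when the last step completes
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "find-common-misspellings",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "hadoop-streaming",
                    "-files", f"{base}/code/mapper.py,{base}/code/reducer.py",
                    "-mapper", "mapper.py",
                    "-reducer", "reducer.py",
                    "-input", f"{base}/search-logs/",     # step 1: logs in S3
                    "-output", f"{base}/misspellings/",   # step 4: results to S3
                ],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = transient_cluster_request()
print(sum(g["InstanceCount"] for g in request["Instances"]["InstanceGroups"]))  # 200
```

Because the input and output both live in S3, nothing is lost when the cluster goes away, and compute is only paid for while the job runs.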
31. Relational Table
ID     Age    State
123    20     CA
345    25     WA
678    40     FL

Non-Relational Table
ID     Attributes
123    Age:20, State:CA
345    Age:25, Country: Australia, Gender: F, Smoker: No
678    Age:40
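The two tables above differ in whether every record shares one fixed set of columns. A small sketch of that difference using plain Python structures; the helper names are hypothetical and no particular database is implied:

```python
# Relational: one fixed schema, every row supplies every column.
RELATIONAL_COLUMNS = ("ID", "Age", "State")
relational_rows = [
    (123, 20, "CA"),
    (345, 25, "WA"),
    (678, 40, "FL"),
]

# Non-relational: each item is keyed by ID and carries its own attributes.
non_relational_items = {
    123: {"Age": 20, "State": "CA"},
    345: {"Age": 25, "Country": "Australia", "Gender": "F", "Smoker": "No"},
    678: {"Age": 40},
}


def attribute_names(items):
    """Collect every attribute name that appears on at least one item."""
    names = set()
    for attributes in items.values():
        names.update(attributes)
    return sorted(names)


def get_attribute(items, item_id, name, default=None):
    """Look up one attribute, tolerating items that do not define it."""
    return items.get(item_id, {}).get(name, default)


print(attribute_names(non_relational_items))
# ['Age', 'Country', 'Gender', 'Smoker', 'State']
print(get_attribute(non_relational_items, 678, "State"))  # None
```

Adding a new attribute to one non-relational item needs no schema change, whereas the relational table would need a new column for every row.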
37. Amazon Kinesis (architecture diagram)
- Inputs: Data Sources, sending to an AWS Endpoint
- Stream: Shard 1, Shard 2, … Shard N, spanning three Availability Zones
- Applications: App.1 [Aggregate & De-Duplicate], App.2 [Metric Extraction], App.3 [Sliding Window Analysis], App.4 [Machine Learning]
- Destinations: S3, DynamoDB, Redshift, EMR
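The Shard 1 … Shard N labels refer to how a Kinesis stream is partitioned. As I understand the service documentation, each record's partition key is hashed with MD5 to a 128-bit integer, and each shard owns a range of that hash space. The sketch below illustrates the idea only; splitting the hash space evenly and the function names are assumptions, and shard indexes here are 0-based.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_ranges(shard_count):
    """Split the hash space into contiguous (start, end) ranges, one per shard."""
    step = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the 0-based index of the shard whose range contains the key's hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    hash_value = int(digest, 16)
    for index, (start, end) in enumerate(ranges):
        if start <= hash_value <= end:
            return index
    raise ValueError("ranges do not cover the hash space")


ranges = shard_ranges(4)
for key in ("user-1", "user-2", "user-3"):  # hypothetical partition keys
    print(key, "-> shard index", shard_for_key(key, ranges))
```

Records with the same partition key always land on the same shard, which is what lets a consuming application see one key's records in order.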
39. Amazon Mobile Analytics
- Fast: get your data within an hour
- Automatic MAU, DAU, session and retention reports
- Design and track custom app events
- Data is not mined or sold by Amazon
40. Expand your skills with AWS
- Certification (aws.amazon.com/certification): Exams. Validate your proven technical expertise with the AWS platform
- On-Demand Resources (aws.amazon.com/training/self-paced-labs): Videos & Labs. Get hands-on practice working with AWS technologies in a live environment
- Instructor-Led Courses (aws.amazon.com/training): Training Classes. Expand your technical expertise to design, deploy, and operate scalable, efficient applications on AWS