11. DATA VOLUME
Generated data
Available for analysis
Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
13. + ELASTIC AND HIGHLY SCALABLE
+ NO UPFRONT CAPITAL EXPENSE
+ PAY FOR ONLY WHAT YOU USE
+ AVAILABLE ON-DEMAND
= REMOVE
CONSTRAINTS
14. AWS Import / Export
AWS Direct Connect
GENERATE STORE ANALYZE SHARE
15. Generated and stored in AWS
Inbound data transfer is free
Multipart upload to S3
Physical media
AWS Direct Connect
Regional replication of AMIs and snapshots
16. Amazon S3,
Amazon Glacier,
Amazon DynamoDB,
Amazon RDS,
Amazon Redshift,
AWS Storage Gateway,
Data on Amazon EC2
GENERATE STORE ANALYZE SHARE
32. AMAZON REDSHIFT LETS YOU
START SMALL AND GROW BIG
Extra Large Node (HS1.XL)
Eight Extra Large Node (HS1.8XL)
Cluster 2-100 Nodes (32 TB – 1.6 PB)
Single Node (2 TB)
Cluster 2-32 Nodes (4 TB – 64 TB)
42. Price Per Hour for
HS1.XL Single
Node
Effective Hourly
Price Per TB
Effective Annual
Price per TB
On-Demand
$ 0.850
$ 0.425
$ 3,723
1 Year
Reservation
$ 0.500
$ 0.250
$ 2,190
3 Year
Reservation
$ 0.228
$ 0.114
$
999
43. DATA WAREHOUSING DONE THE AWS WAY
Easy to provision and scale up massively
No upfront costs, pay as you go
Really fast performance at a really low price
Open and flexible with support for popular tools
45. Cloud ETL for Big Data
S3
EMR
Redshift
Reporting
and BI
• Maintain online SQL access to historical logs
• Transformation and enrichment with EMR
• Longer history ensures better insight
46. Live archive for (structured) Big Data
OLTP
Web Apps
•
•
•
•
DynamoDB
Redshift
Direct integration with copy command
High velocity data
Data ages into Redshift
Low cost, high scale option for new apps
Reporting
and BI
69. AWS Data Pipeline
Data-intensive orchestration and automation
Reliable and scheduled
Easy to use, drag and drop
Execution and retry logic
Map data dependencies
Create and manage compute resources
70.
71.
72.
73. AWS Import / Export
AWS Direct Connect
Amazon S3,
Amazon Glacier,
Amazon DynamoDB,
Amazon RDS,
Amazon Redshift,
AWS Storage Gateway,
Data on Amazon EC2
Amazon S3,
Amazon DynamoDB,
Amazon RDS,
Amazon Redshift,
Data on Amazon EC2
GENERATE STORE ANALYZE SHARE
Amazon EC2
Amazon Elastic
MapReduce
AWS Data Pipeline
77. MXM FACTS
7+ million lyrics catalogue in more than 50
distinct languages
Currently musiXmatch is the only lyrics
platform allowed for worldwide licensing and
has deals with top Music Publishers: Warner
Chappell, Universal, BMG, EMI Publishing,
Sony ATV, Peer Music, ...
Daily updated with more than 1 million artists
and more than 20 million music tracks
Synced lyrics!
Music Discography Meta Data: Lyrics, Artists,
Albums, Songs, Biographies, Worldwide
Charts
Words matter
84. BATCH REPORTING
Step 1. Aggregation of views by country,
application and content type
Step 2. Join with a 500M+ rows table
Hive
It takes approx 1 hour with 5 c1.xlarge
instances
It used to take days with traditional techniques!
Post
process
Batch
SQL interface makes it easier to review and
share the process
Words matter
86. INTERACTIVE ANALYTICS
SQL interface like Hive, accessible with any
Postgresql client...
Redis
(real time
analytics)
Redshift
...but faster!
Flexible costs
Analytics
With Redshift doing all the heavy lifting, it's
easier to build analytics tools
Interactive
Words matter