(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014

© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
November 13, 2014 | Las Vegas, NV
ARC202 Real-World Real-Time Analytics
Gustavo Arjones | @arjones
CTO, Socialmetrix
Sebastian Montini | @sebamontini
Solutions Architect, Socialmetrix

•SaaS company since 2008
•Social media analytics: tracks and measures the activity of brands and personalities, providing information for market research and brand comparison
•Multi-language technology (English, Portuguese, and Spanish)
•Leader in Latin America, with operations in 5 countries and customers in Latin America and the US
•One of 34 Twitter Certified Program members worldwide

Share of topics
Which conversations are my brand and my competitors’ brands driving?

Ranking | Brand 1 Q2 | Brand 1 Q3 | Brand 2 Q2 | Brand 2 Q3 | Brand 3 Q2 | Brand 3 Q3
1° | Flavor | Breakfast | Flavor | Flavor | Advertising | Flavor
2° | Healthy | Flavor | Packaging | Brand I love | Flavor | Breakfast
3° | Components | Components | Healthy | Packaging | Healthy | Healthy
4° | Advertising | Healthy | Components | Addiction | Components | Advertising
5° | Enquiries | Desire | Prices | Consumption | Prices | Components
TOTAL | 1,401 | 8,189 | 463 | 5,519 | 1,081 | 2,445
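A ranking like the one above can be produced by counting topic-tagged mentions per brand and quarter and keeping the five most frequent topics. This is a minimal sketch under stated assumptions. It assumes mentions have already been tagged with a brand, a quarter, and a topic (the hypothetical `Mention` record); the talk does not describe how topics are tagged. The per-column total here is assumed to be the number of tagged mentions, because the slide does not say what TOTAL counts.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """Hypothetical social media mention, already tagged upstream."""
    brand: str
    quarter: str
    topic: str


def share_of_topics(mentions, top_n=5):
    """Return {(brand, quarter): ([(topic, count), ...], total_mentions)}.

    The list holds the top_n most frequent topics, most frequent first;
    total_mentions counts every mention in that column, ranked or not.
    """
    counts = defaultdict(Counter)
    for m in mentions:
        counts[(m.brand, m.quarter)][m.topic] += 1
    result = {}
    for key, counter in counts.items():
        result[key] = (counter.most_common(top_n), sum(counter.values()))
    return result
```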

Challenges: Variety
•Different data sources
•Different APIs
•SLAs
•Method (pull or push)
•Rate limits and backoff strategy
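Each data source enforces its own rate limits, so every crawler needs a backoff strategy. The talk does not say which one Socialmetrix used. A common choice is capped exponential backoff with "full jitter". This minimal sketch assumes a hypothetical `RateLimited` exception raised by the source client. The `sleep` and `rng` parameters are injectable so the retry schedule can be tested without actually waiting.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error a data-source client raises when it is throttled."""


def call_with_backoff(fetch, max_retries=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep, rng=random.random):
    """Call fetch(); on RateLimited, retry with capped exponential backoff.

    Full jitter: before retry number n (0-based), sleep a random time in
    [0, min(max_delay, base_delay * 2**n)). Re-raises after max_retries.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_retries:
                raise  # give up and let the caller decide
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(cap * rng())
```

Randomizing the whole interval spreads retries from many crawler workers apart, so they do not all hit the API again at the same instant.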

Challenges: Velocity
•Updates every second
•Top users, top hashtags each minute
•Post-event analysis is done in batch over the complete dataset
•Spikes of 20,000+ tweets per minute
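"Top hashtags each minute" maps naturally onto 1-minute tumbling windows. The minimal in-memory sketch below assumes tweets arrive as hypothetical `(epoch_seconds, hashtags)` pairs. A streaming engine would keep these counters incrementally instead of grouping a finished list.

```python
from collections import Counter, defaultdict


def top_hashtags_per_minute(tweets, k=3):
    """Group (epoch_seconds, hashtags) pairs into 1-minute tumbling windows.

    Returns {window_start_seconds: [(hashtag, count), ...]} holding the k most
    frequent hashtags per window. Hashtags are compared case-insensitively.
    """
    windows = defaultdict(Counter)
    for ts, hashtags in tweets:
        window_start = int(ts) - int(ts) % 60  # align to the start of the minute
        windows[window_start].update(h.lower() for h in hashtags)
    return {w: c.most_common(k) for w, c in sorted(windows.items())}
```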

Challenges: Velocity
[Chart annotations: "Last TV Debate", "Results Announced"]

Challenges: Meaning
•Disambiguation
•Data enrichment
–Demographics
–Sentiment
–Influencers
•Human analysis
[Figure: disambiguation examples: PAN; Orange Telecom; Oi Telecom vs. "Oi" ("Hi!" in Portuguese)]
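The slide's examples are ambiguous names. For instance, "Oi" is both a Brazilian telecom brand and the Portuguese greeting "Hi!". A naive illustration of disambiguation is to count a mention as the brand only when telecom-related context words appear nearby. The keyword list and `mentions_brand` below are hypothetical. The talk does not describe Socialmetrix's actual method, and a production system would likely need more than keyword rules.

```python
import re

# Hypothetical context keywords (Portuguese); a real list would be curated per brand.
BRAND_CONTEXT = {
    "oi": {"operadora", "sinal", "internet", "plano", "celular", "chip"},
}


def mentions_brand(text, brand, context=BRAND_CONTEXT, min_hits=1):
    """Return True if `brand` appears as a token in `text` together with
    at least `min_hits` of that brand's context keywords."""
    tokens = re.findall(r"\w+", text.lower())
    if brand not in tokens:
        return False
    hits = sum(1 for t in tokens if t in context.get(brand, set()))
    return hits >= min_hits
```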

Challenges: Alerting and reporting
•Clear and understandable UI
•Slice-and-dice for business users (not BI experts)
•Real-time alerts for anomalies
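The talk does not describe how anomalies are detected. One simple approach, shown here purely as an assumption, is a rolling z-score over per-minute volumes. It flags a minute whose count exceeds the mean of the previous `window` minutes by more than `threshold` standard deviations. `detect_spikes` and its default parameters are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def detect_spikes(counts, window=10, threshold=3.0):
    """Yield the index of each value that exceeds
    mean + threshold * stdev of the previous `window` values."""
    history = deque(maxlen=window)
    for i, value in enumerate(counts):
        if len(history) == window:
            mu = mean(history)
            # Floor the stdev at 1.0 so a perfectly flat history
            # does not flag every tiny fluctuation.
            sigma = max(pstdev(history), 1.0)
            if value > mu + threshold * sigma:
                yield i
        history.append(value)
```

Because the spike itself enters the history, the minute right after a spike is compared against an inflated baseline. Real alerting would need to decide how to handle that.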

Drivers for architecture evolution
•More customers, bigger customers
•Add new features
•Keep costs under control

Architecture evolution
[Chart: Active Customers per architecture iteration, #1 to #4]

Architecture: 1st iteration
What we needed:
•Complete data isolation
•Trying different solutions/offerings

What we did:
•All-in-one approach
•Multi-instance architecture
•Simple vertical scalability
•MySQL performance tuning

What we've learned:
•Multi-instance is harder to administer, but minimizes the impact of instability on customers
•Vertical scalability: poor resource management
•MySQL schema changes translate into downtime

Architecture: 2nd iteration
What we needed:
•Separation of responsibilities (crawling, processing)
•Horizontal scalability
•Fast provisioning
•Cost reduction

What we changed:
•Migrated to AWS
•RabbitMQ (Single Node)
•Replaced MySQL with Amazon RDS
•AWS CloudFormation
•Auto Scaling groups

What we've learned:
•Provisioned IOPS (PIOPS)
•Tuning the Auto Scaling policies can be hard
•AWS CloudFormation: great for migration, not enough for daily ops

Architecture: 3rd iteration
What we needed:
•Deliver new features: near real-time (NRT) and more complex analytics
•Scale fast
•Be resilient against failure
•Add and improve data sources
•Keep costs under control (always)

What we changed:
•Apache Storm
•RabbitMQ HA
•Amazon Elastic MapReduce (Hadoop/Hive)
•AWS CloudFormation + Chef
•Amazon Glacier + Amazon S3 lifecycle policies

What we've learned:
•Spot Instances + Reserved Instances
•Hive = SQL, and SQL scripts are hard to test
•Bulk upserts on Amazon RDS can be expensive (PIOPS)
•Amazon DynamoDB is great, but expensive (for our use case)

Architecture: 4th iteration
What we needed:
•Monitor millions of social media profiles
•Make data accessible (exploration, PoC)
•Improve UI response times
•Test our data pipelines
•Reprocess data (faster)

Architecture: 4th iteration
What we changed:
•Cassandra (DataStax Enterprise, DSE)
•MongoDB MMS (MongoDB Management Service)
•Apache Spark

What we've learned:
•Leverage AWS ecosystem
•DataStax AMI + OpsCenter integration
•MongoDB MMS: automation magic!
•Apache Spark unit testing + Amazon EC2 launch scripts
•Amazon EMR doesn’t have the latest stable versions

Architecture evolution
[Chart: Active Customers and Costs per architecture iteration, #1 to #4]

Lessons learned
•Automate from day one (AWS CloudFormation + Chef)
•Monitor system activity and understand your data patterns, e.g., with Logstash (ELK)
•Always have a Source of Truth (Amazon S3 + Glacier)
•Make your Source of Truth searchable
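Keeping raw data in a source of truth (Amazon S3 + Glacier) lets you rebuild derived data by replaying it. Replaying is only safe if the rebuild is idempotent, meaning that replaying the same or overlapping archives does not double-count. The minimal sketch below assumes each raw event is a dict with a unique `id` (tweets, for example, carry IDs) and a `brand` field. `rebuild_counts` is a hypothetical name, not Socialmetrix's pipeline.

```python
from collections import Counter


def rebuild_counts(raw_events):
    """Rebuild per-brand mention counts from raw events.

    Events are de-duplicated by their unique `id` (last write wins, like an
    upsert keyed by id), so replaying an archive twice gives the same result.
    """
    latest = {}
    for event in raw_events:
        latest[event["id"]] = event
    return Counter(e["brand"] for e in latest.values())
```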

Lessons learned (II)
•Approximation is a good thing: HyperLogLog (HLL), Count-Min Sketch (CMS), Bloom filters
•Design your pipelines with reprocessing needs in mind
•Avoid framework explosion at all costs
•The AWS ecosystem allows rapid prototyping
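The approximation lesson names three probabilistic data structures. This is a minimal sketch of two of them. A Count-Min Sketch gives approximate frequencies (for example, hashtag counts) and never undercounts. A Bloom filter gives approximate membership (for example, "have we seen this profile?") with false positives but no false negatives. HyperLogLog (distinct counts) is left out for brevity. The sizes are arbitrary assumptions, and the talk does not say which libraries or parameters Socialmetrix used.

```python
import hashlib


def _position(item: str, salt: int, modulus: int) -> int:
    """Map item to [0, modulus) with a salted SHA-256 hash (one hash function per salt)."""
    digest = hashlib.sha256(f"{salt}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % modulus


class CountMinSketch:
    """Approximate frequency counts: estimates may overcount, never undercount."""

    def __init__(self, width=2048, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][_position(item, row, self.width)] += count

    def estimate(self, item: str) -> int:
        # Collisions only inflate counters, so the minimum across rows is the tightest bound.
        return min(self.table[row][_position(item, row, self.width)]
                   for row in range(self.depth))


class BloomFilter:
    """Approximate set membership: false positives possible, false negatives not."""

    def __init__(self, size=8192, num_hashes=5):
        self.size, self.num_hashes = size, num_hashes
        self.bits = bytearray(size)  # one byte per bit position, for simplicity

    def add(self, item: str) -> None:
        for i in range(self.num_hashes):
            self.bits[_position(item, i, self.size)] = 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[_position(item, i, self.size)]
                   for i in range(self.num_hashes))
```

Both structures use fixed memory no matter how many items they see, which is what makes them attractive at 20,000+ tweets per minute.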

Architecture: NextGen
•Reduce moving parts
•Apache Spark as the central processing framework
–Real-time (micro-batch)
–Batch processing
•Apache Kafka or Amazon Kinesis (message broker)
•Cassandra (time-series storage)
•Elasticsearch (content indexer)
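Using Spark for both real-time (micro-batch) and batch work means one piece of aggregation logic can serve both paths. The plain-Python sketch below is not Spark API; it only illustrates the idea. The same hypothetical `count_by_brand` function runs over each micro-batch and over the full dataset. Because the aggregate is associative, merging the micro-batch results gives the same answer as a single batch pass. For simplicity the sketch cuts batches by size, whereas Spark Streaming cuts them by a time interval.

```python
from collections import Counter
from itertools import islice


def count_by_brand(events):
    """Shared aggregation logic used by both the streaming and the batch path."""
    return Counter(e["brand"] for e in events)


def micro_batches(stream, batch_size):
    """Cut an event stream into fixed-size micro-batches."""
    it = iter(stream)
    while batch := list(islice(it, batch_size)):
        yield batch


def streaming_totals(stream, batch_size=100):
    """Real-time path: aggregate each micro-batch, then merge into running totals."""
    totals = Counter()
    for batch in micro_batches(stream, batch_size):
        totals.update(count_by_brand(batch))
    return totals
```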

To infinity … and beyond!
Architecture evolution
[Chart: Active Customers per architecture iteration: #1, #2, #3, #4, NextGen]

Gustavo Arjones, CTO
@arjones | gustavo@socialmetrix.com
Sebastian Montini, Solutions Architect
@sebamontini | sebastian@socialmetrix.com
Feedback and Q&A
