Working with big volumes of data is a complicated task, but it's even harder if you have to do everything in real time and try to figure it all out yourself. This session will use practical examples to discuss architectural best practices and lessons learned when solving real-time social media analytics, sentiment analysis, and data visualization decision-making problems with AWS. Learn how you can leverage AWS services like Amazon RDS, AWS CloudFormation, Auto Scaling, Amazon S3, Amazon Glacier, and Amazon Elastic MapReduce to perform highly performant, reliable, real-time big data analytics while saving time, effort, and money. Gain insight from two years of real-time analytics successes and failures so you don't have to go down this path on your own.
2. •SaaS Company—since 2008
•Social media analytics: track and measure the activity of brands and personalities, providing information for market research and brand comparison
•Multilanguage technology (English, Portuguese, and Spanish)
•Leader in Latin America, with operations in 5 countries and customers in Latin America and the US
•1 of 34 members of the Twitter Certified Program worldwide
10. Challenges: Velocity
•Updates every second
•Top users, top hashtags each minute
•Post-event analysis runs as a batch job over the complete dataset
•Spikes of 20,000+ tweets per minute
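
The per-minute "top hashtags" requirement above can be illustrated with a tumbling one-minute window of counters. This is only a minimal single-process sketch, not the architecture from the talk: the class name `MinuteTopK`, the hashtag regex, and the in-memory storage are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical hashtag pattern; real tweet parsing is more involved.
HASHTAG_RE = re.compile(r"#(\w+)")


class MinuteTopK:
    """Counts hashtags in one-minute tumbling windows (illustrative only)."""

    def __init__(self):
        # minute bucket (epoch seconds // 60) -> Counter of lowercase hashtags
        self.buckets = defaultdict(Counter)

    def add(self, epoch_seconds, text):
        """Record every hashtag found in `text` under its minute bucket."""
        minute = int(epoch_seconds) // 60
        tags = [tag.lower() for tag in HASHTAG_RE.findall(text)]
        self.buckets[minute].update(tags)

    def top(self, minute, k=3):
        """Return the k most frequent hashtags for the given minute bucket."""
        return self.buckets.get(minute, Counter()).most_common(k)

    def evict_before(self, minute):
        """Drop buckets older than `minute` so memory stays bounded."""
        for old in [m for m in self.buckets if m < minute]:
            del self.buckets[old]


window = MinuteTopK()
window.add(0, "Loving #aws and #storm")
window.add(30, "#AWS again")
window.add(61, "Batch with #hive")
```

Keeping one counter per minute makes eviction trivial, which matters when spikes push a single bucket to tens of thousands of updates.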
18. Architecture: 1st iteration
What we did:
•All-in-one approach
•Multi-instance architecture
•Simple vertical scalability
•MySQL performance tuning
19. Architecture: 1st iteration
What we've learned:
•Multi-instance is harder to administer, but limits the impact of instability on customers
•Vertical scalability: poor resource management
•MySQL schema changes translate into downtime
20. Architecture: 2nd iteration
What we needed:
•Separation of responsibilities (crawling, processing)
•Horizontal scalability
•Fast provisioning
•Cost reduction
21. Architecture: 2nd iteration
What we changed:
•Migrated to AWS
•RabbitMQ (Single Node)
•Replaced MySQL with Amazon RDS
•AWS CloudFormation
•Auto Scaling groups
22. Architecture: 2nd iteration
What we've learned:
•PIOPS (Provisioned IOPS)
•Tuning the Auto Scaling policies can be hard
•AWS CloudFormation: great for migration, not enough for daily ops
23. Architecture: 3rd iteration
What we needed:
•Deliver new features (NRT, more complex analytics)
•Scale fast
•Be resilient against failure
•Add and improve data sources
•Keep costs under control (always)
24. Architecture: 3rd iteration
What we changed:
•Apache Storm
•RabbitMQ HA
•Amazon Elastic MapReduce (Hadoop/Hive)
•AWS CloudFormation + Chef
•Amazon Glacier + Amazon S3 lifecycle policies
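
As an illustration of the S3 lifecycle bullet above, the following sketch builds a lifecycle configuration that moves objects under a prefix to Glacier after a number of days. The prefix `raw-tweets/`, the 90-day threshold, and the helper name `glacier_lifecycle` are assumptions; the slides do not state the actual rules used. The dictionary follows the shape accepted by boto3's `put_bucket_lifecycle_configuration`, and the call itself is left as a comment because it needs credentials and a real bucket.

```python
def glacier_lifecycle(prefix, archive_after_days):
    """Build a lifecycle configuration that transitions objects to Glacier.

    `prefix` and `archive_after_days` are illustrative parameters, not values
    taken from the talk.
    """
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


config = glacier_lifecycle("raw-tweets/", 90)

# Applying the configuration would require boto3 and AWS credentials, e.g.:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket",  # hypothetical bucket name
#       LifecycleConfiguration=config,
#   )
```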
25. Architecture: 3rd iteration
What we've learned:
•Spot Instances + Reserved Instances
•Hive = SQL, and SQL scripts are hard to test
•Bulk upserts on Amazon RDS can be expensive (PIOPS)
•Amazon DynamoDB is great, but expensive (for our use case)
27. Architecture: 4th iteration
What we needed:
•Monitor millions of social media profiles
•Make data accessible (exploration, PoC)
•Improve UI response times
•Test our data pipelines
•Reprocess data faster
33. Lessons learned
•Automate from day 1 (CloudFormation + Chef)
•Monitor system activity and understand your data patterns, e.g. with Logstash (ELK)
•Always have a Source of Truth (Amazon S3 + Glacier)
•Make your Source of Truth searchable
34. Lessons learned (II)
•Approximation is a good thing: HyperLogLog (HLL), Count-Min Sketch (CMS), Bloom filters
•Write your pipelines with reprocessing needs in mind
•Avoid framework explosion at all costs
•The AWS ecosystem allows rapid prototyping
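
To make the approximation bullet concrete, here is a minimal Count-Min Sketch, which estimates item frequencies in fixed memory and never underestimates. It is a textbook-style sketch, not the implementation used in the talk; the default `width` and `depth` and the choice of salted BLAKE2b as the hash family are assumptions.

```python
import hashlib


class CountMinSketch:
    """Fixed-memory frequency estimator; estimates are always >= true counts."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        # depth rows of width counters, one independent hash per row
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Salting BLAKE2b with the row number yields one hash function per row.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        """Increment the item's counter in every row."""
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        """Take the minimum across rows to limit the effect of collisions."""
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


cms = CountMinSketch()
for _ in range(5):
    cms.add("#aws")
cms.add("#storm", 2)
```

Collisions can only inflate a counter, so `estimate` may overcount but never undercounts. A wider table lowers the error and more rows lower the chance of a bad estimate.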