In these slides I introduce "Alchemist", our open source ETL framework for stream-based mobile and web event tracking. You'll also learn how to bring all your event tracking in-house, easily and at low cost, thanks to a few AWS tools.
2. About me
• Sebastian Schleicher, Director of Engineering @Blinkist
https://about.me/sebastian.schleicher
3. Blinkist – Big Ideas in Small Packages
• 15-minute book summaries: a book’s key insights distilled into bite-sized, 15-minute reads
• Text and audio: a library of 2,500+ bite-sized insights for reading and listening
4. What’s in it for you?
• Innovative solutions for in-house web and mobile event tracking
• A stream-based approach to event tracking with AWS
• An Open Source framework that helps you wire all this together
5. What is Event Tracking?
• User behavior in your apps
• Clicks / page visits on your websites
• E-Mail opens / interactions
• Business events from your backend
• … many more
22. AWS Kinesis
• Producers write records to the stream with putRecords
• Records are ordered by time of arrival (within a shard)
• Records are retained for 24 hours by default; retention can be extended (e.g. to 7 days)
• Consumers read records and do something with them
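To make the producer side concrete, here is a minimal sketch that only builds PutRecords request parameters, so it runs without AWS credentials. It assumes events are plain JS objects and that each params object is later sent with the AWS SDK (`kinesis.putRecords(params)` in SDK v2, `new PutRecordsCommand(params)` in v3). The 500-records-per-request cap is a documented PutRecords limit; `buildPutRecordsBatches`, the stream name `tracking-events` and using `userId` as partition key are hypothetical choices.

```javascript
// PutRecords accepts at most 500 records per request.
const MAX_RECORDS_PER_REQUEST = 500;

// Hypothetical helper: turns tracking events into PutRecords request
// parameters, one params object per request of at most 500 records.
function buildPutRecordsBatches(streamName, events, partitionKeyOf) {
  const batches = [];
  for (let i = 0; i < events.length; i += MAX_RECORDS_PER_REQUEST) {
    const chunk = events.slice(i, i + MAX_RECORDS_PER_REQUEST);
    batches.push({
      StreamName: streamName,
      Records: chunk.map((event) => ({
        // Records with the same partition key are routed to the same shard.
        PartitionKey: String(partitionKeyOf(event)),
        // Raw bytes; the SDK takes care of Base64 on the wire.
        Data: Buffer.from(JSON.stringify(event), "utf8"),
      })),
    });
  }
  return batches;
}

// Usage: 1,200 events become three requests.
const events = Array.from({ length: 1200 }, (_, n) => ({
  userId: `user-${n % 7}`,
  name: "page_view",
}));
const batches = buildPutRecordsBatches("tracking-events", events, (e) => e.userId);
console.log(batches.map((b) => b.Records.length)); // [ 500, 500, 200 ]
```

PutRecords can partially fail, so a real producer would check `FailedRecordCount` in each response and retry the records that failed.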
23. AWS Kinesis Record
• Data blob: Base64-encoded binary*
• Typically a serialized event, e.g. a JSON string
* https://docs.aws.amazon.com/kinesis/latest/APIReference/API_Record.html
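A small sketch of what happens to a JSON event on its way into a record: JSON string, then UTF-8 bytes, then Base64 text on the wire. It uses only Node's `Buffer`; the helper name `toWireFormat` and the sample event are hypothetical. The 1 MiB cap reflects the Kinesis limit on a record's data payload before Base64 encoding.

```javascript
// Kinesis limits the data payload of one record to 1 MiB (before Base64).
const MAX_DATA_BYTES = 1024 * 1024;

// Hypothetical helper: serializes an event the way it travels in a record.
function toWireFormat(event) {
  const bytes = Buffer.from(JSON.stringify(event), "utf8");
  if (bytes.length > MAX_DATA_BYTES) {
    throw new RangeError(`record data is ${bytes.length} bytes, limit is ${MAX_DATA_BYTES}`);
  }
  // Base64 uses 4 characters per 3 bytes, so the text is about a third larger.
  return bytes.toString("base64");
}

const wire = toWireFormat({ name: "app_open", platform: "ios" });
// A consumer reverses the steps: Base64 -> bytes -> JSON.
const roundTrip = JSON.parse(Buffer.from(wire, "base64").toString("utf8"));
console.log(roundTrip); // { name: 'app_open', platform: 'ios' }
```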
24. Streaming System on AWS
Collector → Kinesis Stream → Kinesis Firehose → Redshift
Kinesis Stream → Lambda → External System
25. AWS Kinesis Firehose
Producers (putRecords) → Kinesis Firehose (managed AWS consumer) → copy records → Redshift
• Managed Kinesis application
• Copies records to Redshift (or other AWS data stores), staging them in S3 and loading them with a COPY command
• Producers must make sure that only JSON is put into the stream
• JSON object keys get mapped to the columns of the Redshift table
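Because JSON keys become column names, it helps to shape each event before putting it. A hypothetical sketch, assuming the delivery stream's COPY options contain `JSON 'auto'` (which makes Redshift match JSON keys to column names) and a target table whose columns are listed in `COLUMNS`; neither the helper nor the column list comes from Alchemist.

```javascript
// Hypothetical column list of the target Redshift table. Redshift folds
// unquoted identifiers to lower case by default, so keys are lower-cased too.
const COLUMNS = ["event_name", "user_id", "platform", "occurred_at"];

// Keeps only keys that have a matching column and emits one JSON object
// per line (the newline is an assumed convention, not a Firehose requirement).
function toRedshiftJson(event, columns = COLUMNS) {
  const row = {};
  for (const [key, value] of Object.entries(event)) {
    const column = key.toLowerCase();
    if (columns.includes(column)) {
      row[column] = value;
    }
  }
  return JSON.stringify(row) + "\n";
}

const line = toRedshiftJson({
  Event_Name: "signup",
  user_id: "42",
  platform: "web",
  debugInfo: { verbose: true }, // no matching column, so it is dropped
});
process.stdout.write(line); // {"event_name":"signup","user_id":"42","platform":"web"}
```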
28. AWS Lambda
• “Serverless” application
• Simple, short-lived function executions
• Runtimes for Node.js (JavaScript), Java, Python, Go and more
• Triggered by events: S3 events, Kinesis events and many more
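A sketch of one Node.js handler reacting to the two event types on the slide. The event shapes (`Records[].eventSource` of `aws:s3` or `aws:kinesis`, `s3.bucket.name`, URL-encoded `s3.object.key`, `kinesis.partitionKey`, `kinesis.sequenceNumber`) follow the documented Lambda event formats; what the handler does with them is hypothetical.

```javascript
// Lambda entry point for the Node.js runtime. One invocation can carry
// several records, and each record names the service that produced it.
async function handler(event) {
  const results = [];
  for (const record of event.Records || []) {
    if (record.eventSource === "aws:s3") {
      // S3 object keys arrive URL-encoded, with spaces encoded as "+".
      const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));
      results.push({ type: "s3", bucket: record.s3.bucket.name, key });
    } else if (record.eventSource === "aws:kinesis") {
      // The Base64 payload is left for the ETL pipeline to decode.
      const { partitionKey, sequenceNumber } = record.kinesis;
      results.push({ type: "kinesis", partitionKey, sequenceNumber });
    }
    // Records from other sources are ignored in this sketch.
  }
  return results;
}

exports.handler = handler;
```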
29. Streaming System on AWS
Collector (Alchemist, running on Lambda) → Kinesis Stream → Kinesis Firehose → Redshift
Kinesis Stream → Lambda → External System
30. Welcome Alchemist
• Lightweight E(xtract) T(ransform) L(oad) framework written in JS
• Well suited to running in an AWS Lambda environment
• Many built-in adapters to extract data from AWS resources (like S3)
• Easy to extend
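To make the E/T/L split concrete, here is a generic sketch of how such a pipeline can be composed: an input yields records, transformations run in sequence, and a quality check routes each record to a sane or faulty output. This is not Alchemist's actual API; `createPipeline` and the input/output shapes are hypothetical, and in-memory arrays stand in for the Kinesis and SQS outputs.

```javascript
// Hypothetical pipeline shape, not Alchemist's real interface.
// input: async () => records; transformations: record -> record;
// isSane: record -> boolean; outputs: { sane, faulty } async sinks.
function createPipeline({ input, transformations, isSane, outputs }) {
  return async function run() {
    const records = await input();
    const sane = [];
    const faulty = [];
    for (const raw of records) {
      try {
        const record = transformations.reduce((acc, transform) => transform(acc), raw);
        (isSane(record) ? sane : faulty).push(record);
      } catch (error) {
        // A record that cannot be transformed or checked is faulty, too.
        faulty.push({ raw, error: error.message });
      }
    }
    await outputs.sane(sane);     // e.g. Kinesis output
    await outputs.faulty(faulty); // e.g. SQS output for alerting
    return { sane: sane.length, faulty: faulty.length };
  };
}

// Usage with in-memory stand-ins for the real sources and sinks.
const kinesisStandIn = [];
const sqsStandIn = [];
const run = createPipeline({
  input: async () => ['{"name":"click"}', "not json", '{"name":""}'],
  transformations: [JSON.parse, (e) => ({ ...e, source: "web" })],
  isSane: (e) => typeof e.name === "string" && e.name.length > 0,
  outputs: {
    sane: async (records) => { kinesisStandIn.push(...records); },
    faulty: async (records) => { sqsStandIn.push(...records); },
  },
});
run().then((counts) => console.log(counts)); // { sane: 1, faulty: 2 }
```

Keeping the faulty path as a separate output, rather than dropping bad records, is what lets the team inspect them later and react quickly.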
34. Alchemist Web Pipeline
• Input: S3 Input loads the CloudFront log file from the S3 bucket
• Transformations: Unzip → CloudFront Log → Quality Control
• Output for sane data: Kinesis Output
• Output for faulty data: SQS Output, to check faulty data and react quickly
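The Unzip and CloudFront Log steps can be sketched with Node's built-in `zlib`: CloudFront standard logs are gzip-compressed, tab-separated files whose `#Fields:` header line names the columns. The helper names, the tracking-pixel path and the tiny sample below are hypothetical, and the sample uses only a subset of the real fields.

```javascript
const zlib = require("zlib");

// Hypothetical transformation: gunzip a CloudFront log file and turn every
// data line into an object keyed by the names in the "#Fields:" header.
function parseCloudFrontLog(gzipped) {
  const text = zlib.gunzipSync(gzipped).toString("utf8");
  let fields = [];
  const rows = [];
  for (const line of text.split("\n")) {
    if (line.startsWith("#Fields:")) {
      fields = line.slice("#Fields:".length).trim().split(/\s+/);
    } else if (line && !line.startsWith("#")) {
      const values = line.split("\t");
      const row = {};
      fields.forEach((field, i) => { row[field] = values[i]; });
      rows.push(row);
    }
  }
  return rows;
}

// For a tracking pixel, the event parameters live in the query string;
// CloudFront writes "-" when there is none.
function toTrackingEvent(row) {
  const query = row["cs-uri-query"] === "-" ? "" : row["cs-uri-query"];
  return { date: row.date, time: row.time, ...Object.fromEntries(new URLSearchParams(query)) };
}

// Usage with a made-up log file.
const sample = zlib.gzipSync([
  "#Version: 1.0",
  "#Fields: date time cs-uri-stem cs-uri-query",
  "2018-05-01\t12:00:00\t/pixel.gif\tevent=page_view&page=%2Fhome",
].join("\n"));
console.log(parseCloudFrontLog(sample).map(toTrackingEvent));
```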
35. Alchemist Use Cases
• Web tracking: log files → Alchemist → Kinesis
• Mobile tracking → Alchemist → Kinesis
• Email tracking → Alchemist → Kinesis
37. Alchemist Mobile Pipeline
• Input: Kinesis Input reads records from Kinesis (Decode64, Parse JSON)
• Transformations: Map Fields → Quality Control
• Output for sane data: Kinesis Output
• Output for faulty data: SQS Output, to check faulty data and react quickly
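The mobile pipeline's steps (Decode64, Parse JSON, Map Fields, Quality Control) can be sketched as one function over a Lambda Kinesis event, whose records carry Base64 data at `record.kinesis.data`. The field mapping from app-side names to stream names and the list of required fields are hypothetical; in a real setup the sane batch would go to the output Kinesis stream and the faulty one to SQS.

```javascript
// Hypothetical mapping from the mobile apps' field names to the names
// used in the central event stream, plus the fields every event needs.
const FIELD_MAP = { evt: "event_name", uid: "user_id", os: "platform", ts: "occurred_at" };
const REQUIRED = ["event_name", "user_id", "occurred_at"];

function mapFields(raw) {
  const mapped = {};
  for (const [from, to] of Object.entries(FIELD_MAP)) {
    if (raw[from] !== undefined) mapped[to] = raw[from];
  }
  return mapped;
}

// Splits the records of one Lambda Kinesis event into sane and faulty data.
function processMobileRecords(kinesisEvent) {
  const sane = [];
  const faulty = [];
  for (const record of kinesisEvent.Records) {
    const data = record.kinesis.data;
    try {
      const json = Buffer.from(data, "base64").toString("utf8"); // Decode64
      const mapped = mapFields(JSON.parse(json));                // Parse JSON, Map Fields
      const missing = REQUIRED.filter((field) => mapped[field] === undefined);
      if (missing.length === 0) sane.push(mapped);               // Quality Control
      else faulty.push({ data, reason: `missing ${missing.join(", ")}` });
    } catch (error) {
      faulty.push({ data, reason: error.message });
    }
  }
  return { sane, faulty };
}

// Usage with a hand-built Lambda Kinesis event.
const toRecord = (text) => ({ kinesis: { data: Buffer.from(text).toString("base64") } });
const result = processMobileRecords({
  Records: [
    toRecord('{"evt":"app_open","uid":"u1","os":"android","ts":1525176000}'),
    toRecord('{"evt":"app_open","os":"ios"}'),
    toRecord("{broken"),
  ],
});
console.log(result.sane.length, result.faulty.length); // 1 2
```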
39. E-Mail Tracking
• The newsletter provider calls webhooks when a user opens an email or clicks a button in an email
• Each webhook call arrives as an HTTP event via API Gateway
• Alchemist processes the event and writes it to Kinesis with putRecords
40. Alchemist E-Mail Pipeline
• Input: HTTP Input receives the request from API Gateway
• Transformations: Parse Body → Quality Control
• Output for sane data: Kinesis Output
• Output for faulty data: SQS Output, to check faulty data and react quickly
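A sketch of the Parse Body and Quality Control steps for webhook calls arriving through an API Gateway Lambda proxy integration, whose events carry `headers`, a string `body` and an `isBase64Encoded` flag. Supporting both JSON and form-encoded bodies is an assumption about what newsletter providers send; `routeEmailWebhook`, the `open`/`click` event types and the `email` field are hypothetical.

```javascript
// Hypothetical "Parse Body" step: decodes the proxy event body as JSON or
// as a form-encoded string, depending on the Content-Type header.
function parseBody(httpEvent) {
  const headers = httpEvent.headers || {};
  // Header name casing differs between API Gateway setups, so match loosely.
  const contentTypeKey = Object.keys(headers).find((k) => k.toLowerCase() === "content-type");
  const contentType = contentTypeKey ? headers[contentTypeKey] : "";
  let body = httpEvent.body || "";
  if (httpEvent.isBase64Encoded) {
    body = Buffer.from(body, "base64").toString("utf8");
  }
  if (contentType.startsWith("application/x-www-form-urlencoded")) {
    return Object.fromEntries(new URLSearchParams(body));
  }
  return JSON.parse(body);
}

// Hypothetical quality check: an email event needs a known type and a recipient.
const isSaneEmailEvent = (e) => ["open", "click"].includes(e.type) && typeof e.email === "string";

// Decides where a webhook call goes: Kinesis for sane data, SQS for faulty data.
function routeEmailWebhook(httpEvent) {
  try {
    const event = parseBody(httpEvent);
    if (isSaneEmailEvent(event)) return { target: "kinesis", payload: event };
  } catch (error) {
    // Unparseable bodies fall through to the faulty path below.
  }
  return { target: "sqs", payload: httpEvent.body };
}

const routed = routeEmailWebhook({
  headers: { "Content-Type": "application/json" },
  body: '{"type":"open","email":"a@example.com"}',
  isBase64Encoded: false,
});
console.log(routed.target); // kinesis
```

The Lambda handler behind API Gateway would then forward the payload and answer the provider with a `statusCode` of 200.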
52. Pricing
• We currently pay $150 per month for 200 million events* 🤫
• Everything is fully managed and scaled by AWS 🤗
• Simple to maintain and operate as a developer 🤓
• No expensive pre-built solution required ✌
*Redshift not included (around $0.25/hour per node)
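Putting the slide's figures into per-unit terms (a back-of-the-envelope calculation; the 30-day, 720-hour month is an approximation):

```latex
\begin{aligned}
\frac{\$150}{200 \times 10^{6}\ \text{events}} &= \$7.5 \times 10^{-7}\ \text{per event} = \$0.75\ \text{per million events} \\
\$0.25/\text{h} \times 24\ \text{h/day} \times 30\ \text{days} &\approx \$180\ \text{per Redshift node per month}
\end{aligned}
```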
54. Key takeaways
• In-house event tracking can be scalable & affordable 💸
• A central data stream containing all relevant events is awesome 🚀
• Having a simple streaming architecture rocks 🤘
• Embrace the vendor lock-in – it helps you more than it hurts 🤠
Don’t be afraid and in-house your event tracking!
55. Thank you for your attention.
https://about.me/sebastian.schleicher