Organizations processing mission-critical, high-volume data must achieve high throughput and durability in their data processing workflows. In this session, we will learn how DataXu uses Amazon Kinesis, Amazon S3, and Amazon EMR for its patented approach to programmatic marketing. Every second, the DataXu Marketing Cloud processes over 1 million ad requests and makes more than 40 billion decisions to select and bid on the ad impressions most likely to convert. In addition to addressing the scalability and availability of the platform, we will explore Amazon Kinesis producer and consumer applications that support high scalability and durability in mission-critical record processing.
3. Big data
Batch question → real-time counterpart
•Hourly server logs: were your systems misbehaving an hour ago? → What went wrong now?
•Weekly / monthly bill: what did you spend this billing cycle? → Prevent overspending now
•Daily customer-preferences report from your web site's click stream: what deal or ad to try next time → What to offer the current customer now
•Daily fraud reports: was there fraud yesterday? → Block fraudulent use now
24. Amazon Kinesis storage is replicated across Availability Zones
[Diagram: Amazon Kinesis inside Amazon Web Services, with stream storage replicated across three Availability Zones (AZ)]
•Millions of sources producing 100s of terabytes per hour
•Front end handles authentication and authorization
•Durable, highly consistent storage replicates data across three data centers (Availability Zones)
•Ordered stream of events supports multiple readers:
–Real-time dashboards and alarms
–Machine learning algorithms or sliding-window analytics
–Aggregate analysis in Hadoop or a data warehouse
–Aggregate and archive to S3
•Inexpensive: $0.028 per million puts
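The "ordered stream supports multiple readers" point can be illustrated with a toy, in-memory model. This is a sketch, not the Kinesis API: `InMemoryShard`, `Reader`, and the event names are hypothetical, and real Kinesis sequence numbers are large, non-contiguous numeric strings rather than 0, 1, 2. What it shows is that the stream keeps records in order while each consumer tracks its own checkpoint, so a lagging reader (an archiver) never holds back a fast one (a dashboard).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class InMemoryShard:
    """Toy stand-in for one shard: an append-only log with sequence numbers."""
    records: List[Tuple[int, str]] = field(default_factory=list)
    next_seq: int = 0

    def put(self, data: str) -> int:
        # Assign the next sequence number, so records keep arrival order.
        seq = self.next_seq
        self.records.append((seq, data))
        self.next_seq += 1
        return seq

    def read_after(self, checkpoint: int, limit: int = 100) -> List[Tuple[int, str]]:
        # Return up to `limit` records whose sequence number is > checkpoint.
        return [r for r in self.records if r[0] > checkpoint][:limit]


class Reader:
    """A consumer that keeps its own checkpoint, independent of other readers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.checkpoint = -1  # nothing consumed yet
        self.seen: List[str] = []

    def poll(self, shard: InMemoryShard, limit: int = 100) -> None:
        for seq, data in shard.read_after(self.checkpoint, limit):
            self.seen.append(data)
            self.checkpoint = seq  # advance only this reader's position


shard = InMemoryShard()
for event in ["bid", "win", "click"]:
    shard.put(event)

dashboard = Reader("real-time dashboard")
archiver = Reader("archive to S3")

dashboard.poll(shard)          # reads all three records
archiver.poll(shard, limit=1)  # lags behind: reads only "bid"
shard.put("conversion")
dashboard.poll(shard)          # picks up just the new record
archiver.poll(shard)           # catches up from its own checkpoint

print(dashboard.seen)
print(archiver.seen)
```

With the Kinesis Client Library, each consumer application stores its checkpoints in its own DynamoDB table, which plays the role of `Reader.checkpoint` in this sketch.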
37. •Unordered processing
–Randomize the partition key to distribute events over many shards, and use multiple workers.
•Exact-order processing
–Control the partition key so that related events are grouped onto the same shard and read by the same worker.
•Need both? Get a global sequence number.
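The two partitioning strategies can be sketched by modelling how a partition key selects a shard. Kinesis maps each partition key through an MD5 hash to a 128-bit integer and routes the record to the shard whose hash-key range contains it. This sketch assumes the key space is split into equal contiguous ranges (roughly what a freshly created, never-resharded stream has) and reads the digest as a big-endian integer; `shard_for` and the key "campaign-42" are hypothetical.

```python
import hashlib
import uuid
from collections import Counter

HASH_SPACE = 2 ** 128  # partition keys are hashed to 128-bit integers


def hash_key(partition_key: str) -> int:
    # MD5 digest of the partition key, read as an unsigned big-endian integer.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key: str, num_shards: int) -> int:
    # Assumes the hash-key space is split into num_shards equal, contiguous
    # ranges; real shard boundaries come from the stream's shard metadata.
    return hash_key(partition_key) * num_shards // HASH_SPACE


NUM_SHARDS = 8

# Unordered processing: a random key per event spreads load over all shards.
random_spread = Counter(shard_for(str(uuid.uuid4()), NUM_SHARDS) for _ in range(10_000))

# Exact-order processing: a fixed key (a hypothetical campaign id) always
# lands on the same shard, so one worker sees that campaign's events in order.
campaign_shards = {shard_for("campaign-42", NUM_SHARDS) for _ in range(100)}

print(sorted(random_spread.items()))
print(campaign_shards)
```

The trade-off is visible in the output: random keys keep every shard busy but give no cross-event ordering, while a fixed key preserves order at the cost of concentrating that key's traffic on one shard.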
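[Diagram: the Producer calls Get Global Sequence and Get Event Metadata, then writes each event to one or more of three streams: Unordered Stream, Campaign-Centric Stream, Fraud-Inspection Stream]
Event metadata (Id / event / stream – partition key):
•1 / confirmation / Campaign-centric stream – UUID
•2 / fraud / Unordered stream; Fraud-inspection stream – session id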
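The "need both" flow can be sketched as a producer that stamps each event with one global sequence number and then fans it out according to the event-metadata table. Everything here is an in-memory assumption rather than DataXu's implementation: `GlobalSequence` stands in for whatever service hands out global sequence numbers, and `EVENT_METADATA`, `route_event`, and the field names `uuid` and `session_id` are hypothetical. A route with no key field gets a random partition key, matching the unordered stream.

```python
import itertools
import threading
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Route:
    stream: str
    key_field: Optional[str]  # None means a random partition key (unordered)


# Hypothetical metadata table mirroring the slide.
EVENT_METADATA: Dict[str, List[Route]] = {
    "confirmation": [Route("campaign-centric", "uuid")],
    "fraud": [Route("unordered", None), Route("fraud-inspection", "session_id")],
}


class GlobalSequence:
    """Thread-safe counter standing in for a global sequence service."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._lock = threading.Lock()

    def next(self) -> int:
        with self._lock:
            return next(self._counter)


@dataclass(frozen=True)
class OutboundRecord:
    stream: str
    partition_key: str
    global_seq: int
    event: dict


def route_event(event: dict, seq: GlobalSequence) -> List[OutboundRecord]:
    # One global sequence number per event, shared by every copy written,
    # so consumers of different streams can restore the original order.
    global_seq = seq.next()
    records = []
    for route in EVENT_METADATA[event["type"]]:
        if route.key_field is None:
            key = str(uuid.uuid4())  # spread over shards, no ordering
        else:
            key = str(event[route.key_field])  # group related events
        records.append(OutboundRecord(route.stream, key, global_seq, event))
    return records


seq = GlobalSequence()
confirmation = {"type": "confirmation", "uuid": "c-123", "session_id": "s-9"}
fraud = {"type": "fraud", "uuid": "c-123", "session_id": "s-9"}
out = route_event(confirmation, seq) + route_event(fraud, seq)
for r in out:
    print(r.stream, r.partition_key, r.global_seq)
```

Because each copy of an event carries the same `global_seq`, a downstream job that reads several streams could merge and sort by that number to recover a single global order, while each stream still gets the partitioning that suits its consumers.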