Archmage, Pinterest’s Real-time Analytics Platform on Druid
October 2020
Jian Wang, Tech Lead, Pinterest
Jiaqi Gu, Software Engineer, Pinterest
1
3© 2020 Pinterest. All rights reserved.
Agenda
1. Motivation
2. Challenges
3. Use cases
4. Cluster stats
5. Architecture
6. Learnings
4© 2020 Pinterest. All rights reserved.
Motivation
● Cons of the HBase-based precomputed key-value lookup system
○ The key-value data model doesn’t fit the analytics query pattern
○ Cardinality explodes any time a new column is added
○ It is impossible to precompute all filter combinations
○ More work is needed on the application side to do aggregation
We want a better system as demand for Pinterest’s analytics use cases increases...
Why did we replace HBase with Druid for analytics use cases?
Example key-value model:
country=usa,device=iphone,gender=male,click=123
country=china,device=iphone,gender=female,click=456
country=japan,device=android,gender=male,click=789
country=usa,device=iphone,gender=female,click=135
5© 2020 Pinterest. All rights reserved.
Challenges
What are the unique challenges of onboarding Druid at Pinterest?
● Clients expect low latency on par with a key-value store
○ Having migrated from an HBase-based key-value lookup backend, clients expect latency to stay in the low 100 ms range, while vanilla Druid only guarantees sub-second to seconds latency
● Pinterest-scale data volume
○ Largest batch use case: 300 TB with a seconds-level SLA
○ Largest real-time use case: 500k write QPS, with an SLA of 500 query QPS and 200 ms p99
● Cost effectiveness
○ We want the lowest cost for the best possible performance
8© 2020 Pinterest. All rights reserved.
Use cases
Many of the company’s analytics use cases are powered by Druid
● Partner and advertiser reporting
○ Stats on board/pin impressions, clicks, saves, etc.
● Real-time spam detection
○ Detects spam events for user login and pin operations
● Experiment metrics
○ A/B testing experiment metrics
● Ads delivery debugger
○ Debugging tool for ads delivery status
● And many more ...
9© 2020 Pinterest. All rights reserved.
Cluster stats
We have clusters for both online and offline use cases
● Biggest online cluster
○ 200 r4.8x historical nodes hosting 32 TB, plus 50 i3.2x hosting 100 TB
○ 250 QPS
○ Query P99 ranges from 100 ms to ~1.5 s depending on the use case
● Biggest offline cluster
○ 160 i3en.2x historical nodes hosting 280 TB
○ QPS < 1
○ P99 of 2 s
10© 2020 Pinterest. All rights reserved.
Architecture
[Overview diagram: batch ingestion and real-time ingestion feeding Druid, fronted by Archmage]
11© 2020 Pinterest. All rights reserved.
Architecture
Archmage
● Proxy service
○ A Thrift service that acts as a proxy between clients and Druid, to ease integration with other services at Pinterest
○ Handles Druid service discovery by watching the broker znode in Druid’s ZooKeeper
○ Thrift-to-HTTP and HTTP-to-Thrift request/response translation
○ Metrics reporting
○ Speculative execution (see the sketch after this list)
○ Query optimization and rewriting
○ Shadow-cluster dark traffic routing
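A minimal sketch of how a proxy could implement that speculative execution, assuming two interchangeable broker endpoints and Druid’s standard SQL API (POST /druid/v2/sql/ with a JSON body); the class name, hedge delay, and URLs are illustrative, not Archmage’s actual implementation:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

/** Illustrative only: hedged (speculative) execution of a Druid SQL query against two brokers. */
public class SpeculativeQuery {
  private static final HttpClient CLIENT = HttpClient.newHttpClient();

  static CompletableFuture<String> query(String brokerUrl, String sqlJsonBody) {
    HttpRequest request = HttpRequest.newBuilder(URI.create(brokerUrl + "/druid/v2/sql/"))
        .header("Content-Type", "application/json")
        .timeout(Duration.ofSeconds(5))
        .POST(HttpRequest.BodyPublishers.ofString(sqlJsonBody))
        .build();
    return CLIENT.sendAsync(request, HttpResponse.BodyHandlers.ofString())
        .thenApply(HttpResponse::body);
  }

  /** Query the primary broker; if it hasn't answered within hedgeDelayMs, also query the backup. */
  static String speculativeQuery(String primaryBroker, String backupBroker,
                                 String sqlJsonBody, long hedgeDelayMs) throws Exception {
    CompletableFuture<String> primary = query(primaryBroker, sqlJsonBody);
    // After the hedge delay, fire the duplicate request (a real implementation would
    // skip this if the primary has already completed, and cancel the losing request).
    CompletableFuture<String> backup = CompletableFuture
        .supplyAsync(() -> null, CompletableFuture.delayedExecutor(hedgeDelayMs, TimeUnit.MILLISECONDS))
        .thenCompose(ignored -> query(backupBroker, sqlJsonBody));
    // Whichever response arrives first wins.
    return primary.applyToEither(backup, response -> response).get();
  }
}
```

A production version would also cap how much duplicate load hedging is allowed to add during broker slowdowns.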
12© 2020 Pinterest. All rights reserved.
Architecture
Query
● Thrift API
○ Clients send a Thrift request with a SQL field to Archmage, which forwards it to Druid (see the example after this list)
● UI
○ Use-case-specific UIs owned by individual clients
○ An internal UI with a SQL editor for ad-hoc queries
○ Apache Superset for dashboarding
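A minimal sketch of the forwarding step, assuming the broker’s standard SQL endpoint (POST /druid/v2/sql/ with a JSON body containing the query); the Thrift side is omitted and the class and helper names are hypothetical:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Illustrative only: forward the SQL carried in a Thrift request to the Druid broker. */
public class SqlForwarder {
  private final HttpClient client = HttpClient.newHttpClient();
  private final String brokerUrl; // e.g. discovered from the broker znode in ZooKeeper

  public SqlForwarder(String brokerUrl) {
    this.brokerUrl = brokerUrl;
  }

  /** The SQL string would come from a field on the incoming Thrift request. */
  public String execute(String sql) throws Exception {
    String body = "{\"query\": " + toJsonString(sql) + "}";
    HttpRequest request = HttpRequest.newBuilder(URI.create(brokerUrl + "/druid/v2/sql/"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();
    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
    return response.body(); // JSON rows, translated back into the Thrift response
  }

  // Naive JSON string escaping, sufficient for the sketch.
  private static String toJsonString(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }
}
```

Usage would look like `new SqlForwarder("http://broker:8082").execute("SELECT country, SUM(clicks) FROM impressions GROUP BY country")`, 8082 being the broker’s default port.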
13© 2020 Pinterest. All rights reserved.
Architecture
Ingestion
● Batch ingestion
○ Hadoop: an extracted library that bypasses Druid locking
○ Reads input from S3 and writes Druid segment files to S3
● Real-time ingestion
○ Kafka indexing service: exactly-once delivery
○ Evaluated the push-based Tranquility library, but it has been deprecated
14© 2020 Pinterest. All rights reserved.
Learnings
Tiered setup
● Need disk access? Look for host types with good 4 KB random-read IOPS
○ Disk is needed when segments are not accessed often, or when the data volume is so large that a fully in-memory setup is too expensive
○ Druid mmaps each segment and abstracts it as a byte array. Only the specific portion of the byte array needed by a query (e.g., a certain column) is loaded from disk at query time, and loading happens in 4 KB pages. This means a host type with 256 GB of RAM (excluding process memory) behaves much the same as one with 1 GB of RAM if 1) their 4 KB random-read IOPS are the same and 2) each query is expected to scan different segments. A minimal sketch of this behavior follows this list.
○ On AWS, host types with instance-local SSDs work best: i3 > i3en >> other instance types with an attached EBS disk
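A minimal sketch (not Druid code) of why this holds: mapping a file is nearly free, and only the 4 KB pages that are actually read get faulted in from disk, so RAM beyond the working set buys little when every query touches different segments.

```java
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Illustrative only: mmap a file and fault in just a sparse subset of its 4 KB pages. */
public class MmapTouch {
  public static void main(String[] args) throws Exception {
    Path segmentFile = Path.of(args[0]); // stand-in for a Druid segment file
    try (FileChannel channel = FileChannel.open(segmentFile, StandardOpenOption.READ)) {
      // A single mapping is limited to 2 GB by ByteBuffer's int indexing
      // (Druid keeps individual smoosh files under roughly this size for the same reason).
      long mapSize = Math.min(channel.size(), Integer.MAX_VALUE);
      // Mapping is cheap: nothing is read from disk yet.
      MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, mapSize);
      int pageSize = 4096;
      long touched = 0;
      // Reading one byte every 256th page faults in only those pages; the untouched
      // pages are never loaded, no matter how much RAM the host has.
      for (long offset = 0; offset < mapSize; offset += 256L * pageSize) {
        buf.get((int) offset);
        touched++;
      }
      System.out.println("Faulted in ~" + touched + " pages of a " + (mapSize >> 20) + " MB mapping");
    }
  }
}
```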
15© 2020 Pinterest. All rights reserved.
Learnings
Tiered setup
● Recent data? Keep it all in memory
○ Recent data is expected to be queried more often, so we want to avoid query-time disk I/O by keeping all of it in the page cache
○ Put the most recent segments (e.g., the last 3 months) on memory-heavy instance types with a 1:1 RAM-to-disk ratio: r5.8x with attached EBS
○ Background threads on historical nodes read segment files (equivalent to `cat 0000.smoosh > /dev/null`) at server bootstrap and on new segment download, forcing the OS to load them into the page cache and avoiding on-demand loading at query time (a warm-up sketch follows this list)
○ The exact period that counts as “recent” is best determined through request analysis; Druid real-time ingestion is a good choice for that
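A minimal sketch of that warm-up, assuming segments sit under a local segment-cache directory; reading each file sequentially and discarding the bytes is enough to pull it into the OS page cache. Paths, thread counts, and class names are illustrative, not the actual historical-node code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Stream;

/** Illustrative only: warm the page cache by reading every segment file once. */
public class SegmentWarmer {
  private final ExecutorService pool = Executors.newFixedThreadPool(2); // keep I/O pressure low

  /** Call at bootstrap, and again for each newly downloaded segment directory. */
  public void warm(Path segmentCacheDir) throws IOException {
    try (Stream<Path> files = Files.walk(segmentCacheDir)) {
      files.filter(Files::isRegularFile)
           .forEach(file -> pool.submit(() -> readAndDiscard(file)));
    }
  }

  // Equivalent of `cat file > /dev/null`: sequential read, bytes thrown away,
  // which leaves the file contents resident in the OS page cache.
  private static void readAndDiscard(Path file) {
    byte[] buffer = new byte[1 << 20];
    try (InputStream in = Files.newInputStream(file)) {
      while (in.read(buffer) != -1) {
        // intentionally empty: we only want the read side effect
      }
    } catch (IOException e) {
      System.err.println("Failed to warm " + file + ": " + e.getMessage());
    }
  }
}
```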
16© 2020 Pinterest. All rights reserved.
Learnings
Middle managers
● Need as much attention in tuning as historical nodes
○ Monitor Kafka ingestion offset lag and timestamp lag (a monitoring sketch follows this list)
○ Increase intermediatePersistPeriod if you are sensitive to query latency on middle managers
○ Use a custom partitioner on the Kafka producer side to improve data locality
○ Use lateMessageRejectionPeriod and earlyMessageRejectionPeriod so that scattered late and early events don’t create a lot of small segments
○ Run reindexing (compaction) jobs
○ Be careful not to use Kafka transactions on the producer side prior to Druid 0.15
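One hedged way to watch that lag is to poll the Overlord’s supervisor status endpoint (GET /druid/indexer/v1/supervisor/<id>/status); the exact lag fields inside the JSON payload vary by Druid version, so this sketch just returns the raw payload for a metrics pipeline to parse.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Illustrative only: poll a Kafka supervisor's status to track ingestion lag. */
public class SupervisorLagMonitor {
  private final HttpClient client = HttpClient.newHttpClient();
  private final String overlordUrl;   // e.g. "http://overlord:8090"
  private final String supervisorId;  // usually the datasource name

  public SupervisorLagMonitor(String overlordUrl, String supervisorId) {
    this.overlordUrl = overlordUrl;
    this.supervisorId = supervisorId;
  }

  /** Fetch the status JSON; offset/time lag live inside its payload (field names vary by version). */
  public String fetchStatus() throws Exception {
    HttpRequest request = HttpRequest.newBuilder(
            URI.create(overlordUrl + "/druid/indexer/v1/supervisor/" + supervisorId + "/status"))
        .timeout(Duration.ofSeconds(10))
        .GET()
        .build();
    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
    return response.body(); // emit the parsed lag values to your metrics system
  }
}
```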
17© 2020 Pinterest. All rights reserved.
Learnings
Group by queries
● Tail latency
○ Many group bys are convertible to topN queries if you add a limit clause (see the example after this list)
○ Add a combined dimension if there are more than 2 group-by dimensions but they are fixed
○ Enable limit push-down to trade some accuracy for performance
○ Enable parallel broker-side merge
○ Limit the number of rows to group by from the application side if possible
○ Make sure you have enough merge buffers so you don’t run out of them
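As an example of the first tip, Druid SQL will typically plan a single-dimension group by with an ORDER BY on the aggregate and a LIMIT as an approximate topN rather than a full groupBy, which is much cheaper at the tail. A sketch reusing the hypothetical SqlForwarder from the Query slide; the table and column names are made up:

```java
/** Illustrative only: shaping a group by so Druid SQL can plan it as a topN. */
public class TopNExample {
  public static void main(String[] args) throws Exception {
    SqlForwarder druid = new SqlForwarder("http://broker:8082"); // hypothetical helper from earlier

    // An unbounded "SELECT country, SUM(clicks) FROM impressions GROUP BY country"
    // aggregates every country and merges them all on the broker. Adding an ORDER BY
    // on the aggregate plus a LIMIT lets Druid plan the query as an approximate topN,
    // trading a little ranking accuracy for much better tail latency.
    String topNShaped =
        "SELECT country, SUM(clicks) AS clicks FROM impressions "
            + "GROUP BY country ORDER BY clicks DESC LIMIT 100";

    System.out.println(druid.execute(topNShaped));
  }
}
```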
18© 2020 Pinterest. All rights reserved.
Learnings
Query-time pruning on a secondary dimension other than time
● Cluster computing resources are limited
○ Each segment is processed by one processing thread, and the number of threads is usually equal to the number of cores
○ Cores are expensive and are always fewer than the number of segments
○ We should be careful about which segments a query scans
● Shard specs with query-time pruning on partition dimensions
○ Batch ingestion
■ Hash-based shard spec
■ Even-size single-dimension shard spec
○ Real-time ingestion
■ Stream hash-based shard spec
19© 2020 Pinterest. All rights reserved.
Learnings
Query-time pruning on a secondary dimension other than time
● Shard specs with query-time pruning on partition dimensions
○ Batch ingestion
■ Hash-based shard spec
● Worked well in most use cases
● Added the missing query-time pruning based on hashing and partition dimensions
● However: skewed data leads to skewed segment sizes, long ingestion tail latency, and query performance issues
■ Even-size single-dimension shard spec
20© 2020 Pinterest. All rights reserved.
Learnings
Query-time pruning on a secondary dimension other than time
● Shard specs with query-time pruning on partition dimensions
○ Batch ingestion
■ Hash-based shard spec
■ Even-size single-dimension shard spec
● The default single-dimension shard spec puts all data for the same partition dimension value into a single segment
● Added a custom partitioner to spread data for a skewed partition dimension value across multiple segments
● Replaced the two very slow Hadoop jobs (rolling up the input, and counting rows per partition dimension value to decide partitions) with reading the output of a SparkSQL job
21© 2020 Pinterest. All rights reserved.
Learnings
Query-time pruning on a secondary dimension other than time
● Shard specs with query-time pruning on partition dimensions
○ Real-time ingestion
■ Stream hash-based shard spec
● Real-time ingestion defaults to the numbered shard spec, which carries no metadata about what data it contains, so every query fans out to all segments, making high query QPS very hard to support
● The stream hash shard spec is a real-time version of the batch hash-based shard spec
● Have the Kafka producer route each record to a Kafka partition id based on: hash(partition dimensions) % number of Kafka partitions (see the partitioner sketch below)
● Cons: this approach doesn’t allow increasing the number of Kafka partitions, which would lead to incorrect results during the transition period
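A minimal sketch of that producer-side routing as a Kafka Partitioner, assuming the record key carries the concatenated partition dimension values; the class name and key format are illustrative, not Pinterest’s actual partitioner:

```java
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

/** Illustrative only: route records by hash(partition dimensions) % number of partitions. */
public class PartitionDimensionPartitioner implements Partitioner {

  @Override
  public int partition(String topic, Object key, byte[] keyBytes,
                       Object value, byte[] valueBytes, Cluster cluster) {
    if (keyBytes == null) {
      throw new IllegalArgumentException("records must be keyed by their partition dimensions");
    }
    int numPartitions = cluster.partitionsForTopic(topic).size();
    // Assume the key is the concatenated partition dimension values, e.g. "usa|iphone".
    // murmur2 keeps the mapping stable as long as the partition count never changes.
    return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
  }

  @Override
  public void close() {}

  @Override
  public void configure(Map<String, ?> configs) {}
}
```

It would be registered on the producer via the partitioner.class setting (ProducerConfig.PARTITIONER_CLASS_CONFIG).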
22© 2020 Pinterest. All rights reserved.
Learnings
Operation tips
● druid.broker.select.tier and druid.server.priority
○ Control routing for dark reads, Druid config A/B testing, and zero-downtime deploys
23© 2020 Pinterest. All rights reserved.
Learnings
Operation tips
● skipCoordinatorRun
○ Use this runtime config when deploying or restarting historical nodes to keep the coordinator from triggering unnecessary segment movements
● maxSegmentsInNodeLoadingQueue and maxSegmentsToMove
○ Segments are represented as children under a historical host’s znode
○ Load queue znodes are not compressed
○ Be careful about hitting the ZooKeeper buffer limit (defaults to a few MB) when loading a large number of segments onto a historical node (see the sketch after this list)
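A hedged sketch of adjusting those two settings through the coordinator’s dynamic configuration endpoint (POST /druid/coordinator/v1/config); both keys are standard coordinator dynamic config, but the numbers are placeholders to tune against your own cluster and ZooKeeper jute.maxbuffer limit, and you should verify how partial updates merge on your Druid version.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Illustrative only: cap segment-loading churn via coordinator dynamic config. */
public class CoordinatorConfigUpdater {
  public static void main(String[] args) throws Exception {
    String coordinatorUrl = "http://coordinator:8081"; // default coordinator port
    // Keep each historical's load queue znode small enough to stay under the
    // ZooKeeper buffer limit, and throttle rebalancing during deploys.
    String dynamicConfig = "{"
        + "\"maxSegmentsInNodeLoadingQueue\": 100,"  // placeholder value
        + "\"maxSegmentsToMove\": 50"                // placeholder value
        + "}";
    HttpRequest request = HttpRequest.newBuilder(
            URI.create(coordinatorUrl + "/druid/coordinator/v1/config"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(dynamicConfig))
        .build();
    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println("Coordinator responded with HTTP " + response.statusCode());
  }
}
```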
24© 2020 Pinterest. All rights reserved.
Time for questions
@Pinterest
25
Thank you!
Apache Druid is an independent project of The Apache Software Foundation. More information can be found at https://druid.apache.org.
Apache Druid, Druid, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.
Dates: November 10, 2020
druidsummit.org
26
Register Now for the Next Druid Virtual Summit