In this talk, we will cover:
1) the motivation for switching from an HBase-backed analytics system to Druid
2) the architecture of Druid as a platform at Pinterest (Archmage, Hadoop, Kafka). This includes the query interface, Archmage: a Thrift service in front of Druid that exposes a Thrift API to company-wide clients, handles discovery of Druid broker hosts, relays requests to the brokers to abstract away the async HTTP connection, and applies query optimizations that are transparent to clients, such as directly translating fixed-pattern SQL into Druid native JSON queries to save planning time. In addition, we'll cover the production Hadoop batch and Kafka real-time ingestion pipeline setup, and why we picked a pull-based rather than a push-based solution for real-time ingestion.
3) the use cases currently running in production on this platform, including their data volume, QPS, and Druid cluster setup; the unique challenges we met while onboarding them; how we addressed those challenges with extensive tuning to meet SLAs; and the lessons learned. The use cases include: partner insights, which provides partners with stats on organic pins; real-time spam detection, which detects anomalous user-login events and pin-related spam events such as pin creation and repin; and the migration of the backend for ads-related experiment data analysis from Presto to Druid.
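To make the "fixed-pattern SQL to native JSON" optimization in item 2 more concrete, here is a minimal Python sketch of the idea. It is not Archmage's actual code: the SQL pattern, the `translate` function, and the example datasource and column names are all hypothetical, and the choice of a `timeseries` query with a `longSum` aggregator and a `selector` filter is an assumption about what such a pattern might map to. The point is only that a query matching a known template can be turned into a native query with a regex match, skipping SQL planning altogether.

```python
import re

# Hypothetical fixed pattern; the patterns Archmage actually recognizes
# are not specified in the talk abstract.
_PATTERN = re.compile(
    r"^\s*SELECT\s+SUM\((?P<metric>\w+)\)\s+FROM\s+(?P<datasource>\w+)\s+"
    r"WHERE\s+__time\s*>=\s*'(?P<start>[^']+)'\s+AND\s+__time\s*<\s*'(?P<end>[^']+)'\s+"
    r"AND\s+(?P<dim>\w+)\s*=\s*'(?P<value>[^']*)'\s*$",
    re.IGNORECASE,
)


def translate(sql):
    """Return a Druid native timeseries query dict, or None if the SQL
    does not match the fixed pattern (the caller would then fall back
    to the regular SQL path)."""
    m = _PATTERN.match(sql)
    if m is None:
        return None
    return {
        "queryType": "timeseries",
        "dataSource": m["datasource"],
        "granularity": "all",
        # Druid intervals are ISO-8601 "start/end" strings.
        "intervals": [f"{m['start']}/{m['end']}"],
        "filter": {
            "type": "selector",
            "dimension": m["dim"],
            "value": m["value"],
        },
        # Assumes the metric is an integer column; a floating-point
        # column would use doubleSum instead.
        "aggregations": [
            {
                "type": "longSum",
                "name": f"sum_{m['metric']}",
                "fieldName": m["metric"],
            }
        ],
    }
```

In a setup like this, the resulting dict would be serialized to JSON and sent to a broker's native query endpoint, while any query that returns `None` would go through the normal SQL planner.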
25. Time for questions
@Pinterest
Thank you!
Apache Druid is an independent project of The Apache Software Foundation. More information can be found at https://druid.apache.org.
Apache Druid, Druid, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.
26. Dates: November 10, 2020
druidsummit.org
Register Now for the Next Druid Virtual Summit