At Nielsen Identity, we leverage Druid to provide our customers with real-time analytics tools for various use-cases, including in-flight analytics, reporting and building target audiences. The common challenge of these use-cases is counting distinct elements in real-time at scale. We’ve been using Druid to solve these problems for the past 4 years and have gained a lot of experience with it.
In this talk, we will share some of the best practices and tips we’ve gathered over the years, including:
* Data modeling
* Ingestion
* Retention and deletion
* Query optimization
50. @ItaiYaffe, @yakiro
○ SELECT APPROX_COUNT_DISTINCT_DS_THETA(user_id_sketch)
  FROM campaign_1012
  WHERE tactic = 1
    AND __time BETWEEN TIMESTAMP '2018-02-01' AND TIMESTAMP '2020-09-08'
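Judging by its name, `user_id_sketch` is presumably a Theta sketch column built at ingestion time rather than raw user IDs. In that case, the query merges the per-row sketches matched by the `tactic` and `__time` filters and estimates the distinct count of their union. Exact per-row distinct counts could not simply be summed, because the same user appears on many days. The Python sketch below illustrates the idea with a simplified K-Minimum-Values (KMV) sketch. It is an illustration under assumptions, not Druid's or Apache DataSketches' implementation. Druid uses the DataSketches Theta sketch, which refines the same idea. The `KMVSketch` class, the SHA-1 based `hash64`, and the two "daily rows" are hypothetical.

```python
import hashlib
import heapq

HASH_SPACE = 2 ** 64


def hash64(item: str) -> int:
    """Hypothetical hash: map an item to a pseudo-uniform 64-bit integer (first 8 bytes of SHA-1)."""
    return int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")


class KMVSketch:
    """Hypothetical, simplified K-Minimum-Values sketch: retains the k smallest distinct hashes."""

    def __init__(self, k: int) -> None:
        self.k = k
        self._heap: list[int] = []  # negated hashes, so -heap[0] is the largest retained hash
        self._retained: set[int] = set()

    def update(self, item: str) -> None:
        self._add_hash(hash64(item))

    def _add_hash(self, h: int) -> None:
        if h in self._retained:
            return  # duplicates never change a distinct-count sketch
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._retained.add(h)
        elif h < -self._heap[0]:
            # Evict the current largest retained hash so only the k smallest remain.
            evicted = -heapq.heapreplace(self._heap, -h)
            self._retained.discard(evicted)
            self._retained.add(h)

    def union(self, other: "KMVSketch") -> "KMVSketch":
        """Merge two sketches; the result sketches the union of both inputs."""
        merged = KMVSketch(min(self.k, other.k))
        for h in self._retained | other._retained:
            merged._add_hash(h)
        return merged

    def estimate(self) -> float:
        if len(self._retained) < self.k:
            return float(len(self._retained))  # exact while the sketch is not full
        kth_smallest = -self._heap[0] / HASH_SPACE  # normalised into (0, 1)
        return (self.k - 1) / kth_smallest  # textbook KMV estimator


# Hypothetical: two rolled-up rows for tactic = 1, one sketch per day, overlapping users.
day1, day2 = KMVSketch(k=4096), KMVSketch(k=4096)
for i in range(0, 60_000):
    day1.update(f"user-{i}")
for i in range(40_000, 100_000):
    day2.update(f"user-{i}")

# Summing per-day distinct counts would give 120,000; the union counts shared users once.
# The true distinct count is 100,000, and the estimate should land within a few percent of it.
print(round(day1.union(day2).estimate()))
```

Because a sketch keeps only the k smallest hashes, merging sketches is cheap and order-independent. This property is what allows pre-aggregated rows to be combined across any time range at query time.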
○ SELECT APPROX_COUNT_DISTINCT_DS_THETA(user_id_sketch, 65536) ...
○ APPROX_COUNT_DISTINCT_DS_THETA(expr, [size])
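The optional `size` argument trades accuracy for memory. According to the Druid Theta sketch documentation, it defaults to 16384 and must be a power of 2. A common rule of thumb for Theta sketches puts the relative standard error (RSE) at about 1/sqrt(size). The arithmetic below applies that rule of thumb to the default and to the 65536 used in the second query. `approx_rse` is a hypothetical helper, and the 1/sqrt(size) figure is an approximation, not Druid's exact error bound.

```python
import math

DEFAULT_SIZE = 16384  # Druid's documented default for the Theta sketch size parameter


def is_power_of_two(n: int) -> bool:
    return n > 0 and n & (n - 1) == 0


def approx_rse(size: int) -> float:
    """Hypothetical helper: rule-of-thumb relative standard error, about 1/sqrt(size)."""
    if not is_power_of_two(size):
        raise ValueError("Theta sketch size must be a power of 2")
    return 1 / math.sqrt(size)


for size in (DEFAULT_SIZE, 65536):
    rse = approx_rse(size)
    # Roughly 95% of estimates fall within +/- 2 RSE of the true distinct count.
    print(f"size={size:>6}  RSE~{rse:.2%}  95% bound~{2 * rse:.2%}")
```

Quadrupling `size` from 16384 to 65536 halves the expected error, from about 0.78% to about 0.39%. The cost is that each sketch can retain up to four times as many hashes. The larger size therefore pays off mainly when a tighter estimate justifies bigger sketches and slower merges.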
58. Time for questions
@ItaiYaffe
@yakiro
Thank you!
Apache Druid is an independent project of The Apache Software Foundation. More information can be found at https://druid.apache.org.
Apache Druid, Druid, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.