DevTalks Reimagined 2020 - Funnel Analysis with Spark and Druid

Itai Yaffe (Tech Lead, Big Data group) @ Nielsen:
Every day, millions of advertising campaigns are happening around the world.
For campaign owners, measuring ongoing campaign effectiveness (e.g., "how many distinct users saw my online ad vs. how many distinct users saw my online ad, clicked it and purchased my product?") is very important.
However, this task (often referred to as "funnel analysis") is not easy, especially if the chronological order of events matters.
So, while the combination of Druid and ThetaSketch aggregators can answer some of these questions, it still can’t answer the question "how many distinct users viewed the brand’s homepage FIRST and THEN viewed product X page?"
In this talk, we will discuss how we combine Spark, Druid and ThetaSketch aggregators to answer such questions at scale.

  5. 5. @ItaiYaffe AD EXPOSURE 100M → (85M drop-off) → HOMEPAGE 15M → (5M drop-off) → PRODUCT PAGE 10M → (7M drop-off) → CHECKOUT 3M
  9. 9. @ItaiYaffe >10B events/day; >20TB/day; S3; 1000s of nodes/day; 10s of TB ingested/day; Druid; $100Ks/month
  11. 11. @ItaiYaffe Awareness: exposed to campaign (e.g., via online ad) → Consideration: interest is expressed (e.g., clicked ad) → Intent: steps taken towards making a purchase (e.g., added product to cart) → Purchase
  12. 12. @ItaiYaffe Awareness: exposed to campaign (e.g., via online ad) → Consideration: interest is expressed (e.g., clicked ad) → Intent: steps taken towards making a purchase (e.g., added product to cart) → Purchase [Labels: Tactic, Stages]
  13. 13. @ItaiYaffe Awareness → (drop-off) → Consideration → (drop-off) → Intent → (drop-off) → Purchase
  14. 14. @ItaiYaffe AD EXPOSURE 100M UUs → (85M drop-off) → HOMEPAGE 15M UUs → (5M drop-off) → PRODUCT PAGE 10M UUs → (7M drop-off) → CHECKOUT 3M UUs (* UUs = Unique Users)
  16. 16. @ItaiYaffe 2 Unique Users 7 Views 2 Purchases $$$ $$$
  17. 17. @ItaiYaffe AD EXPOSURE 100M → (85M drop-off) → HOMEPAGE 15M → (5M drop-off) → PRODUCT PAGE 10M → (7M drop-off) → CHECKOUT 3M
  39. 39. @ItaiYaffe {event_time=2020-01-28T..., userid=uid1, attribute=online_ad} {event_time=2020-01-28T..., userid=uid1, attribute=homepage} {event_time=2020-01-28T..., userid=uid1, attribute=productX_page} ....
  40. 40. @ItaiYaffe {event_time=2020-01-28T..., userid=uid1, attribute=online_ad, type=Tactic} {event_time=2020-01-28T..., userid=uid1, attribute=homepage, type=Stage} {event_time=2020-01-28T..., userid=uid1, attribute=productX_page, type=Stage} ....
  41. 41. @ItaiYaffe {event_date=2020-01-28, userid=uid1, tactic=online_ad, stage=homepage} {event_date=2020-01-28, userid=uid1, tactic=online_ad, stage=productX_page} ....
  42. 42. @ItaiYaffe "type": "index_hadoop", "spec": { "dataSchema": { "dataSource": "campaign_1472", "granularitySpec": { "queryGranularity": "day", "segmentGranularity": "day", "type": "uniform", "intervals": ["2020-01-01/2020-01-29"] ...
  43. 43. @ItaiYaffe "timestampSpec": { "column": "event_date", "format": "yyyy-MM-dd" }, "dimensionsSpec": { "dimensions": ["tactic", "stage"] }, "metricsSpec": [{ "fieldName": "userid", "type": "thetaSketch", "name": "user_id_sketch", "size": 65536}], ...
  44. 44. @ItaiYaffe "inputSpec": {"type": "multi", "children": [ {"type": "dataSource", "ingestionSpec": { "intervals": ["2020-01-01/2020-01-29"], "dataSource": "campaign_1472", ...}}, {"type": "static", "paths": "s3://<BUCKET_NAME>/date=2020-01-28/campaign=1472", ...}, ...
  45. 45. @ItaiYaffe {__time=2020-01-28, tactic=online_ad, stage=homepage, user_id_sketch=<Object>} {__time=2020-01-28, tactic=online_ad, stage=productX_page, user_id_sketch=<Object>} ....
  46. 46. @ItaiYaffe SELECT APPROX_COUNT_DISTINCT_DS_THETA(user_id_sketch,65536) as homepage_sketch FROM campaign_1472 WHERE (("tactic" = 'online_ad') AND ("stage" = 'homepage')) AND __time BETWEEN '2020-01-01T00:00:00.000' AND '2020-01-29T23:59:59.000'
  51. 51. @ItaiYaffe ONLINE AD 8.1M UUs → HOMEPAGE 3.1K UUs → (2.5K drop-off) → PRODUCT PAGE 1K UUs → ... (* UUs = Unique Users)
  56. 56. @ItaiYaffe {event_time=2020-01-28T09:15, userid=uid1, attribute=productX_page} {event_time=2020-01-28T10:10, userid=uid1, attribute=online_ad} {event_time=2020-01-28T10:11, userid=uid1, attribute=homepage} ....
  57. 57. @ItaiYaffe {event_time=2020-01-28T09:15, userid=uid1, attribute=productX_page, type=Stage} {event_time=2020-01-28T10:10, userid=uid1, attribute=online_ad, type=Tactic} {event_time=2020-01-28T10:11, userid=uid1, attribute=homepage, type=Stage} ....
  58. 58. @ItaiYaffe {event_date=2020-01-28, userid=uid1, tactic=online_ad, stage=productX_page} {event_date=2020-01-28, userid=uid1, tactic=online_ad, stage=homepage} ....
  60. 60. @ItaiYaffe {event_date=2020-01-28, userid=uid1, tactic=online_ad, stage=homepage} .... ....
  61. 61. @ItaiYaffe SELECT THETA_SKETCH_NOT(65536, THETA_SKETCH_INTERSECT(65536,a,b), THETA_SKETCH_UNION(65536,c,d,e) ) as online_ad_596 FROM ( SELECT DS_THETA("user_id_sketch") FILTER (WHERE stage = 'homepage') as a, DS_THETA("user_id_sketch") FILTER (WHERE tactic = 'online_ad') as b, DS_THETA("user_id_sketch") FILTER (WHERE stage = 'productX_page') as c, DS_THETA("user_id_sketch") FILTER (WHERE stage = 'add_to_cart') as d, DS_THETA("user_id_sketch") FILTER (WHERE stage = 'checkout') as e FROM campaign_1472 WHERE stage in ('homepage','productX_page','checkout','add_to_cart') AND tactic = 'online_ad') subquery
  66. 66. @ItaiYaffe ONLINE AD 8.1M UUs → HOMEPAGE 3.1K UUs → (2.5K drop-off) → PRODUCT PAGE 0.6K UUs → ... (* UUs = Unique Users)
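
Slides 39-41 and 56-60 show typed events being turned into one row per (date, user, tactic, stage). In the order-aware variant, a stage is kept only when it happens after the user's exposure to the tactic. The following is a minimal plain-Python sketch of that pairing step, standing in for the Spark job. The function name `pair_events`, the `ordered` flag, the use of the stage's date as `event_date`, and the handling of equal timestamps are all assumptions made for illustration, not the talk's actual code.

```python
from datetime import datetime


def pair_events(events, ordered=False):
    """Pair each user's Tactic events with that user's Stage events.

    events: iterable of dicts with event_time (ISO string), userid,
    attribute and type ("Tactic" or "Stage").
    Returns a sorted list of (event_date, userid, tactic, stage) tuples.
    With ordered=True, a stage is kept only if it occurred strictly
    after the tactic event (an assumption of this sketch).
    """
    by_user = {}
    for e in events:
        by_user.setdefault(e["userid"], []).append(e)

    rows = set()  # a set de-duplicates repeated (tactic, stage) pairs
    for userid, user_events in by_user.items():
        tactics = [e for e in user_events if e["type"] == "Tactic"]
        stages = [e for e in user_events if e["type"] == "Stage"]
        for t in tactics:
            t_time = datetime.fromisoformat(t["event_time"])
            for s in stages:
                s_time = datetime.fromisoformat(s["event_time"])
                if ordered and s_time <= t_time:
                    continue  # stage happened before (or at) the exposure
                # Assumption: the row is dated by the stage event.
                rows.add((s_time.date().isoformat(), userid,
                          t["attribute"], s["attribute"]))
    return sorted(rows)


# The three uid1 events from slides 56-57.
events = [
    {"event_time": "2020-01-28T09:15", "userid": "uid1",
     "attribute": "productX_page", "type": "Stage"},
    {"event_time": "2020-01-28T10:10", "userid": "uid1",
     "attribute": "online_ad", "type": "Tactic"},
    {"event_time": "2020-01-28T10:11", "userid": "uid1",
     "attribute": "homepage", "type": "Stage"},
]

unordered_rows = pair_events(events)
ordered_rows = pair_events(events, ordered=True)
```

For these events, `unordered_rows` contains both the homepage and productX_page rows (as on slide 58), while `ordered_rows` keeps only the homepage row (as on slide 60), because the product page view happened before the ad exposure.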

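The query on slide 61 computes NOT(INTERSECT(a, b), UNION(c, d, e)): users who were exposed to the online ad and reached the homepage, but reached none of the later stages. The same set algebra can be sketched with exact Python sets in place of Theta sketches (which return approximate counts). The user IDs and the helper name `dropoff_after_homepage` are made up for illustration.

```python
def dropoff_after_homepage(stage_users, tactic_users):
    """Return users who reached the homepage via the tactic and went no further.

    stage_users: dict mapping stage name -> set of user IDs.
    tactic_users: set of user IDs exposed to the tactic.
    """
    # INTERSECT(a, b): reached the homepage AND were exposed to the tactic
    reached = stage_users["homepage"] & tactic_users
    # UNION(c, d, e): reached any later stage
    went_further = (stage_users["productX_page"]
                    | stage_users["add_to_cart"]
                    | stage_users["checkout"])
    # NOT(x, y): in x but not in y
    return reached - went_further


# Hypothetical data, for illustration only.
stage_users = {
    "homepage": {"u1", "u2", "u3", "u4"},
    "productX_page": {"u2", "u5"},
    "add_to_cart": {"u2"},
    "checkout": set(),
}
tactic_users = {"u1", "u2", "u3", "u5"}

dropoff = dropoff_after_homepage(stage_users, tactic_users)
```

Here `dropoff` is `{"u1", "u3"}`: u2 continued to the product page, u4 was never exposed to the ad, and u5 never reached the homepage.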