Funnel Analysis with Spark and Druid

Itai Yaffe (Tech Lead, Big Data group) @ Nielsen:
Every day, millions of advertising campaigns run around the world.
For campaign owners, measuring ongoing campaign effectiveness (e.g., “how many distinct users saw my online ad vs. how many distinct users saw my online ad, clicked it and purchased my product?”) is critical.
However, this task (often referred to as “funnel analysis”) is not easy, especially if the chronological order of events matters. While the combination of Druid and ThetaSketch aggregators can answer some of these questions, on its own it cannot answer the question "how many distinct users viewed the brand’s homepage FIRST and THEN viewed the product X page?"
In this talk, we will discuss how we combine Spark, Druid and ThetaSketch aggregators to answer such questions at scale.

  9. @ItaiYaffe >10B events/day; >20TB/day; S3; 1000’s of nodes/day; 10’s of TB ingested/day; Druid; $100K’s/month
  17. @ItaiYaffe Awareness: exposed to campaign (e.g., via online ad) → Consideration: interest is expressed (e.g., clicked ad) → Intent: steps taken towards making a purchase (e.g., added product to cart) → Purchase
  18. @ItaiYaffe Awareness: exposed to campaign (e.g., via online ad) → Consideration: interest is expressed (e.g., clicked ad) → Intent: steps taken towards making a purchase (e.g., added product to cart) → Purchase (labelled: Tactic, Stages)
  19. @ItaiYaffe Awareness → (drop-off) → Consideration → (drop-off) → Intent → (drop-off) → Purchase
  20. @ItaiYaffe AD EXPOSURE 100M UUs → (85M drop-off) → HOMEPAGE 15M UUs → (5M drop-off) → PRODUCT PAGE 10M UUs → (7M drop-off) → CHECKOUT 3M UUs. * UUs = Unique Users
  21. @ItaiYaffe 2 Users; 7 Views; 2 Purchases; $$$ $$$
  22. @ItaiYaffe AD EXPOSURE 100M → (85M drop-off) → HOMEPAGE 15M → (5M drop-off) → PRODUCT PAGE 10M → (7M drop-off) → CHECKOUT 3M
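The drop-off figures on slides 20 and 22 are the differences between consecutive stage counts. A minimal Python sketch of that arithmetic follows; the `funnel` list and the `drop_offs` helper are illustrative names, not something shown in the talk.

```python
# Stage counts as read off slides 20 and 22 (unique users, in millions).
funnel = [
    ("AD EXPOSURE", 100),
    ("HOMEPAGE", 15),
    ("PRODUCT PAGE", 10),
    ("CHECKOUT", 3),
]


def drop_offs(stages):
    """Return (from_stage, to_stage, users_lost) for each consecutive pair."""
    return [
        (src, dst, src_count - dst_count)
        for (src, src_count), (dst, dst_count) in zip(stages, stages[1:])
    ]


for src, dst, lost in drop_offs(funnel):
    print(f"{src} -> {dst}: {lost}M drop-off")
```

Plain subtraction like this only holds when every stage's users are a subset of the previous stage's users. When that is not guaranteed, the drop-off has to be computed with set operations on the user populations themselves.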
  36. @ItaiYaffe
      {event_time=2020-01-28T..., userid=uid1, attribute=online_ad}
      {event_time=2020-01-28T..., userid=uid1, attribute=homepage}
      {event_time=2020-01-28T..., userid=uid1, attribute=productX_page}
      ....
  37. @ItaiYaffe
      {event_time=2020-01-28T..., userid=uid1, attribute=online_ad, type=Tactic}
      {event_time=2020-01-28T..., userid=uid1, attribute=homepage, type=Stage}
      {event_time=2020-01-28T..., userid=uid1, attribute=productX_page, type=Stage}
      ....
  38. @ItaiYaffe
      {event_date=2020-01-28, userid=uid1, tactic=online_ad, stage=homepage}
      {event_date=2020-01-28, userid=uid1, tactic=online_ad, stage=productX_page}
      ....
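Slides 36 to 38 show raw events being tagged as Tactic or Stage and then turned into one row per tactic/stage combination. The sketch below is one plausible reading of that transformation in plain Python rather than Spark. The `ATTRIBUTE_TYPES` lookup, the function names and the times of day are assumptions made for illustration; the slides show only the input and output records.

```python
from itertools import product

# Hypothetical attribute -> type lookup; the slides only show the resulting field.
ATTRIBUTE_TYPES = {
    "online_ad": "Tactic",
    "homepage": "Stage",
    "productX_page": "Stage",
}


def enrich(events):
    """Slide 36 -> slide 37: tag every event as a Tactic or a Stage."""
    return [dict(e, type=ATTRIBUTE_TYPES[e["attribute"]]) for e in events]


def pair_tactics_with_stages(enriched):
    """Slide 37 -> slide 38: one row per (day, user, tactic, stage) combination."""
    buckets = {}
    for e in enriched:
        key = (e["event_time"][:10], e["userid"])  # date part of the ISO timestamp
        bucket = buckets.setdefault(key, {"Tactic": [], "Stage": []})
        if e["attribute"] not in bucket[e["type"]]:
            bucket[e["type"]].append(e["attribute"])
    return [
        {"event_date": day, "userid": uid, "tactic": tactic, "stage": stage}
        for (day, uid), bucket in buckets.items()
        for tactic, stage in product(bucket["Tactic"], bucket["Stage"])
    ]


# The times of day are made up; the slides elide them ("2020-01-28T...").
events = [
    {"event_time": "2020-01-28T10:00", "userid": "uid1", "attribute": "online_ad"},
    {"event_time": "2020-01-28T10:05", "userid": "uid1", "attribute": "homepage"},
    {"event_time": "2020-01-28T10:07", "userid": "uid1", "attribute": "productX_page"},
]
rows = pair_tactics_with_stages(enrich(events))
```

With this input, `rows` holds the two records of slide 38. This version ignores the order of events within a day.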
  39. @ItaiYaffe
      "type": "index_hadoop",
      "spec": {
        "dataSchema": {
          "dataSource": "campaign_1472",
          "granularitySpec": {
            "queryGranularity": "day",
            "segmentGranularity": "day",
            "type": "uniform",
            "intervals": ["2020-01-01/2020-01-29"]
            ...
  40. @ItaiYaffe
      "timestampSpec": { "column": "event_date", "format": "yyyy-MM-dd" },
      "dimensionsSpec": { "dimensions": ["tactic", "stage"] },
      "metricsSpec": [{ "fieldName": "userid", "type": "thetaSketch", "name": "user_id_sketch", "size": 65536 }],
      ...
  41. @ItaiYaffe
      "inputSpec": {
        "type": "multi",
        "children": [
          {"type": "dataSource", "ingestionSpec": {"intervals": ["2020-01-01/2020-01-29"], "dataSource": "campaign_1472", ...}},
          {"type": "static", "paths": "s3://<BUCKET_NAME>/date=2020-01-28/campaign=1472", ...},
          ...
  42. @ItaiYaffe
      {__time=2020-01-28, tactic=online_ad, stage=homepage, user_id_sketch=<Object>}
      {__time=2020-01-28, tactic=online_ad, stage=productX_page, user_id_sketch=<Object>}
      ....
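The step from the per-user rows of slide 38 to the rows of slide 42 is a rollup: rows sharing the same day, tactic and stage collapse into one, and their user ids are folded into a single sketch. The sketch below imitates this with an exact Python `set` standing in for the ThetaSketch object. The `roll_up` helper and the second user `uid2` are assumptions made for illustration.

```python
from collections import defaultdict


def roll_up(rows):
    """Group rows by (date, tactic, stage) and collect the distinct user ids."""
    rolled = defaultdict(set)  # a set is an exact stand-in for a ThetaSketch here
    for r in rows:
        rolled[(r["event_date"], r["tactic"], r["stage"])].add(r["userid"])
    return dict(rolled)


rows = [
    {"event_date": "2020-01-28", "userid": "uid1", "tactic": "online_ad", "stage": "homepage"},
    # A repeated row for the same user adds nothing to the distinct count.
    {"event_date": "2020-01-28", "userid": "uid1", "tactic": "online_ad", "stage": "homepage"},
    {"event_date": "2020-01-28", "userid": "uid2", "tactic": "online_ad", "stage": "homepage"},
    {"event_date": "2020-01-28", "userid": "uid1", "tactic": "online_ad", "stage": "productX_page"},
]
rolled = roll_up(rows)
for (day, tactic, stage), users in rolled.items():
    print(day, tactic, stage, len(users))
```

A real ThetaSketch keeps a bounded sample of hashed ids instead of every id, so its distinct count is an estimate, while the set above is exact.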
  43. @ItaiYaffe
      {
        "filter": {
          "type": "and",
          "fields": [
            {"type": "or", "fields": [{"type": "selector", "dimension": "stage", "value": "homepage"}]},
            {"type": "or", "fields": [{"type": "selector", "dimension": "tactic", "value": "online_ad"}]}
          ]
        },
        "intervals": ["2020-01-01T00:00:00.000/2020-01-29T23:59:59.000"],
        "granularity": "ALL",
        "dataSource": "campaign_1472",
        "aggregations": [
          {
            "filter": {"type": "selector", "dimension": "stage", "value": "homepage"},
            "aggregator": {"fieldName": "user_id_sketch", "size": 65536, "name": "homepage_sketch", "type": "thetaSketch"},
            "type": "filtered"
          }
        ],
        "queryType": "groupBy",
        "dimensions": []
      }
  44. @ItaiYaffe
      SELECT APPROX_COUNT_DISTINCT_DS_THETA(user_id_sketch, 65536) as homepage_sketch
      FROM campaign_1472
      WHERE (("tactic" = 'online_ad') AND ("stage" = 'homepage'))
        AND __time BETWEEN '2020-01-01T00:00:00.000' AND '2020-01-29T23:59:59.000'
  49-50. @ItaiYaffe ONLINE AD 8.1M UUs → HOMEPAGE 3.1K UUs → (2.5K drop-off) → PRODUCT PAGE 1K UUs → ... * UUs = Unique Users
  54. @ItaiYaffe
      {event_time=2020-01-28T09:15, userid=uid1, attribute=productX_page}
      {event_time=2020-01-28T10:10, userid=uid1, attribute=online_ad}
      {event_time=2020-01-28T10:11, userid=uid1, attribute=homepage}
      ....
  55. @ItaiYaffe
      {event_time=2020-01-28T09:15, userid=uid1, attribute=productX_page, type=Stage}
      {event_time=2020-01-28T10:10, userid=uid1, attribute=online_ad, type=Tactic}
      {event_time=2020-01-28T10:11, userid=uid1, attribute=homepage, type=Stage}
      ....
  56-57. @ItaiYaffe
      {event_date=2020-01-28, userid=uid1, tactic=online_ad, stage=productX_page}
      {event_date=2020-01-28, userid=uid1, tactic=online_ad, stage=homepage}
      ....
  58. @ItaiYaffe
      {event_date=2020-01-28, userid=uid1, tactic=online_ad, stage=homepage}
      ....
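In slides 54 to 58 the productX_page visit (09:15) precedes the ad exposure (10:10), and only the homepage row survives. A plain-Python sketch of one rule that reproduces this: keep a tactic/stage pair only when the stage event is later than a tactic event of the same user. The rule, the `ATTRIBUTE_TYPES` lookup and the `ordered_pairs` name are assumptions; the slides do not spell out the actual Spark logic.

```python
# Hypothetical attribute -> type lookup, matching the `type` field on slide 55.
ATTRIBUTE_TYPES = {
    "online_ad": "Tactic",
    "homepage": "Stage",
    "productX_page": "Stage",
}


def ordered_pairs(events):
    """Emit (date, user, tactic, stage) rows only when the stage follows the tactic."""
    tactics = [e for e in events if ATTRIBUTE_TYPES[e["attribute"]] == "Tactic"]
    stages = [e for e in events if ATTRIBUTE_TYPES[e["attribute"]] == "Stage"]
    rows = []
    for s in stages:
        for t in tactics:
            # Same-format ISO timestamps compare correctly as plain strings.
            if t["userid"] == s["userid"] and t["event_time"] < s["event_time"]:
                row = {
                    "event_date": s["event_time"][:10],
                    "userid": s["userid"],
                    "tactic": t["attribute"],
                    "stage": s["attribute"],
                }
                if row not in rows:  # avoid duplicates from repeated events
                    rows.append(row)
    return rows


# The three events of slide 54.
events = [
    {"event_time": "2020-01-28T09:15", "userid": "uid1", "attribute": "productX_page"},
    {"event_time": "2020-01-28T10:10", "userid": "uid1", "attribute": "online_ad"},
    {"event_time": "2020-01-28T10:11", "userid": "uid1", "attribute": "homepage"},
]
rows = ordered_pairs(events)
```

Here `rows` contains only the online_ad/homepage record, matching slide 58. The sketch does not limit how far apart the tactic and the stage may be, which a production job would probably need to decide.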
  59. @ItaiYaffe
      {
        "filter": {
          "type": "and",
          "fields": [
            {"type": "or", "fields": [
              {"type": "selector", "dimension": "stage", "value": "homepage"},
              {"type": "selector", "dimension": "stage", "value": "productX_page"},
              {"type": "selector", "dimension": "stage", "value": "add_to_cart"},
              {"type": "selector", "dimension": "stage", "value": "checkout"}
            ]},
            {"type": "or", "fields": [
              {"type": "selector", "dimension": "tactic", "value": "online_ad"}
            ]}
          ]
        },
        "intervals": ["2018-12-06T00:00:00.000/2020-01-29T23:59:59.000"],
        "granularity": "ALL",
        "dataSource": "campaign_974",
        "aggregations": [
          {"filter": {"type": "selector", "dimension": "stage", "value": "homepage"},
           "aggregator": {"fieldName": "user_id_sketch", "size": 65536, "name": "A", "type": "thetaSketch"},
           "type": "filtered"},
          {"filter": {"type": "selector", "dimension": "tactic", "value": "online_ad"},
           "aggregator": {"fieldName": "user_id_sketch", "size": 65536, "name": "B", "type": "thetaSketch"},
           "type": "filtered"},
          {"filter": {"type": "selector", "dimension": "stage", "value": "productX_page"},
           "aggregator": {"fieldName": "user_id_sketch", "size": 65536, "name": "C", "type": "thetaSketch"},
           "type": "filtered"},
          {"filter": {"type": "selector", "dimension": "stage", "value": "add_to_cart"},
           "aggregator": {"fieldName": "user_id_sketch", "size": 65536, "name": "D", "type": "thetaSketch"},
           "type": "filtered"},
          {"filter": {"type": "selector", "dimension": "stage", "value": "checkout"},
           "aggregator": {"fieldName": "user_id_sketch", "size": 65536, "name": "E", "type": "thetaSketch"},
           "type": "filtered"}
        ],
        "postAggregations": [
          {
            "field": {
              "func": "NOT",
              "size": 65536,
              "name": "(homepage AND online_ad AND ( NOT (productX_page OR add_to_cart OR checkout)))",
              "type": "thetaSketchSetOp",
              "fields": [
                {
                  "func": "INTERSECT",
                  "size": 65536,
                  "name": "(homepage AND online_ad AND ( NOT (productX_page OR add_to_cart OR checkout)))",
                  "type": "thetaSketchSetOp",
                  "fields": [{"fieldName": "A", "type": "fieldAccess"}, {"fieldName": "B", "type": "fieldAccess"}]
                },
                {
                  "func": "UNION",
                  "size": 65536,
                  "name": "(productX_page OR add_to_cart OR checkout)",
                  "type": "thetaSketchSetOp",
                  "fields": [{"fieldName": "C", "type": "fieldAccess"}, {"fieldName": "D", "type": "fieldAccess"}, {"fieldName": "E", "type": "fieldAccess"}]
                }
              ]
            },
            "name": "online_ad_596",
            "type": "thetaSketchEstimate"
          }
        ],
        "queryType": "groupBy",
        "dimensions": []
      }
  60. @ItaiYaffe
      SELECT THETA_SKETCH_NOT(65536,
               THETA_SKETCH_INTERSECT(65536, a, b),
               THETA_SKETCH_UNION(65536, c, d, e)
             ) as online_ad_596
      FROM (
        SELECT DS_THETA("user_id_sketch") FILTER (WHERE stage = 'homepage') as a,
               DS_THETA("user_id_sketch") FILTER (WHERE tactic = 'online_ad') as b,
               DS_THETA("user_id_sketch") FILTER (WHERE stage = 'productX_page') as c,
               DS_THETA("user_id_sketch") FILTER (WHERE stage = 'add_to_cart') as d,
               DS_THETA("user_id_sketch") FILTER (WHERE stage = 'checkout') as e
        FROM campaign_1472
        WHERE stage in ('homepage','productX_page','checkout','add_to_cart')
          AND tactic = 'online_ad'
      ) subquery
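Both the native query on slide 59 and the SQL on slide 60 evaluate the same set expression: (A INTERSECT B) NOT (C UNION D UNION E), that is, users who reached the homepage and saw the online ad but went no further down the funnel. The sketch below evaluates that expression on exact Python sets standing in for the sketches. The user ids and the `homepage_drop_off` name are made up for illustration.

```python
def homepage_drop_off(a, b, c, d, e):
    """(A INTERSECT B) NOT (C UNION D UNION E), evaluated on exact sets."""
    return (a & b) - (c | d | e)


# Made-up user populations; letters match the aggregator names on slide 59.
a = {"uid1", "uid2", "uid3"}          # stage = homepage
b = {"uid1", "uid2", "uid3", "uid4"}  # tactic = online_ad
c = {"uid1"}                          # stage = productX_page
d = {"uid1"}                          # stage = add_to_cart
e = set()                             # stage = checkout

dropped = homepage_drop_off(a, b, c, d, e)
print(sorted(dropped), len(dropped))
```

With sketches in place of sets, the same expression returns an estimated count rather than an exact one.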
  65-66. @ItaiYaffe ONLINE AD 8.1M UUs → HOMEPAGE 3.1K UUs → (2.5K drop-off) → PRODUCT PAGE 0.6K UUs → ... * UUs = Unique Users
