BDX 2016 - Tal sliwowicz @ taboola
Ido Shilon
Taboola road to Scale The Data perspective
1.
Taboola’s Road to Scale: The Data Perspective
Tal Sliwowicz
2.
Who am I?
Tal Sliwowicz
Director, R&D
tal@taboola.com
Copyright © 2016 The Nielsen Company. Confidential and Proprietary.
3.
You’ve Seen Us Before!
Enabling people to discover information at that moment when they’re likely to engage
4.
Our Clients Are All Around the Globe
Entertainment | Lifestyle | Tech
5.
Taboola in Numbers
• 750M monthly unique users
• 100K+ requests/sec
• 10B+ recommendations/day
• 5TB+ daily data
• 52% mobile traffic, 48% desktop traffic
US desktop users reached, 12/2015 (property: reach):
• Google Ad Network: 95.5%
• Taboola: 87.8%
• Google Sites: 86.2%
• Facebook: 61.5%
• Yahoo Sites: 60.3%
• Outbrain: 56.6%
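You can sanity-check the headline figures by converting the daily numbers and the per-second numbers into each other. This is a minimal sketch. It assumes the figures are uniform daily averages, although real traffic peaks well above them. It also assumes decimal units (1 TB = 10^12 bytes).

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

def per_day(rate_per_second: float) -> float:
    """Convert an average per-second rate into a daily total."""
    return rate_per_second * SECONDS_PER_DAY

def per_second(daily_total: float) -> float:
    """Convert a daily total into an average per-second rate."""
    return daily_total / SECONDS_PER_DAY

requests_per_day = per_day(100_000)            # 100K requests/sec
recommendations_per_second = per_second(10e9)  # 10B recommendations/day
ingest_mb_per_second = per_second(5e12) / 1e6  # 5 TB/day, decimal units (assumption)

print(f"{requests_per_day:,.0f} requests/day")
print(f"{recommendations_per_second:,.0f} recommendations/sec")
print(f"{ingest_mb_per_second:.1f} MB/s average ingest")
```

Under these assumptions, 100K requests/sec comes to about 8.6 billion requests per day. Likewise, 5 TB/day comes to about 58 MB/s of sustained ingest.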
6.
The Recommendation Engine
Inputs to the CONTENT RECOMMENDATION ENGINE:
• Context: metadata
• Location: region-based recommendations
• User behavior: cookie data
• Collaborative filtering: bucketed consumption groups
• Social: Facebook / Twitter API
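One of the engine's inputs is collaborative filtering over "bucketed consumption groups". The slide does not describe the algorithm. The following is therefore only a generic item-to-item co-occurrence sketch, in which items read by the same users are treated as related. The `histories` structure and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_cooccurrence(histories: dict[str, set[str]]) -> dict[str, Counter]:
    """For every pair of items, count how many users read both (hypothetical input shape)."""
    co: dict[str, Counter] = defaultdict(Counter)
    for items in histories.values():
        for a, b in combinations(sorted(items), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co

def recommend(co: dict[str, Counter], current_item: str,
              already_read: set[str], k: int = 3) -> list[str]:
    """Return up to k items most often co-read with current_item, skipping items already read."""
    ranked = [
        item for item, _ in co.get(current_item, Counter()).most_common()
        if item != current_item and item not in already_read
    ]
    return ranked[:k]

histories = {
    "user-1": {"article-a", "article-b", "article-c"},
    "user-2": {"article-a", "article-b"},
    "user-3": {"article-a", "article-d"},
}
co = build_cooccurrence(histories)
print(recommend(co, "article-a", already_read={"article-a"}, k=1))  # ['article-b']
```

A production engine would at least normalize these counts for item popularity. This sketch deliberately skips that.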
7.
Taboola’s Discovery Platform
Diagram labels: Traffic Acquisition, Business Dev., Sponsored Content, Editorial, Newsroom, Sales, Native Ads, Audience Dev., Product, Personalization, Data & Insights
8.
Taboola 2007
• Events and logs (raw data) written directly to the DB
• Recs are read from the DB
• Crashed when CNN launched Taboola
Diagram labels: Frontend, FE Server
9.
Taboola 2007.5
• Same as before, but without direct writes to the DB
• Switching to bulk loading
• But: very basic reporting, not scalable
Diagram labels: Frontend, Bulk Load, FE Server
10.
Taboola 2008
• Introduced semi-real-time event parsing services: Session Parser and Session Analyzer
• Divided the analysis work by unit (session)
• Files were pushed from the RecServer(s) to backend processing
• Files are gzipped textual INSERT statements
• But: not real-time enough
Diagram labels: Frontend, FE Server, NFS, Backend, SessionParser, SessionAnalyzer; flows: Write rawdata, Read rawdata, Write session files, Read session files, Write Summarized Data
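In the 2008 design, raw data travelled as gzipped files of textual INSERT statements, and the analysis work was split by session. Below is a minimal sketch of both steps. It assumes a hypothetical `rawdata` table and row dictionaries that carry a `session_id` field. The escaping is deliberately minimal and is no substitute for a database's own bulk-load format.

```python
import gzip
from collections import defaultdict

def sql_literal(value) -> str:
    """Render a value as a SQL literal (handles None, numbers and strings only)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"

def group_by_session(rows: list[dict]) -> dict[str, list[dict]]:
    """Split raw rows into per-session units of work (session_id field is assumed)."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        groups[row["session_id"]].append(row)
    return dict(groups)

def to_insert_statements(table: str, rows: list[dict]) -> list[str]:
    """Turn each row into one textual INSERT statement."""
    statements = []
    for row in rows:
        columns = ", ".join(row)
        values = ", ".join(sql_literal(v) for v in row.values())
        statements.append(f"INSERT INTO {table} ({columns}) VALUES ({values});")
    return statements

def write_insert_file(path: str, table: str, rows: list[dict]) -> None:
    """Write the statements as a gzip-compressed text file, one statement per line."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for statement in to_insert_statements(table, rows):
            f.write(statement + "\n")
```

For example, you would produce one file per session by calling `write_insert_file(f"{sid}.sql.gz", "rawdata", rows)` for each `sid, rows` pair returned by `group_by_session(...)`.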
11.
Taboola 2010
• Made a leap towards real-time stream processing
• Unified Session Parser and Session Analyzer into an in-memory service (without going through disk)
• Made dramatic optimizations to memory allocation and data models
• Failure-safe architecture: can endure data delays and front-end server malfunctions
• No direct DB access, which is key for performance; bulk loading is used only to load hourly data
Diagram labels: Frontend, FE Server, NFS, Backend, Session Parser + Analyzer; flows: Write rawdata, Read rawdata, Write Hourly Data (Bulk Loading)
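The 2010 service parsed and analyzed sessions in memory and touched the database only through hourly bulk loads. One way to picture that split is an in-memory aggregator keyed by the event's hour. It is drained one hour at a time into rows for a bulk loader. This is a sketch: the event fields (`ts`, `publisher_id`, `type`) and the summary shape are assumptions, not the original schema.

```python
from collections import Counter
from datetime import datetime, timezone

def hour_of(ts: float) -> str:
    """UTC hour bucket of a Unix timestamp, e.g. '2016-01-01 13:00'."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:00")

class HourlyAggregator:
    """Keeps running counts in memory; nothing touches the database per event."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def add(self, event: dict) -> None:
        # Bucket by the event's own timestamp, not its arrival time, so a
        # delayed event still counts toward the hour it happened in
        # (as long as that hour has not been flushed yet).
        key = (hour_of(event["ts"]), event["publisher_id"], event["type"])
        self.counts[key] += 1

    def flush_hour(self, hour: str) -> list[tuple[str, str, str, int]]:
        """Remove and return one hour's summary rows, ready for a bulk load."""
        rows = sorted((h, pub, typ, n) for (h, pub, typ), n in self.counts.items() if h == hour)
        for h, pub, typ, _ in rows:
            del self.counts[(h, pub, typ)]
        return rows
```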
12.
Taboola 2011-2013
• Multi-DC
• Roughly the same architecture
• Handled backend growth by scaling up (monster machines)
• Introduced real-time analyzers
• Introduced sharding
• Moved to lsync-based file sync
• Introduced Top Reports capabilities
Diagram labels: Frontend, FE Server, Lsync, Backend, Session Parser + Analyzer; flows: Write rawdata, Read rawdata, Write Hourly Data (Bulk Loading)
13.
Taboola 2014 -
14.
Our Data Requirements
• Lots of incoming traffic (100K requests/sec)
• Data (5+ TB/day):
  • Personalized served recommendations: per user, per page view
  • Events: what the user actually read and what they did
• The data needs to be joined and processed in real time for:
  • Campaign management
  • Recommendations
  • Billing
  • Reports
  • Etc.
• The data needs to be available for offline research
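The core real-time requirement is joining what was served with what the user did. Here is a minimal sketch, assuming hypothetical records keyed by `view_id` and `item_id`; the slide does not give the actual schema. Each served recommendation is left-joined with click events from the same page view. The result is the kind of per-item outcome that campaign management, billing and reports could then aggregate.

```python
from collections import Counter

def join_served_with_events(served: list[dict], events: list[dict]) -> list[dict]:
    """Left-join served recommendations with click events on (view_id, item_id).

    Field names are hypothetical placeholders for the real record layout.
    """
    clicks = {(e["view_id"], e["item_id"]) for e in events if e["type"] == "click"}
    return [{**rec, "clicked": (rec["view_id"], rec["item_id"]) in clicks} for rec in served]

def clicks_per_item(joined: list[dict]) -> Counter:
    """Aggregate the joined rows, e.g. as input to billing or reports."""
    return Counter(row["item_id"] for row in joined if row["clicked"])
```

A click whose (view, item) pair was never served in that view is ignored. That is what a left join from the served side implies.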
15.
Data Model
Diagram labels: Users, Sessions, Views, Requests, Items, Events
16.
Challenges
• We care about sessions: a chain of page views and events for a specific user
  • Length can be hours or even days
• We care about users: a chain of sessions across sites
  • Length can be days or even months
• Stateless application: a single user's data is sent from multiple data centers and multiple servers
  • No deterministic affinity to a server or DC
  • Order isn't guaranteed
  • Must be robust and automatically deal with late arrivals
• "Exactly once" semantics
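A session's events reach the backend from several servers and data centers in no guaranteed order. The session therefore has to be reassembled by event time rather than arrival time. Below is a minimal sketch, assuming each event carries its own timestamp and a source-server id (both hypothetical fields). It does not handle clock skew between data centers, and it does not handle duplicate deliveries.

```python
import bisect
from collections import defaultdict

class SessionAssembler:
    """Builds each (user, session) timeline from events that arrive in any order."""

    def __init__(self) -> None:
        # (user_id, session_id) -> list of (ts, server, payload), kept sorted by ts
        self._sessions: dict[tuple[str, str], list[tuple[float, str, str]]] = defaultdict(list)

    def add(self, user_id: str, session_id: str, ts: float, server: str, payload: str) -> None:
        # insort keeps the list ordered, so a late arrival slots into its
        # correct position instead of being appended at the end.
        bisect.insort(self._sessions[(user_id, session_id)], (ts, server, payload))

    def timeline(self, user_id: str, session_id: str) -> list[str]:
        """Payloads of one session in event-time order."""
        return [payload for _, _, payload in self._sessions.get((user_id, session_id), [])]
```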
17.
Challenges Cont.
• Many streams of data that need to be joined (user, session, page view, widgets, recommendations, events, actions)
• 5+ TB of daily data
• Research purposes require looking at full user activity across time
18.
Data Flow
Diagram labels: FE Servers, Kafka, FE Consumer (Spark), C* Sessions
19.
Table Model in C*
• Partition key: session start hour + user bucket (0-9,999)
• Clustering key: publisher_id, user_id, session_id, view_id, data_type, data_hash
• Data type: MULTI_REQUEST, USER_EVENT, ACTION_CONVERSION, …
• Data: blobs of protobuf
• Results:
  • All the data of a single session is in one place, regardless of time of arrival
  • Idempotent process: if the same message is received twice, it overwrites the previous arrival because it has the same hash ID
  • Sampling is built into the model
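You can mimic this key design in memory to see why it yields co-location, idempotency and sampling. The slide does not say how users are mapped to buckets or how `data_hash` is computed. The MD5-based bucket and the SHA-1 content hash below are therefore hypothetical stand-ins. A Python dict replaces the Cassandra table; Cassandra writes are likewise upserts on the full primary key.

```python
import hashlib
from datetime import datetime, timezone

NUM_USER_BUCKETS = 10_000  # user bucket 0-9,999, as on the slide

def user_bucket(user_id: str) -> int:
    """Hypothetical stable mapping of a user to a bucket."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_USER_BUCKETS

def partition_key(session_start_ts: float, user_id: str) -> tuple[str, int]:
    """Session start hour + user bucket."""
    hour = datetime.fromtimestamp(session_start_ts, tz=timezone.utc).strftime("%Y%m%d%H")
    return hour, user_bucket(user_id)

class SessionTable:
    """In-memory stand-in for the C* sessions table."""

    def __init__(self) -> None:
        self.rows: dict[tuple, bytes] = {}

    def write(self, session_start_ts: float, publisher_id: str, user_id: str,
              session_id: str, view_id: str, data_type: str, data: bytes) -> None:
        data_hash = hashlib.sha1(data).hexdigest()  # hypothetical content hash
        clustering = (publisher_id, user_id, session_id, view_id, data_type, data_hash)
        # Same message => same full key => the second write overwrites the first.
        self.rows[(partition_key(session_start_ts, user_id), clustering)] = data

    def partition(self, hour: str, bucket: int) -> list[bytes]:
        """Everything stored for one (hour, bucket), regardless of arrival time."""
        return [data for (pk, _), data in self.rows.items() if pk == (hour, bucket)]

def sample_partitions(hour: str, fraction: float) -> list[tuple[str, int]]:
    """A user sample is just a subset of buckets: 1% = 100 of 10,000."""
    return [(hour, b) for b in range(int(NUM_USER_BUCKETS * fraction))]
```

Redelivering a message is harmless here because it lands on the same key. A job that reads only the sampled buckets touches proportionally fewer partitions.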
20.
Data Flow Cont.
Diagram labels: C* Sessions, Traffic Processor (Spark), Manual runner, Next Gen. Reports, Next Gen. Counters (Spark), Zeppelin, BigQuery, Hadoop, Vertica
21.
Before vs. After
• Raw data: real-time full access to the raw data, not just aggregated data
• Week of data (~35TB): 2 hours to analyze and report
  • 10 physical nodes, 320 cores, 2.5TB memory, SSDs
  • Analyzing a 1% sample of the users reduces this linearly (partition key)
  • Analyzing a single publisher that is 1% of the data reduces this almost linearly (clustering key)
• Reporting: full reporting available within minutes instead of hours
• Supporting our growth: Spark as a distributed computing engine is very strong and easy to scale and extend
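The figures on this slide are mutually consistent and imply the rough throughput computed below. This is a back-of-the-envelope sketch. It assumes decimal units and that the 2-hour run is spread evenly across nodes and cores, which real jobs never quite achieve.

```python
TB = 10**12

week_bytes = 35 * TB          # ~35TB for a week of data (about 7 x 5TB/day)
run_seconds = 2 * 60 * 60     # 2 hours to analyze and report
nodes, cores, memory_tb = 10, 320, 2.5

# Assumes perfectly even distribution of work across the cluster.
cluster_gb_per_sec = week_bytes / run_seconds / 1e9
node_mb_per_sec = week_bytes / run_seconds / nodes / 1e6
cores_per_node = cores // nodes
memory_gb_per_node = memory_tb * 1000 / nodes

# Sampling 1% of users (1% of the user buckets) scales the work down linearly.
sampled_run_seconds = run_seconds * 0.01

print(f"{cluster_gb_per_sec:.2f} GB/s cluster-wide, {node_mb_per_sec:.0f} MB/s per node")
print(f"{cores_per_node} cores and {memory_gb_per_node:.0f} GB RAM per node")
print(f"1% user sample: ~{sampled_run_seconds:.0f} s")
```

A full week scan therefore moves roughly 4.9 GB/s across the cluster. If scaling stays linear, a 1% user sample would finish in a little over a minute.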
22.
Before vs. After (Cont.)
• Long-term data access: Hadoop, Cassandra and BigQuery provide a solution we did not have before
• Analytics engine: the move from MySQL to Vertica (an MPP engine) allows us to support complex queries over very large data sets
• Algorithmic research and modeling: we are now capable of in-depth analysis on multiple dimensions across long time periods
23.
Thank You! tal@taboola.com