Big Data in Online Marketing
     Predictive Analytics Innovation Summit


       Daqing Zhao, PhD
       Director SEM Analytics at Ask.com

       2/24/2012, San Diego

       ©Daqing Zhao All rights reserved
Agenda

• Overview of big data analytics
• Insights into big data modeling
• A case for preference profiles
   – Recommender for a wine seller
• Cases for behavioral profiles for predictive models
   –   Yahoo mail retention
   –   Tribal Fusion display ads impression optimization
   –   University of Phoenix student retention
   –   University of Phoenix lead optimization
• Case of Ask.com SEM algorithms

Daqing Zhao, PhD

• Big Data scientist with deep domain knowledge
• Academic training
   – Analyzed molecular spectra on Cray supercomputers
   – Determined, modeled, simulated molecular motions in 3D space
• Enjoys working with large data and large-scale computing
• At Bank of America, led the development of a risk
  management system for a global portfolio
• Has worked on computational Internet marketing since 1999




Big data, Big Opportunities
• Thanks to Moore’s law in CPU, storage, and network connections
• Too much data, too little knowledge




• Data and analytics have changed every field
• From science and government to commerce



Things computers are good at

• Computers have perfect memory
   – Every page view, click, transaction, every event,…
• Good at finding a needle in a haystack
   – Identify clickers of any particular web page at some time
   – E.g., target abandoned shopping carts with promotions
• Good at trade-offs among a large number of factors
   – Female, 25-34, with child < 5, Asian, earning $30K, rent,
     divorced, live in Calif., some college, Walmart, visits
     Coupons.com, Monster.com, drive Camry, …
   – Buyer of X or not?

Computers make it possible

• Given data, optimize models and parameters
   – Identify reproducible patterns in the data
   – Provide a simple picture, predict events in the future
• Simulations generate future events, given
  assumptions and the current state
   – Given a set of models, what a future scenario will look like
     under a given set of conditions: the “what ifs”
   – Like a flight simulator
• Crowdsourcing from big data and big data modeling
   – Define similarity, translations, quality, relevance

Computers can’t do everything

• Data often have issues before being well analyzed
• Data often have no taxonomy or context
• With free-format data, relevant information needs to be
  extracted
• Computers don’t define targets or construct predictors
• Computers don’t know if critical predictive factors are missing
• Computers don’t have common sense
• Computers don’t have goals to achieve


Modeling needs to scale
• Traditional predictive models take a long time to build
   – Small data sets, samples expensive to collect
• Now data are cheap and models may degrade in weeks
   – The dimension of the predictors is very large
   – The number of categories is large
• Interactive, human-driven model building is not scalable
• Reasons for target events are complex
• Without detailed analysis, it is unclear what drives the
  event
• We need to rely on “out of sample testing” and “off the
  shelf” modeling

Big Data problem
• Data size is larger than what databases can handle
• Terabytes of data may take hours just to scan
• A solution requires a cloud of servers with local
  storage
   – Read, process and write intermediate results in
     parallel
   – Aggregate at the end
• Cloud computing builds models at scale
• Cloud capacity often scales linearly with the number of servers
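
The "process in parallel, aggregate at the end" pattern above can be sketched on a single machine. This is only an illustration under stated assumptions: the event tuples, the function names, and the use of threads as stand-ins for servers are all hypothetical, not the system described in the talk.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_clicks(partition):
    """Map step: scan one partition and emit an intermediate per-page click count."""
    counts = Counter()
    for page, event in partition:
        if event == "click":
            counts[page] += 1
    return counts


def aggregate(partials):
    """Reduce step: merge the intermediate results at the end."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def parallel_click_counts(partitions, workers=4):
    """Process every partition in parallel, then aggregate once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_clicks, partitions))
    return aggregate(partials)


# Hypothetical event log, already split into partitions (one per "server").
partitions = [
    [("home", "view"), ("home", "click"), ("cart", "click")],
    [("cart", "click"), ("home", "click"), ("help", "view")],
]
totals = parallel_click_counts(partitions)
print(totals)
```

Because each partition is scanned independently, adding workers shortens the scan phase, while the final merge stays small as long as the intermediate results are compact.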

Cloud computing
• We built a SAS cloud at University of Phoenix
   – I have an invited SAS talk available at the SAS web site
   – We can process billions of impressions in minutes
• Hadoop clouds are used widely
   – Open source software
   – Commodity servers and storage
• Clouds may have hundreds of thousands of servers
   – Find a needle in a haystack in milliseconds
   – Model computations that would usually take years
     now finish in minutes

Example: Google Data Centers
• Estimated 500K commodity servers
• Data centers near the Columbia River at The Dalles, Oregon




In use from 1999 to 2001

CUSTOMER PREFERENCE PROFILES
1:1 email case
• Weekly emails recommending 6 wines
• Inventory of 20K+ wines
• Wine.com had clean data
   – Purchase, time, product, spend
   – Wine color, varietal, body, acidity, oak, tannin, sweetness,
     complexity, price, producer, region
   – Email response
   – Self-reported preferences and demographics
   – Web behavior clusters
   – No explicit customer rating data of the kind Netflix has
• Most customers have one or two data points

Dynamic Newsletter
[Figure: dynamic XML template; clicks are tracked and linked to purchases]

              Dear “First name”,

              Welcome to our Newsletter. Celebrate holidays with family and
              friends with a bottle of some wine.

              History of some wine. Tips on wine tasting. Recipes using
              wine. Health benefits of wine. Wine drinking is socially
              fashionable, culturally sophisticated, etc., etc.

              Sincerely,
              Signature

              Text blurb          Text blurb          Text blurb
              Text blurb          Text blurb          Text blurb

Wine direct marketing
• Goal: lift purchase revenue
• Present wines customers are more likely to buy
• A/B testing against weekly selections by
  merchandisers
• Concentrate on long-term performance
   – Over many email campaigns
• Focus on the most important predictor: the behavior profile
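
An A/B comparison like the one above is usually summarized as a relative lift plus a significance check. The sketch below is an assumption about how that could be done, not the analysis used in the talk: it compares purchase rates rather than revenue, uses a pooled two-proportion z statistic, and all numbers and names are made up for illustration.

```python
import math


def lift(treatment_value, control_value):
    """Relative improvement of treatment over control (0.3 means +30%)."""
    return (treatment_value - control_value) / control_value


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for the difference between two conversion rates (pooled)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se


# Hypothetical campaign totals: model-selected wines vs. merchandiser picks.
model_buyers, model_sent = 260, 10_000
merch_buyers, merch_sent = 200, 10_000

rate_lift = lift(model_buyers / model_sent, merch_buyers / merch_sent)
z = two_proportion_z(model_buyers, model_sent, merch_buyers, merch_sent)
print(f"lift = {rate_lift:.1%}, z = {z:.2f}")
```

Summing the counts over many campaigns before testing matches the slide's emphasis on long-term performance rather than any single send.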




Build similarity of all wines
• Decompose purchases into product attributes
    – Even 1 click can generate a taste profile
• When a wine goes out of stock, its profile information is still usable
• New inventory is immediately mapped to existing profiles
• Build an implicit and explicit profile of each customer
• Add association rules: “customers who bought these also
  bought…”
• For new customers, augment the profile with nearest
  neighbors who had more purchases as “mentors”
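
Decomposing purchases into product attributes can be illustrated with a tiny attribute-space profile. This is a sketch under assumptions: the attribute names and values, the simple averaging, and the 50/50 mentor blend are hypothetical choices, and how a mentor is selected is not shown.

```python
from collections import defaultdict


def taste_profile(wine_ids, catalog):
    """Average the attribute vectors of the purchased wines into one profile."""
    if not wine_ids:
        return {}
    totals = defaultdict(float)
    for wine_id in wine_ids:
        for attribute, value in catalog[wine_id].items():
            totals[attribute] += value
    return {attribute: total / len(wine_ids) for attribute, total in totals.items()}


def blend_with_mentor(profile, mentor_profile, weight=0.5):
    """Augment a sparse profile with a mentor's profile (weight = mentor share)."""
    keys = set(profile) | set(mentor_profile)
    return {
        key: (1 - weight) * profile.get(key, 0.0) + weight * mentor_profile.get(key, 0.0)
        for key in keys
    }


# Hypothetical catalog: each wine is described by attribute scores in [0, 1].
catalog = {
    "w1": {"red": 1.0, "body": 0.8, "oak": 0.6},
    "w2": {"red": 1.0, "body": 0.6, "oak": 0.2},
    "w3": {"red": 0.0, "body": 0.3, "oak": 0.1},
}

new_customer = taste_profile(["w1"], catalog)    # a single data point
mentor = taste_profile(["w1", "w2"], catalog)    # a neighbor with more history
augmented = blend_with_mentor(new_customer, mentor, weight=0.5)
print(augmented)
```

Because the profile lives in attribute space rather than in product IDs, it stays usable after "w1" sells out, and a newly stocked wine can be scored against it as soon as its attributes are known.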


Customer experience is key
• Recommend similar wines
  – Based on cosine distance to taste profile, price, and
    text mining on producer name, region, country
  – Shuffle among higher-scored wines
  – Repeated campaigns take care of prediction errors
  – Deduplicate against recent recommendations and purchases
  – Use a decaying memory function and factor in
    seasonality
• Reinforce learning
• Use simulations to ensure quality
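
The scoring, shuffling, and deduplication steps above can be sketched as follows. It is a minimal illustration with assumed names and data: it uses cosine similarity (one minus the cosine distance), ignores price, text mining, memory decay, and seasonality, and the pool size is an arbitrary choice.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two sparse attribute vectors (dicts)."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(profile, catalog, recently_seen, k=6, pool_size=12, rng=None):
    """Pick k wines at random from the pool of highest-scoring unseen wines."""
    rng = rng or random.Random()
    scored = [
        (cosine_similarity(profile, attributes), wine_id)
        for wine_id, attributes in catalog.items()
        if wine_id not in recently_seen          # dedup recent picks and purchases
    ]
    scored.sort(reverse=True)
    pool = [wine_id for _, wine_id in scored[:pool_size]]
    rng.shuffle(pool)                            # vary picks across campaigns
    return pool[:k]


# Hypothetical data for illustration only.
profile = {"red": 1.0, "oak": 0.5}
catalog = {
    "w1": {"red": 1.0, "oak": 0.5},
    "w2": {"red": 1.0, "oak": 0.4},
    "w3": {"red": 0.9, "oak": 0.6},
    "w4": {"red": 0.8, "oak": 0.2},
    "w5": {"white": 1.0},
}
picks = recommend(profile, catalog, recently_seen={"w1"}, k=2, pool_size=3,
                  rng=random.Random(0))
print(picks)
```

Shuffling within the top pool trades a little per-email accuracy for variety, which fits the slide's point that repeated campaigns absorb individual prediction errors.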

Learnings and insights
• Our 1:1 emails increased revenue by up to 300%
• Outperformed by 40% over a 2+ year period
• Purchase data are most important
   – Putting your money where your mouth or mouse is
• Email response data are also predictive
• Self-reported preferences differ
  from actions
   – Talk the talk versus walk the walk
• Aggregated web segments are least useful



CUSTOMER PROFILES FOR PREDICTIVE
MODELS
Email retention models
• Of new email subscribers, 40% never return
  – High “infant” mortality rate
  – Activity immediately after sign-up correlates with
    normal retention
  – Frequent views of certain pages, such as
    Help and Junk folders, are predictive
  – Find actionable retention drivers, such as sending
    welcome emails and improving customer service, user
    experience, etc.

Online edu retention models
• Students have a low persistence rate until they have
  completed several courses
  – Depends on major, credits finished, demographics,
    socio-economic status, first-generation student status
  – Also on lead source, lead form entries, etc.
• We set up tracking of data including search,
  display, landing page, home site, call center,
  enrollment, class completions, …, for a 360-degree view
• Billions of events per month
Lead conversion models
• From impression to sign-up as a lead is just 1/3 of
  the student life cycle
• Leads have very low enrollment rates
   – A lead takes 3 to 6 months to enroll
   – Leads that are easy to convert may also drop out more easily
• Student performance data over a long time are needed to
  assess leads
   – Trade-off between statistics and relevance
   – Use lifetime values, brand values, cost of service to
     determine media allocation

Display ad conversion models
• Advertisers have different conversion drivers
   – Publisher, channel, geo, behavior, demographic data,
     data append, session depth, etc.
   – Requires an array of predictive conversion models
     working together with an auction engine
• Billions of display ads
   – Individual and event information
• Too many models, too little time for humans to build
  them

Unexpected data challenges
• Task: predict future enrollment and revenue
• Problem: more than one definition of each metric
   – Made by past business analysts, using reasonable
     business rules
• Some rules are built into a BI reporting product
   – FP&A watches them every month as the “truth” used to
     monitor the business and guide the Street
• With IT/BI turnover, rules change over time
   – Few current staff know or can articulate the rules

Solution to data issues
• Without the rules, we could not calculate their version of
  enrollment and revenue from student financial
  transaction data
• After several meetings, still no correct rules
• We then modeled the time series of reported data
   – One-time data errors are diluted
   – Rule changes from long ago are also weighted less
• We were able to predict customers and revenue 3
  to 6 months ahead
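
One way to model a reported series while down-weighting old observations is a weighted least-squares trend with exponentially decaying weights. The slides do not say which time-series method was used, so the technique, the `decay` parameter, and the numbers below are all assumptions for illustration.

```python
def weighted_trend_forecast(series, horizon, decay=0.9):
    """Fit a linear trend by weighted least squares and extrapolate it.

    The most recent point has weight 1; each step back in time multiplies the
    weight by `decay`, so old rule changes and one-off errors count for less.
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    times = range(n)
    weights = [decay ** (n - 1 - t) for t in times]
    w_sum = sum(weights)
    t_mean = sum(w * t for w, t in zip(weights, times)) / w_sum
    y_mean = sum(w * y for w, y in zip(weights, series)) / w_sum
    cov = sum(w * (t - t_mean) * (y - y_mean)
              for w, t, y in zip(weights, times, series))
    var = sum(w * (t - t_mean) ** 2 for w, t in zip(weights, times))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return [intercept + slope * (n - 1 + h) for h in range(1, horizon + 1)]


# Hypothetical monthly reported enrollments.
reported = [100.0, 110.0, 120.0, 130.0]
forecast = weighted_trend_forecast(reported, horizon=3)
print(forecast)
```

Working from the reported numbers sidesteps the unknown business rules: the model only has to reproduce what the BI report will show, not recompute it from transactions.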

Data are most important
• In modeling, finding the key data is most important
  – Identify the smoking gun
• Data transformations
  – PageRank is a game-changing data transformation
  – Wine.com case: wineRank
  – The social graph is a key data transformation for credit
    card fraud detection



Modeling can go wrong
• Leakage in a lead scoring model
  – For example, using lead source to predict
    conversion, when certain values of the field were
    populated only for converters
• Display ad conversion model
  – Construct the data set by taking all converters and a
    sample of non-converters
  – Predict conversion using page view profiles, etc.
  – Problem: the sample of non-converters included
    customers who had no impressions of the ad
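
Both pitfalls above lend themselves to simple pre-modeling checks. The sketch below is a hedged illustration: the field names (`lead_source`, `converted`, `impressions`), the 0.5 gap threshold, and the rows are hypothetical, and a real check would use far more data.

```python
def population_rate_by_outcome(rows, field, outcome="converted"):
    """Share of rows where `field` is populated, split by outcome value."""
    filled = {True: 0, False: 0}
    total = {True: 0, False: 0}
    for row in rows:
        label = bool(row[outcome])
        total[label] += 1
        if row.get(field) not in (None, ""):
            filled[label] += 1
    return {
        label: (filled[label] / total[label] if total[label] else 0.0)
        for label in total
    }


def looks_leaky(rows, field, outcome="converted", gap=0.5):
    """Flag a field that is populated far more often for one outcome."""
    rates = population_rate_by_outcome(rows, field, outcome)
    return abs(rates[True] - rates[False]) >= gap


def exposed_only(rows, impressions_field="impressions"):
    """Keep only customers who actually saw the ad at least once."""
    return [row for row in rows if row.get(impressions_field, 0) > 0]


# Hypothetical rows for illustration only.
rows = [
    {"lead_source": "partner_a", "converted": True, "impressions": 3},
    {"lead_source": "partner_b", "converted": True, "impressions": 1},
    {"lead_source": None, "converted": False, "impressions": 2},
    {"lead_source": None, "converted": False, "impressions": 0},
]
print(looks_leaky(rows, "lead_source"))   # field filled in only for converters
print(len(exposed_only(rows)))            # rows eligible for the ad model
```

A field flagged this way is not necessarily useless, but it should be traced back to when and how it gets populated before it is allowed into a model.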
Modeling lessons
• Yahoo DSL subscribers with one-year contracts
• If you try to model month-to-month retention, you
  find a high retention rate
   – Due to contracts and penalties
• The correct way is to model retention at contract
  expiry, only on the 1/12 of customers expiring in a given month

• For Yahoo email, if you look at quarter-by-quarter
  retention, you find that those acquired early in the
  first quarter have a lower retention rate
   – Because those customers have more time to churn
• A correct way is to use survival analysis
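
Survival analysis handles the "more time to churn" problem by comparing customers at the same tenure and treating still-active customers as censored. Below is a minimal Kaplan-Meier estimator written from the standard definition; the tenure data are made up, and this is an illustration rather than the model used in the talk.

```python
def kaplan_meier(durations, churned):
    """Kaplan-Meier survival curve.

    durations: tenure of each customer (e.g. months since sign-up).
    churned:   True if the customer churned at that tenure, False if the
               customer was still active when observed (censored).
    Returns a list of (time, survival probability) at each churn time.
    """
    event_times = sorted({t for t, e in zip(durations, churned) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, churned) if d == t and e)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical tenures in months; False marks customers still active.
durations = [1, 2, 2, 3, 3, 3]
churned = [True, True, False, True, False, False]
curve = kaplan_meier(durations, churned)
for month, probability in curve:
    print(month, round(probability, 3))
```

Customers acquired late in a quarter simply contribute short, censored tenures, so they no longer inflate the retention estimate the way a calendar-quarter comparison does.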
Ask.com SEM Analytics
Ask.com background

• Founded ~16 years ago
• Ask.com attracts 100 million global users
   – Biggest Q&A site on the web
• Over the last 2 years we’ve revamped our approach to Q&A with a
  product that
   – Combines search technology with answers from real people
• Instead of 10 blue links, we deliver
   – Real answers to people’s questions, both from already published data
     sources
   – And from our growing community of users, on the web and across mobile



Ask SEM Analytics Systems

• Select quality keywords at Big Data scale
• Determine bids using search engine and internal data
• Keyword segmentation and clustering at big data level
   – Text mining, behavioral association, historic performance
   – Use of data from organic traffic
   – Map similarity of keywords
• Optimize landing page and custom creatives
• Reinforce learning, testing hypotheses
• Optimize algorithms and parameters via A/B tests


SEM Bid Algorithms

• Build predictive models of revenue at the keyword level,
  using data including
   –   User search streams
   –   Ad depth, Landing Page CTR
   –   Quality Score and minCPC
   –   Effective CPC
   –   Keyword categories
   –   Natural language clusters
   –   Search behavioral clusters
• Use Hadoop/Hive/Mahout to process data

Benefits of SEM Algorithms

•   Predict keyword performance
•   Bid on the right keyword at the right price, at the right time
•   Improve ROI, maximize profitable traffic volume
•   Shift traffic to keywords with higher quality scores
•   Optimize user experience
•   Find similar keywords for management and expansion




Segmenting keywords

• To manage a large portfolio, we group keywords
  together based on
  – Customer behavior
  – Text mining
  – Keyword performance metrics
• Generate keyword groups for content and bid
  management
• Similar keywords have similar performance
• Transfer learnings to other keywords
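
As a toy illustration of the text-mining side of keyword grouping, the sketch below groups keywords by token overlap. It is an assumption-laden simplification: Jaccard similarity on whitespace tokens, a greedy single pass, and a 0.5 threshold are illustrative choices, and the behavioral and performance signals listed above are not modeled.

```python
def jaccard(keyword_a, keyword_b):
    """Token-level Jaccard similarity between two keyword phrases."""
    tokens_a = set(keyword_a.lower().split())
    tokens_b = set(keyword_b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def group_keywords(keywords, threshold=0.5):
    """Greedy grouping: join the first group whose seed keyword is similar enough."""
    groups = []
    for keyword in keywords:
        for group in groups:
            if jaccard(keyword, group[0]) >= threshold:
                group.append(keyword)
                break
        else:
            groups.append([keyword])   # no similar group found: start a new one
    return groups


# Hypothetical keywords for illustration only.
keywords = [
    "cheap flights",
    "cheap flights to paris",
    "flights cheap",
    "red wine",
    "best red wine",
]
groups = group_keywords(keywords)
for group in groups:
    print(group)
```

Once keywords share a group, a sparse keyword can borrow performance estimates from its better-observed neighbors, which is the practical payoff of "similar keywords have similar performance".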

Conclusions

• For optimal modeling, dive deep in domain knowledge
• Identify key data and transformations
• May require Big Data solutions to scale
• Data are not reliable until they have been seriously analyzed
• Test hypotheses and optimize in the real market
• Use simulations to see if changes are reasonable
• Focus on customer experience, not data mining tools, model
  complexity or predictive accuracy
• Use a lot of common sense
• “The best way to get good ideas is to have a lot of them”
       – Linus Pauling
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 

Big data, predictive modeling and analytics in online marketing

  • 1. Big Data in Online Marketing Predictive Analytics Innovation Summit Daqing Zhao, PhD Director SEM Analytics at Ask.com 2/24/2012, San Diego ©Daqing Zhao All rights reserved
  • 2. Agenda • Overview of big data analytics • Insights from big data modeling • A case for preference profiles – Recommender for a wine seller • Cases for behavioral profiles for predictive models – Yahoo Mail retention – Tribal Fusion display ad impression optimization – University of Phoenix student retention – University of Phoenix lead optimization • Case of Ask.com SEM algorithms
  • 3. Daqing Zhao, PhD • Big Data scientist with deep domain knowledge • Academic training – Analyzed molecular spectra on Cray supercomputers – Determined, modeled, and simulated molecular motions in 3D space • Enjoys working with large data and large-scale computing • At Bank of America, led the development of a risk management system for a global portfolio • Has worked on computational Internet marketing since 1999
  • 4. Big data, Big Opportunities • Thanks to Moore’s law for CPU, storage, and network connections • Too much data, too little knowledge • Data and analytics have changed every field • From science and government to commerce
  • 5. Things computers are good at • Computers have perfect memory – Every page view, click, transaction, every event, … • Good at finding a needle in a haystack – Identify clickers of any particular web page at some time – E.g., target abandoned shopping carts with promotions • Good at trade-offs among a large number of factors – Female, 25-34, with child < 5, Asian, earning $30K, rents, divorced, lives in Calif., some college, Walmart, visits Coupons.com, Monster.com, drives a Camry, … – Buyer of X or not?
  • 6. Computers make it possible • Given data, optimize models and parameters – Identify reproducible patterns in the data – Provide a simple picture, predict events in the future • Simulations generate future events, given assumptions and the current state – Given a set of models, what future scenarios will look like under a given set of conditions, “what ifs” – Like a flight simulator • Crowdsourcing from big data and big data modeling – Define similarity, translations, quality, relevance
  • 7. Computers can’t do everything • Data often have issues before being well analyzed • Data often have no taxonomy and context • In free-format data, relevant information needs to be extracted • Computers don’t define targets or construct predictors • Don’t know if critical predictive factors are missing • Computers don’t have common sense • Computers don’t have goals to achieve
  • 8. Modeling needs to scale • Traditional predictive models take a long time to build – Small data sets, samples expensive to collect • Now data are cheap and models may degrade in weeks – The dimension of the predictors is very large – The number of categories is large • Human interactive model building is not scalable • Reasons for target events are complex • Without detailed analysis, it is unclear what drives the event • We need to rely on “out of sample testing” and “off the shelf” modeling
  • 9. Big Data problem • Data size larger than what databases can handle • Terabytes of data may take hours just to scan • A solution requires a cloud of servers with local storage – Read, process and write intermediate results in parallel – Aggregate at the end • Cloud computing builds models at scale • Cloud often scales linearly with the number of servers
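The "process in parallel, aggregate at the end" pattern on slide 9 can be illustrated with a small sketch. This is not the system described in the talk: threads stand in for servers, in-memory lists stand in for local storage, and the names `count_partition` and `aggregate` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_partition(events):
    """Map step: count impressions per ad within one partition of the log."""
    return Counter(ad_id for ad_id, _user in events)


def aggregate(partitions, workers=4):
    """Process every partition in parallel, then merge the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_partition, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)  # reduce step: add the partial counts together
    return total


# Hypothetical impression log, already split into three partitions.
partitions = [
    [("ad1", "u1"), ("ad2", "u2")],
    [("ad1", "u3")],
    [("ad2", "u4"), ("ad2", "u5"), ("ad3", "u6")],
]
totals = aggregate(partitions)
```

Because each partition is counted independently, adding workers shortens the map step, which is the property behind the roughly linear scaling mentioned on the slide.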
  • 10. Cloud computing • We built a SAS cloud at University of Phoenix – I have an invited SAS talk available on the SAS web site – We can process billions of impressions in minutes • Hadoop clouds are used widely – Open source software – Commodity servers and storage • Clouds may have hundreds of thousands of servers – Find a needle in a haystack in milliseconds – Model computations that would otherwise take years now finish in minutes
  • 11. Example: Google Data Centers • Estimated 500K commodity servers • Data centers near the Columbia River at The Dalles, Oregon
  • 12. CUSTOMER PREFERENCE PROFILES • In use from 1999 to 2001
  • 13. 1:1 email case • Weekly emails recommending 6 wines • Inventory of 20K+ wines • Wine.com had clean data – Purchase, time, product, spend – Wine color, varietal, body, acidity, oak, tannin, sweetness, complexity, price, producer, region – Email response – Self-reported preferences and demographics – Web behavior clusters – No explicit customer rating data of the kind Netflix has • Most customers have one or two data points
  • 14. Dynamic Newsletter • Dynamic XML template: Dear “First name”, Welcome to our Newsletter. [six text blurbs] Sincerely, Signature • Example blurbs: Celebrate holidays with family and friends with a bottle of some wine. History of some wine. Tips on wine tasting. Recipes using wine. Health benefits of wine. Wine drinking is socially fashionable, culturally sophisticated, etc. • Clicks tracked and linked to purchases
  • 15. Wine direct marketing • Goal: lift purchase revenue • Present wines customers are more likely to buy • A/B testing against weekly selections by merchandisers • Concentrate on long-term performance – Over many email campaigns • Focus on the most important predictor – behavior profile
  • 16. Build similarity of all wines • Decompose purchases into product attributes – Even 1 click can generate a taste profile • When a wine goes out of stock, its profile information is still usable • New inventory is immediately mapped to existing profiles • Build an implicit and explicit profile of each customer • Add association rules, “customers who bought these also bought…” • For new customers, augment the profile with nearest neighbors who had more purchases as “mentors”
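As an illustration of decomposing purchases into product attributes, here is a minimal sketch that averages the attribute vectors of purchased wines into a taste profile. The three attributes (body, acidity, sweetness), their 0 to 1 scale, the wine ids, and the function name `taste_profile` are all assumptions made for the example, not details from the talk.

```python
def taste_profile(purchases, catalog):
    """Average the attribute vectors of the wines a customer bought.

    `catalog` maps a wine id to its attribute vector. The profile lives in
    attribute space, so it stays usable after a purchased wine sells out.
    """
    vectors = [catalog[wine] for wine in purchases if wine in catalog]
    if not vectors:
        return None  # no usable history yet
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Hypothetical catalog: [body, acidity, sweetness], each scaled to 0..1.
catalog = {
    "cabernet_a": [1.0, 0.5, 0.0],
    "riesling_b": [0.0, 0.5, 1.0],
}
profile = taste_profile(["cabernet_a", "riesling_b"], catalog)
```

A new wine only needs its own attribute vector to be comparable with every existing profile, which matches the slide's point about new inventory being mapped immediately.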
  • 17. Customer experience is key • Recommend similar wines – Based on cosine distance to taste profile, price, and text mining on producer name, region, country – Shuffle among higher-scored wines – Repeated campaigns take care of prediction errors – Deduplicate against recent recommendations and purchases – Use a decaying memory function and factor in seasonality • Reinforce learning • Use simulations to ensure quality
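The ranking step on slide 17 can be sketched as follows, assuming taste profiles and wines are plain attribute vectors. The sketch ranks by cosine similarity (the complement of the cosine distance named on the slide) and skips recently recommended or purchased wines; shuffling, the decaying memory function, and seasonality are left out. The names `cosine` and `recommend` and the sample data are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(profile, catalog, recent, k=6):
    """Return the k wines most similar to the profile, skipping recent ones."""
    scored = [
        (cosine(profile, vector), wine)
        for wine, vector in catalog.items()
        if wine not in recent  # dedup recent recommendations and purchases
    ]
    scored.sort(reverse=True)
    return [wine for _score, wine in scored[:k]]


catalog = {"a": [1.0, 0.0, 0.0], "b": [0.9, 0.1, 0.0], "c": [0.0, 1.0, 0.0]}
picks = recommend([1.0, 0.0, 0.0], catalog, recent={"a"}, k=2)
```

The default `k=6` mirrors the six wines per weekly email mentioned earlier in the deck.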
  • 18. Learnings and insights • Our 1:1 emails increased revenue by up to 300% • Outperformed by 40% over a 2+ year period • Purchase data are most important – Putting money where your mouth or mouse is • Email response data are also predictive • Self-reported preferences are different from actions – Talk the talk versus walk the walk • Aggregated web segments are least useful
  • 19. CUSTOMER PROFILES FOR PREDICTIVE MODELS
  • 20. Email retention models • 40% of new email subscribers never return – High “infant” mortality rate – Activity immediately after sign-up correlates with normal retention – Frequent views of certain pages, such as Help and Junk folders, are predictive – Find actionable retention drivers, such as sending welcome emails, improving customer service, user experience, etc.
  • 21. Online edu retention models • Students have a low persistence rate until after several courses – Depends on major, credits finished, demographics, socioeconomic status, first-generation students – Also on lead source, lead form entries, etc. • We set up tracking of data including search, display, landing page, home site, call center, enrollment, class completions, …, for a 360-degree view • Billions of events per month
  • 22. Lead conversion models • From impression to sign-up as a lead is just 1/3 of the student life cycle • Leads have very low enrollment rates – Takes 3 to 6 months to enroll – Leads that are easy to convert may also drop out more easily • Need student performance data over a long time to assess – Trade-off between statistics and relevance – Use lifetime values, brand values, cost of service to determine media allocation
  • 23. Display ad conversion models • Advertisers have different conversion drivers – Publisher, channel, geo, behavior, demographic data, data append, session depth, etc. – Require an array of predictive models on conversion to work together with an auction engine • Billions of display ads – Individual and event information • Too many models and too little time for humans to build them
  • 24. Unexpected data challenges • Task: predict enrollment and revenue in the future • Problem: more than one definition of each metric – Made by past business analysts, using reasonable business rules • Some rules are built into a BI reporting product – FP&A watches them every month as the “truth” they monitor and use to guide the Street • With IT/BI turnover, rules change over time – Few current staff know or can articulate the rules
  • 25. Solution with data issues • Without the rules, we cannot calculate their version of enrollment and revenue from student financial transaction data • After several meetings, still no correct rules • We then modeled the time series of reported data – One-time data errors are diluted – Rule changes long ago are also less weighted • We were able to predict customer numbers and revenue for 3 to 6 months
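The talk does not say which time series model was used. As one possible reading of "rule changes long ago are less weighted", the sketch below fits a linear trend by weighted least squares with exponentially decaying weights, so older reported values count less. The function name, the `decay` parameter, and the sample figures are assumptions.

```python
def weighted_trend_forecast(series, horizon, decay=0.9):
    """Forecast `horizon` steps ahead from a recency-weighted linear trend.

    The most recent observation has weight 1; each step further into the
    past multiplies the weight by `decay`.
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    times = range(n)
    weights = [decay ** (n - 1 - t) for t in times]
    total = sum(weights)
    mean_t = sum(w * t for w, t in zip(weights, times)) / total
    mean_y = sum(w * y for w, y in zip(weights, series)) / total
    cov = sum(w * (t - mean_t) * (y - mean_y)
              for w, t, y in zip(weights, times, series))
    var = sum(w * (t - mean_t) ** 2 for w, t in zip(weights, times))
    slope = cov / var
    intercept = mean_y - slope * mean_t
    return [intercept + slope * (n - 1 + h) for h in range(1, horizon + 1)]


# Hypothetical monthly reported enrollments.
reported = [10.0, 12.0, 14.0, 16.0]
forecast = weighted_trend_forecast(reported, horizon=2)
```

A single bad month shifts the fitted line only slightly because every point shares the fit, which is one way one-time errors get diluted.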
  • 26. Data most important • In modeling, finding the key data is most important – Identify the smoking gun • Data transformations – PageRank is a game-changing data transformation – Wine.com case, wineRank – Social graph is a key data transformation for credit card fraud detection
  • 27. Modeling can go wrong • Leakage in lead scoring model – For example, using lead source to predict conversion, when certain values of the field were populated only for converters • Display ad conversion model – Construct data set by taking all converters and a sample of non-converters – Predict conversion using page view profiles, etc. – Problem: sample of non-converters included customers who had no impressions of the ad
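One way to avoid the sampling problem on slide 27 is to restrict both classes to users who actually saw the ad before sampling non-converters. The sketch below assumes hypothetical record fields (`impressions`, `converted`) and a hypothetical function name; it is not the pipeline used in the talk.

```python
import random


def build_training_set(users, sample_rate, seed=0):
    """Keep all exposed converters plus a sample of exposed non-converters."""
    rng = random.Random(seed)
    # Only users with at least one impression could have been influenced.
    exposed = [u for u in users if u["impressions"] > 0]
    converters = [u for u in exposed if u["converted"]]
    non_converters = [u for u in exposed if not u["converted"]]
    k = round(len(non_converters) * sample_rate)
    return converters + rng.sample(non_converters, k)


# Hypothetical user records.
users = [
    {"id": 1, "impressions": 3, "converted": True},
    {"id": 2, "impressions": 0, "converted": False},  # never saw the ad
    {"id": 3, "impressions": 1, "converted": False},
    {"id": 4, "impressions": 2, "converted": False},
    {"id": 5, "impressions": 5, "converted": False},
    {"id": 6, "impressions": 4, "converted": False},
]
training = build_training_set(users, sample_rate=0.5)
```

If unexposed users stay in the non-converter sample, the model can separate the classes simply by detecting exposure, which says nothing about what drives conversion.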
  • 28. Modeling lessons • Yahoo DSL subscribers with a one-year contract • If you try to model month-to-month retention, you find a high retention rate – Due to contracts and penalties • The correct way is to model retention at contract expiry, only on 1/12 of the customers • For Yahoo email, if you look at quarter-by-quarter retention, you find that those acquired early in the first quarter have a lower retention rate – Because those customers have more time to churn • A correct way is to use survival analysis
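Survival analysis handles the "more time to churn" problem by treating customers who are still active as censored observations rather than as retained forever. A minimal Kaplan-Meier estimator, with hypothetical sample data, might look like this:

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each time where at least one churn occurred.

    durations[i] is how long customer i was followed; observed[i] is True
    if the customer churned at that time and False if still active
    (censored) when observation ended.
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical tenures in months; False marks customers still active.
durations = [1, 2, 2, 3, 3]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
```

Customers acquired at different dates contribute to the at-risk count only for the months they were actually observed, so early and late sign-ups can be compared on the same curve.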
  • 30. Ask.com background • Founded ~16 years ago • Ask.com attracts 100 million global users – Biggest Q&A site on the web • Over the last 2 years we’ve revamped our approach to Q&A with a product that – Combines search technology with answers from real people • Instead of 10 blue links, we deliver – Real answers to people’s questions – Both from already published data sources – And from our growing community of users – On the web and across mobile
  • 31. Ask SEM Analytics Systems • Select quality keywords at Big Data scale • Determine bids using search engine and internal data • Keyword segmentation and clustering at Big Data scale – Text mining, behavioral association, historic performance – Use of data from organic traffic – Map similarity of keywords • Optimize landing pages and custom creatives • Reinforce learning, testing hypotheses • Optimize algorithms and parameters via A/B tests
  • 32. SEM Bid Algorithms • Build revenue models at the keyword level; predictive modeling uses data including – User search streams – Ad depth, landing page CTR – Quality Score and minCPC – Effective CPC – Keyword categories – Natural language clusters – Search behavioral clusters • Use Hadoop/Hive/Mahout to process data
  • 33. Benefits of SEM Algorithms • Predict keyword performance • Bid on the right keyword at the right price, at the right time • Improve ROI, maximize profitable traffic volume • Shift traffic to keywords with higher quality scores • Optimize user experience • Find similar keywords for management and expansion
  • 34. Segmenting keywords • To manage a large portfolio, we group keywords based on – Customer behavior – Text mining – Keyword performance metrics • Generate keyword groups for content and bid management • Similar keywords have similar performance • Leverage learnings for other keywords
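As a toy illustration of the text-mining side of keyword grouping only, the sketch below greedily groups keywords whose word sets overlap, measured by Jaccard similarity. The talk does not describe its clustering method; the threshold, the function names, and the sample keywords are assumptions, and behavioral and performance signals are ignored.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of tokens."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_keywords(keywords, threshold=0.5):
    """Assign each keyword to the first group whose seed it resembles."""
    groups = []  # each entry: (token set of the seed keyword, member list)
    for keyword in keywords:
        tokens = set(keyword.lower().split())
        for seed_tokens, members in groups:
            if jaccard(tokens, seed_tokens) >= threshold:
                members.append(keyword)
                break
        else:
            groups.append((tokens, [keyword]))  # start a new group
    return [members for _seed, members in groups]


keywords = [
    "cheap red wine",
    "online mba degree",
    "red wine cheap deals",
    "mba degree online",
]
groups = group_keywords(keywords)
```

Once keywords share a group, performance observed on high-traffic members can inform bids on sparse ones, which is the leverage the slide describes.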
  • 35. Conclusions • For optimal modeling, dive deep into domain knowledge • Identify key data and transformations • May require Big Data solutions to scale • Data are not reliable until after being seriously analyzed • Test hypotheses and optimize in the real market • Use simulations to see if changes are reasonable • Focus on customer experience, not data mining tools, model complexity or predictive accuracy • Use a lot of common sense • “The best way to get good ideas is to have a lot of them” – Linus Pauling