SlideShare a Scribd company logo
1 of 23
Download to read offline
November 11, 2015
Macys.com Advanced Analytics
Advanced Analytics at Macys.com
Agenda
 Big data analytics and data scientists
 Challenges and solutions of big data predictive
modeling
 Macy’s Advanced Analytics Team
 Our analytics projects
 Personalized site recommendations
 Response propensity models
 Best practices of analysts and modeling
2
 DATA scientist” for Big Data infrastructure
Data collections, data processing, pattern recognition,
data access, data presentations
Infrastructure and tool builders
 “data SCIENTIST” for Big Data domain problems
Bid data domain solutions, data driven insights,
what, how and why
The science and art of use of big data and tools
 Big Data is not about data, it is about Big Analysis and Solutions
 Division of expertise is inevitable
 Focus, domain knowledge, specialization
3
Two type of data scientists
Modeling in the Big Data era
 Challenges:
 Modeling needs to scale
 Timeliness of models
 It takes time to integrate
 Test and Experimentation
 Solutions:
 Big data warehouse solutions
 Separation of concerns
 Scalable modeling tools
 Best practices in modeling
4
Separation of concerns
• Solution complexity
• Data complexity
• Variability of requirements
• Standard data mining algorithms
• Availability
• Reliability, scalability, latency
• CPU, help and disk IO issues
5
Platform
Engineering
Data
Science
Test and Experimentation
 Customer response behavior is complex
 Theory may or may not be right
 New layouts, new models, new messages
 Split traffic tests
 Find the winners, and gain learning
 Often there are test design problems and
understanding their implications
6
It takes time to integrate
 Make sure the right data are collected
 Measurement and attribution
 Start conversations about model based
decisions
 Teams need to think in model metrics
 Organization needs to adapt
 Accumulate assets of creatives, best practices
7
Scalable modeling tools
 Out of sample testing, cross validation
 Fast and scalable modeling algorithms
 Model comparisons and selections, model management tools
 Automated model optimization tools
 Penalize models being unnecessarily too large
 Ensemble models
 Robust models, handling missing variables, and outliers
 Convenient model building environment
 Graphical tools
 Model deployment tools
8
Best practice big data modeling
 Understand how the data are collected, what data can
and cannot be collected
 Balance cost of collecting data and optimize modeling
 Model performance depends on quality of data
 Use automated, robust model building solutions
 Use feedback loop to test hypotheses
 Do simulations to see if changes are reasonable
 Good ideas are not necessarily complicated
 Focus on domain knowledge,
not just data mining tools
9
Macy.com’s Advanced Analytics
 We are at the frontiers of Big Data science
 We have predictive modeling, experimental design and data
science teams
 Our team members have very strong background in
 Quantitative fields, math, stat, physics, bioinformatics, decision
sciences, and computer science
 We collaborate with systems and IT teams internally as well as 3rd party
vendors like SAS Research, IBM Research, …
 We use a wide range of tools Hadoop, SAS, SAP, H2O, R,
Mahout, Spark, and others
 We are data scientists with keen focus on domain problems
10
Customer acquisition and retention
 Targeting the right message to the right customer at the right
time
 Build predictive models of purchase behavior and identify
drivers
 Customer retention modeling and identify drivers
 Site recommendation algorithms
 Most work is in batch mode, expanding slowly into real time
 Rapid-prototyping and testing of algorithms and policies
 Output of the team’s work support other marketing teams to
identify, and reach best customers
11
Some other projects
 Data organization or data munging
 Data collections, individual and event level, 360 degrees, …
 Segmentation of customers
 Customer value
 Multiple channel attribution
 Experimentation platform
 Both for site layout as well as contents and recommendations
 Forecast and optimization
 Prediction, simulation, and search and optimize
 Big data refinement and scalability
 Find new data sources, more efficient ways of accessing data,
and organizing and processing data
12
Macys.com’s real time site personalization
13
Customer segmentation
14
Demographic
Socio-economic
Behavioral
Values and styles
Channels
Brand
Product social network
15
Demographic
Style
Size
Brand
Price range
Season
Text mining for similarity
Augment data
Differentiation
Generalization
1
6
Who gets which email?
17
Propensity Models
18
We are building an expanding family of models, at
Category/Brand/Outfit…
If customer, by category, bought Online/Store, browsed,
A2B, Email open/clicks, gender, age, store proximity,
Personicx, recency/frequency/spent…
Predict buy
Men’s
Clothes ?
Next 4
weeksUp to two years observation
Customer behavioral models
Insights and predictive models and decisions
Popular and long tail products
Easy to model popular products
Hard to model long tail niche products
Balance trade offs between
increasing differentiation
and reducing noise
Model decisioning using
business constraints
and linear programming
H2O Proof of Concept Project
• H2O Demo: 100MM rows and 50 cols, logistic
regression on a 16 nodes cluster, finishes in 11 sec
• We provide challenging use case for H2O
• Research cluster of 15 machines
• Customer sample of 7 million, 4000+ columns,
• Test 100 models, R interface, model training in two days
• Model accuracy by ROC AUC distributions, robustness
• Score all customers in several hours
Considerations of Recommendations
• Data sources
Data completeness, timeliness
Data organization
• Define similarity
Look alike in what way
• Cross sell and up sell
• Business rules, assortments
• Repetition
Freshness vs burn into memory
• Seasonality
• New product
• Cold start
Considerations of Recommendations
• Social needs, aspirations
• Trend, fashionable, hot, cool
• Friends, family, neighbors
• Be respected, accepted
• New, impulse
• Explore, learn, serendipity
• Risk taking, find out self
• Less familiar
• Looking for guidance, willing to
invest and be influenced
• Trust, authority
• Acquisition modeling
• Hit and miss, nurture, branding
• Intrinsic needs
• Character, belief, value
• Habits, belong
• Style, long term taste, theme
• Status
• “That’s me”
• Familiar, known
• Looking for value, convenience,
deal, services
• Trust and service
• Easier to build a profile, preference
matching
Concluding thoughts
 Big data presents big opportunity and big challenges
 Data science is not about data, but domain solutions
 Modeling in Big Data era is different from traditional
practices
 Resolution, robustness, turn around time, timeliness
 Organizations need to adapt to model based decisions
 Data are not clean until thoroughly analyzed
 Scalable and efficient modeling tools are essential
23

More Related Content

What's hot

H2O World - Migrating from Proprietary Analytics Software - Fonda Ingram
H2O World - Migrating from Proprietary Analytics Software - Fonda IngramH2O World - Migrating from Proprietary Analytics Software - Fonda Ingram
H2O World - Migrating from Proprietary Analytics Software - Fonda Ingram
Sri Ambati
 
Domino and AWS: collaborative analytics and model governance at financial ser...
Domino and AWS: collaborative analytics and model governance at financial ser...Domino and AWS: collaborative analytics and model governance at financial ser...
Domino and AWS: collaborative analytics and model governance at financial ser...
Domino Data Lab
 
Big Data Agile Analytics by Ken Collier - Director Agile Analytics, Thoughtwo...
Big Data Agile Analytics by Ken Collier - Director Agile Analytics, Thoughtwo...Big Data Agile Analytics by Ken Collier - Director Agile Analytics, Thoughtwo...
Big Data Agile Analytics by Ken Collier - Director Agile Analytics, Thoughtwo...
Thoughtworks
 
Valuing the data asset
Valuing the data assetValuing the data asset
Valuing the data asset
Bala Iyer
 

What's hot (20)

H2O World - ML Could Solve NLP Challenges: Ontology Management - Erik Huddleston
H2O World - ML Could Solve NLP Challenges: Ontology Management - Erik HuddlestonH2O World - ML Could Solve NLP Challenges: Ontology Management - Erik Huddleston
H2O World - ML Could Solve NLP Challenges: Ontology Management - Erik Huddleston
 
Moving Data Science from an Event to A Program: Considerations in Creating Su...
Moving Data Science from an Event to A Program: Considerations in Creating Su...Moving Data Science from an Event to A Program: Considerations in Creating Su...
Moving Data Science from an Event to A Program: Considerations in Creating Su...
 
H2O World - Migrating from Proprietary Analytics Software - Fonda Ingram
H2O World - Migrating from Proprietary Analytics Software - Fonda IngramH2O World - Migrating from Proprietary Analytics Software - Fonda Ingram
H2O World - Migrating from Proprietary Analytics Software - Fonda Ingram
 
H2O World - Data Science in Action @ 6sense - Viral Bajaria
H2O World - Data Science in Action @ 6sense - Viral BajariaH2O World - Data Science in Action @ 6sense - Viral Bajaria
H2O World - Data Science in Action @ 6sense - Viral Bajaria
 
1645 track 1 bress_using his laptop
1645 track 1 bress_using his laptop1645 track 1 bress_using his laptop
1645 track 1 bress_using his laptop
 
940 sponsor gazdak_using our laptop
940 sponsor gazdak_using our laptop940 sponsor gazdak_using our laptop
940 sponsor gazdak_using our laptop
 
Managing Data Science | Lessons from the Field
Managing Data Science | Lessons from the Field Managing Data Science | Lessons from the Field
Managing Data Science | Lessons from the Field
 
Supporting innovation in insurance with randomized experimentation
Supporting innovation in insurance with randomized experimentationSupporting innovation in insurance with randomized experimentation
Supporting innovation in insurance with randomized experimentation
 
1215 dataikulunchlearn sanford
1215 dataikulunchlearn sanford1215 dataikulunchlearn sanford
1215 dataikulunchlearn sanford
 
Notilyze SAS
Notilyze SASNotilyze SAS
Notilyze SAS
 
H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...
H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...
H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...
 
Domino and AWS: collaborative analytics and model governance at financial ser...
Domino and AWS: collaborative analytics and model governance at financial ser...Domino and AWS: collaborative analytics and model governance at financial ser...
Domino and AWS: collaborative analytics and model governance at financial ser...
 
925 plenary rexer_using our laptop
925 plenary rexer_using our laptop925 plenary rexer_using our laptop
925 plenary rexer_using our laptop
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Slides: Five Data Valuation Pillars
Slides:  Five Data Valuation PillarsSlides:  Five Data Valuation Pillars
Slides: Five Data Valuation Pillars
 
Big Data Agile Analytics by Ken Collier - Director Agile Analytics, Thoughtwo...
Big Data Agile Analytics by Ken Collier - Director Agile Analytics, Thoughtwo...Big Data Agile Analytics by Ken Collier - Director Agile Analytics, Thoughtwo...
Big Data Agile Analytics by Ken Collier - Director Agile Analytics, Thoughtwo...
 
Giovanni Lanzani GoDataDriven
Giovanni Lanzani GoDataDrivenGiovanni Lanzani GoDataDriven
Giovanni Lanzani GoDataDriven
 
Evaluation of big data analysis
Evaluation of big data analysisEvaluation of big data analysis
Evaluation of big data analysis
 
900 keynote abbott
900 keynote abbott900 keynote abbott
900 keynote abbott
 
Valuing the data asset
Valuing the data assetValuing the data asset
Valuing the data asset
 

Viewers also liked

Arno candel h2o_a_platform_for_big_math_hadoop_summit_june2016
Arno candel h2o_a_platform_for_big_math_hadoop_summit_june2016Arno candel h2o_a_platform_for_big_math_hadoop_summit_june2016
Arno candel h2o_a_platform_for_big_math_hadoop_summit_june2016
Sri Ambati
 
Cgc2 cdn gamingsummit-real-time-customer-analytics
Cgc2 cdn gamingsummit-real-time-customer-analyticsCgc2 cdn gamingsummit-real-time-customer-analytics
Cgc2 cdn gamingsummit-real-time-customer-analytics
brock55
 

Viewers also liked (20)

Arno candel h2o_a_platform_for_big_math_hadoop_summit_june2016
Arno candel h2o_a_platform_for_big_math_hadoop_summit_june2016Arno candel h2o_a_platform_for_big_math_hadoop_summit_june2016
Arno candel h2o_a_platform_for_big_math_hadoop_summit_june2016
 
H2O World - Sparkling Water - Michal Malohlava
H2O World - Sparkling Water - Michal MalohlavaH2O World - Sparkling Water - Michal Malohlava
H2O World - Sparkling Water - Michal Malohlava
 
Momentos Memoráveis da Fotografia Publicitária 1991 a 2016
Momentos Memoráveis da Fotografia Publicitária 1991 a 2016Momentos Memoráveis da Fotografia Publicitária 1991 a 2016
Momentos Memoráveis da Fotografia Publicitária 1991 a 2016
 
Glm talk Tomas
Glm talk TomasGlm talk Tomas
Glm talk Tomas
 
H2O World - Building Data Products for Data Natives - Monica Rogati
H2O World - Building Data Products for Data Natives - Monica RogatiH2O World - Building Data Products for Data Natives - Monica Rogati
H2O World - Building Data Products for Data Natives - Monica Rogati
 
Running GLM in R
Running GLM in RRunning GLM in R
Running GLM in R
 
Cgc2 cdn gamingsummit-real-time-customer-analytics
Cgc2 cdn gamingsummit-real-time-customer-analyticsCgc2 cdn gamingsummit-real-time-customer-analytics
Cgc2 cdn gamingsummit-real-time-customer-analytics
 
Real Time Recommendation System using Kiji
Real Time Recommendation System using KijiReal Time Recommendation System using Kiji
Real Time Recommendation System using Kiji
 
TDWI Solution Summit San Diego 2014 Advanced Analytics at Macys.com
TDWI Solution Summit San Diego 2014 Advanced Analytics at Macys.comTDWI Solution Summit San Diego 2014 Advanced Analytics at Macys.com
TDWI Solution Summit San Diego 2014 Advanced Analytics at Macys.com
 
Realtime Analytics With Elasticsearch [New Media Inspiration 2013]
Realtime Analytics With Elasticsearch [New Media Inspiration 2013]Realtime Analytics With Elasticsearch [New Media Inspiration 2013]
Realtime Analytics With Elasticsearch [New Media Inspiration 2013]
 
H2O.ai's Distributed Deep Learning by Arno Candel 04/03/14
H2O.ai's Distributed Deep Learning by Arno Candel 04/03/14H2O.ai's Distributed Deep Learning by Arno Candel 04/03/14
H2O.ai's Distributed Deep Learning by Arno Candel 04/03/14
 
H2O World - Clustering & Feature Extraction on Text - Seth Redmore
H2O World - Clustering & Feature Extraction on Text - Seth RedmoreH2O World - Clustering & Feature Extraction on Text - Seth Redmore
H2O World - Clustering & Feature Extraction on Text - Seth Redmore
 
Real-Time Personalization
Real-Time PersonalizationReal-Time Personalization
Real-Time Personalization
 
H2O World - Data Science w/ Big Data in a Corporate Environment - Nachum Shacham
H2O World - Data Science w/ Big Data in a Corporate Environment - Nachum ShachamH2O World - Data Science w/ Big Data in a Corporate Environment - Nachum Shacham
H2O World - Data Science w/ Big Data in a Corporate Environment - Nachum Shacham
 
H2O World - Generalized Low Rank Models - Madeleine Udell
H2O World - Generalized Low Rank Models - Madeleine UdellH2O World - Generalized Low Rank Models - Madeleine Udell
H2O World - Generalized Low Rank Models - Madeleine Udell
 
H2O World - GLM - Tomas Nykodym
H2O World - GLM - Tomas NykodymH2O World - GLM - Tomas Nykodym
H2O World - GLM - Tomas Nykodym
 
H2O World - H2O Deep Learning with Arno Candel
H2O World - H2O Deep Learning with Arno CandelH2O World - H2O Deep Learning with Arno Candel
H2O World - H2O Deep Learning with Arno Candel
 
2014 09 30_sparkling_water_hands_on
2014 09 30_sparkling_water_hands_on2014 09 30_sparkling_water_hands_on
2014 09 30_sparkling_water_hands_on
 
H2O World - Solving Customer Churn with Machine Learning - Julian Bharadwaj
H2O World - Solving Customer Churn with Machine Learning - Julian BharadwajH2O World - Solving Customer Churn with Machine Learning - Julian Bharadwaj
H2O World - Solving Customer Churn with Machine Learning - Julian Bharadwaj
 
Intro to H2O Machine Learning in Python - Galvanize Seattle
Intro to H2O Machine Learning in Python - Galvanize SeattleIntro to H2O Machine Learning in Python - Galvanize Seattle
Intro to H2O Machine Learning in Python - Galvanize Seattle
 

Similar to H2O World - Advanced Analytics at Macys.com - Daqing Zhao

Data Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisionsData Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisions
Vivastream
 

Similar to H2O World - Advanced Analytics at Macys.com - Daqing Zhao (20)

1000 track3 Zhao
1000 track3 Zhao1000 track3 Zhao
1000 track3 Zhao
 
Big Data Analysis and Business Intelligence
Big Data Analysis and Business IntelligenceBig Data Analysis and Business Intelligence
Big Data Analysis and Business Intelligence
 
Data science applications and usecases
Data science applications and usecasesData science applications and usecases
Data science applications and usecases
 
Big data for sales and marketing people
Big data for sales and marketing peopleBig data for sales and marketing people
Big data for sales and marketing people
 
Tips and Tricks to be an Effective Data Scientist
Tips and Tricks to be an Effective Data ScientistTips and Tricks to be an Effective Data Scientist
Tips and Tricks to be an Effective Data Scientist
 
Data Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisionsData Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisions
 
What MBA Students Need to Know about CX, Data Science and Surveys
What MBA Students Need to Know about CX, Data Science and SurveysWhat MBA Students Need to Know about CX, Data Science and Surveys
What MBA Students Need to Know about CX, Data Science and Surveys
 
Tips for Effective Data Science in the Enterprise
Tips for Effective Data Science in the EnterpriseTips for Effective Data Science in the Enterprise
Tips for Effective Data Science in the Enterprise
 
Understanding Cognitive Applications: A Framework - Sue Feldman
Understanding Cognitive Applications:  A Framework - Sue FeldmanUnderstanding Cognitive Applications:  A Framework - Sue Feldman
Understanding Cognitive Applications: A Framework - Sue Feldman
 
Building enterprise advance analytics platform
Building enterprise advance analytics platformBuilding enterprise advance analytics platform
Building enterprise advance analytics platform
 
Altron presentation on Emerging Technologies: Data Science and Artificial Int...
Altron presentation on Emerging Technologies: Data Science and Artificial Int...Altron presentation on Emerging Technologies: Data Science and Artificial Int...
Altron presentation on Emerging Technologies: Data Science and Artificial Int...
 
BAMarathon_DanielFylstra_Feb25.pptx
BAMarathon_DanielFylstra_Feb25.pptxBAMarathon_DanielFylstra_Feb25.pptx
BAMarathon_DanielFylstra_Feb25.pptx
 
Integrating AI - Business Applications
Integrating AI - Business ApplicationsIntegrating AI - Business Applications
Integrating AI - Business Applications
 
Digital Economics
Digital EconomicsDigital Economics
Digital Economics
 
ODSC East 2018
ODSC East 2018ODSC East 2018
ODSC East 2018
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data Quality
 
Using Powerful Questions to Drive Your Information Strategy
Using Powerful Questions to Drive Your Information StrategyUsing Powerful Questions to Drive Your Information Strategy
Using Powerful Questions to Drive Your Information Strategy
 
Giving Organisations new Capabilities to ask the Right Business Questions
Giving Organisations new Capabilities to ask the Right Business QuestionsGiving Organisations new Capabilities to ask the Right Business Questions
Giving Organisations new Capabilities to ask the Right Business Questions
 
Use of Analytics to recover from COVID19 hit economy
Use of Analytics to recover from COVID19 hit economyUse of Analytics to recover from COVID19 hit economy
Use of Analytics to recover from COVID19 hit economy
 
Predictive Analytics, AI and the Promise of Personalization
Predictive Analytics, AI and the Promise of PersonalizationPredictive Analytics, AI and the Promise of Personalization
Predictive Analytics, AI and the Promise of Personalization
 

More from Sri Ambati

More from Sri Ambati (20)

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation Journey
 

Recently uploaded

Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
chiefasafspells
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
masabamasaba
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
masabamasaba
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
masabamasaba
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
VictoriaMetrics
 

Recently uploaded (20)

%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaS
 

H2O World - Advanced Analytics at Macys.com - Daqing Zhao

  • 1. November 11, 2015 Macys.com Advanced Analytics Advanced Analytics at Macys.com
  • 2. Agenda  Big data analytics and data scientists  Challenges and solutions of big data predictive modeling  Macy’s Advanced Analytics Team  Our analytics projects  Personalized site recommendations  Response propensity models  Best practices of analysts and modeling 2
  • 3.  DATA scientist” for Big Data infrastructure Data collections, data processing, pattern recognition, data access, data presentations Infrastructure and tool builders  “data SCIENTIST” for Big Data domain problems Bid data domain solutions, data driven insights, what, how and why The science and art of use of big data and tools  Big Data is not about data, it is about Big Analysis and Solutions  Division of expertise is inevitable  Focus, domain knowledge, specialization 3 Two type of data scientists
  • 4. Modeling in the Big Data era  Challenges:  Modeling needs to scale  Timeliness of models  It takes time to integrate  Test and Experimentation  Solutions:  Big data warehouse solutions  Separation of concerns  Scalable modeling tools  Best practices in modeling 4
  • 5. Separation of concerns • Solution complexity • Data complexity • Variability of requirements • Standard data mining algorithms • Availability • Reliability, scalability, latency • CPU, help and disk IO issues 5 Platform Engineering Data Science
  • 6. Test and Experimentation  Customer response behavior is complex  Theory may or may not be right  New layouts, new models, new messages  Split traffic tests  Find the winners, and gain learning  Often there are test design problems and understanding their implications 6
  • 7. It takes time to integrate  Make sure the right data are collected  Measurement and attribution  Start conversations about model based decisions  Teams need to think in model metrics  Organization needs to adapt  Accumulate assets of creatives, best practices 7
  • 8. Scalable modeling tools  Out of sample testing, cross validation  Fast and scalable modeling algorithms  Model comparisons and selections, model management tools  Automated model optimization tools  Penalize models being unnecessarily too large  Ensemble models  Robust models, handling missing variables, and outliers  Convenient model building environment  Graphical tools  Model deployment tools 8
  • 9. Best practice big data modeling  Understand how the data are collected, what data can and cannot be collected  Balance cost of collecting data and optimize modeling  Model performance depends on quality of data  Use automated, robust model building solutions  Use feedback loop to test hypotheses  Do simulations to see if changes are reasonable  Good ideas are not necessarily complicated  Focus on domain knowledge, not just data mining tools 9
  • 10. Macy.com’s Advanced Analytics  We are at the frontiers of Big Data science  We have predictive modeling, experimental design and data science teams  Our team members have very strong background in  Quantitative fields, math, stat, physics, bioinformatics, decision sciences, and computer science  We collaborate with systems and IT teams internally as well as 3rd party vendors like SAS Research, IBM Research, …  We use a wide range of tools Hadoop, SAS, SAP, H2O, R, Mahout, Spark, and others  We are data scientists with keen focus on domain problems 10
  • 11. Customer acquisition and retention  Targeting the right message to the right customer at the right time  Build predictive models of purchase behavior and identify drivers  Customer retention modeling and identify drivers  Site recommendation algorithms  Most work is in batch mode, expanding slowly into real time  Rapid-prototyping and testing of algorithms and policies  Output of the team’s work support other marketing teams to identify, and reach best customers 11
  • 12. Some other projects  Data organization or data munging  Data collections, individual and event level, 360 degrees, …  Segmentation of customers  Customer value  Multiple channel attribution  Experimentation platform  Both for site layout as well as contents and recommendations  Forecast and optimization  Prediction, simulation, and search and optimize  Big data refinement and scalability  Find new data sources, more efficient ways of accessing data, and organizing and processing data 12
  • 13. Macys.com’s real time site personalization 13
  • 16. Text mining for similarity Augment data Differentiation Generalization 1 6
  • 17. Who gets which email? 17
  • 18. Propensity Models 18 We are building an expanding family of models, at Category/Brand/Outfit… If customer, by category, bought Online/Store, browsed, A2B, Email open/clicks, gender, age, store proximity, Personicx, recency/frequency/spent… Predict buy Men’s Clothes ? Next 4 weeksUp to two years observation
  • 19. Customer behavioral models Insights and predictive models and decisions Popular and long tail products Easy to model popular products Hard to model long tail niche products Balance trade offs between increasing differentiation and reducing noise Model decisioning using business constraints and linear programming
  • 20. H2O Proof of Concept Project • H2O Demo: 100MM rows and 50 cols, logistic regression on a 16 nodes cluster, finishes in 11 sec • We provide challenging use case for H2O • Research cluster of 15 machines • Customer sample of 7 million, 4000+ columns, • Test 100 models, R interface, model training in two days • Model accuracy by ROC AUC distributions, robustness • Score all customers in several hours
  • 21. Considerations of Recommendations • Data sources Data completeness, timeliness Data organization • Define similarity Look alike in what way • Cross sell and up sell • Business rules, assortments • Repetition Freshness vs burn into memory • Seasonality • New product • Cold start
  • 22. Considerations of Recommendations • Social needs, aspirations • Trend, fashionable, hot, cool • Friends, family, neighbors • Be respected, accepted • New, impulse • Explore, learn, serendipity • Risk taking, find out self • Less familiar • Looking for guidance, willing to invest and be influenced • Trust, authority • Acquisition modeling • Hit and miss, nurture, branding • Intrinsic needs • Character, belief, value • Habits, belong • Style, long term taste, theme • Status • “That’s me” • Familiar, known • Looking for value, convenience, deal, services • Trust and service • Easier to build a profile, preference matching
  • 23. Concluding thoughts  Big data presents big opportunity and big challenges  Data science is not about data, but domain solutions  Modeling in Big Data era is different from traditional practices  Resolution, robustness, turn around time, timeliness  Organizations need to adapt to model based decisions  Data are not clean until thoroughly analyzed  Scalable and efficient modeling tools are essential 23