Pragmatic Deep Learning for image labelling
An application to a travel recommendation engine
Introduction and Context
Iterative building of a
recommender system
Labeling Images
Pragmatic deep learning for
dummies
Post Processing
AKA: Image for BI on steroids
Outline
Results
More images !
Dataiku
•  Founded in 2013
•  90+ employees, 100+ clients
•  Paris, New York, London, San Francisco, Singapore
Data science software vendor, maker of Dataiku DSS
DESIGN
PREPARE: Load and prepare your data
MODEL: Build your models
ANALYSE: Visualize and share your work
PRODUCTION
AUTOMATE: Re-execute your workflow with ease
MONITOR: Follow your production environment
SCORE: Get predictions in real time
E-business vacation retailer
Negotiate the best price for their clients
Discount luxury
Key Figures
Sale Image is paramount
Purchase is impulsive
18 million clients.
Hundreds of sales opened every day
Specificities
Highly temporary sales
-> Classical recommender systems fail
-> Tied to seasonal events (Christmas, ski, summer)
Expensive Product
-> Few repeat buyers
-> Appearance counts a lot
Iterative Building of a Recommender System
Basic Recommendation Engines
Other Factors
One Meta Model to Rule Them All
Recommenders as features
Machine learning to optimize purchasing probability
Combine
Recommend
Describe
Cleaning, combining
and enrichment of
data
Recommendation
Engines
Optimization of home
display
The application automatically runs and compiles heterogeneous data
Generation of
recommendations based
on user behaviour
Every customer is shown the 10 sales they are most likely to buy
Customer visits
Purchases
Sales Images
Meta model combines recommendations to directly optimize purchasing probability
Meta Model
Recommender system for Home Page Ordering
+7% revenue (A/B testing)
Sales information
Batch Scoring every night
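A minimal sketch of the "recommenders as features" idea, assuming each basic recommender yields one score per (customer, sale) pair and that a logistic regression plays the role of the meta model. The slides do not say which learner was used; the data, the weights and the function names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: one row per (customer, sale) pair,
# one column per basic recommender score.
n_pairs, n_recommenders = 2000, 3
scores = rng.random((n_pairs, n_recommenders))
true_w = np.array([2.0, 1.0, 0.5])  # synthetic ground truth, for the demo only
p_buy = 1.0 / (1.0 + np.exp(-(scores @ true_w - 2.5)))
purchased = (rng.random(n_pairs) < p_buy).astype(float)


def fit_logistic(X, y, lr=0.5, n_iter=2000):
    """Plain gradient descent on the logistic loss; returns (weights, bias)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b


def top_sales(candidate_scores, w, b, k=10):
    """Rank candidate sales for one customer by predicted purchase probability."""
    p = 1.0 / (1.0 + np.exp(-(candidate_scores @ w + b)))
    order = np.argsort(-p)[:k]
    return order, p[order]


w, b = fit_logistic(scores, purchased)
candidates = rng.random((50, n_recommenders))  # 50 open sales for one customer
best, probs = top_sales(candidates, w, b)
print(best, probs.round(3))
```

In a nightly batch, `top_sales` would be called once per customer to produce the 10 sales shown on the home page.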
Why use images?
We want to distinguish “Sun and Beach” from “Ski”
A picture is worth a thousand words
Sales Images
Integrating Image Information
Labeling Model
Pool + Palm Trees
Hotel + Mountains
Pool + Forest + Hotel + Sea
Sea + Beach + Forest + Hotel
Sales description vector
CONTENT-BASED Recommender System
Image Labelling For Recommendation Engine
Pragmatic Deep learning for “Dummies”
Using Deep Learning models
Common Issues
“I don’t have a GPU server”
“I don’t have a deep learning expert”
“I don’t have labelled data” (or too little)
“I don’t have the time to wait for model training”
“I don’t want to pay for private APIs” / “I’m afraid their labelling will change over time”
“I have no (or too little) labelled data”
-> Is there similar data ?
Solution 1 : Pre trained models
PLACES DATABASE: 205 categories, 2.5 M images
US
SUN DATABASE: 307 categories, 110 K images
Solution 1 : Pre trained models
If there is open data, there is an open pre-trained model!
•  Kudos to the community
•  Check the licensing
Example with Places (Caffe Model Zoo):
tower: 0.53
skyscraper: 0.26
swimming_pool/outdoor: 0.65
inn/outdoor: 0.06
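A pre-trained scene classifier outputs one score per category, and listings such as "tower: 0.53" are the top of the resulting probability distribution. The sketch below shows only that last step, assuming the raw class scores are already available; loading the Caffe model is left out, and the scores and the four-label subset are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(logits, labels, k=2):
    """Return the k most probable (label, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical label subset and raw scores; a real Places model has 205 outputs.
labels = ["tower", "skyscraper", "swimming_pool/outdoor", "inn/outdoor"]
logits = [2.1, 1.4, -0.3, -1.0]

for label, prob in top_k_labels(logits, labels):
    print(f"{label}: {prob:.2f}")
```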
  
	
  
Solution 2 : Transfer Learning
Credit: Fei-Fei Li & Andrej Karpathy & Justin Johnson http://cs231n.stanford.edu/slides/winter1516_lecture11.pdf
PLACES DATABASE | OUR DATA | SUN DATABASE
Training (optional)
Pre-trained model: VGG16 (Caffe Model Zoo)
Transferred data: last convolutional layer features
Re-training
Re-trained model: 2 fully connected layers (TensorFlow)
tower: 0.53
skyscraper: 0.26
GPU | CPU | GPU
Leverage existing knowledge !
Solution 2 : Transfer Learning
Accuracy: 72%, Top-5 accuracy: 90% > state of the art on the dataset alone
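A rough sketch of the re-training step: the pre-trained network stays frozen, its last-layer features are computed once, and only two fully connected layers are trained on top. NumPy stands in for TensorFlow here to keep the example self-contained, and random vectors stand in for the VGG16 features; all sizes, the learning rate and the synthetic labels are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for features extracted once from the frozen pre-trained network
# (hypothetical sizes; real VGG16 features are much larger).
n_images, n_features, n_hidden, n_classes = 300, 64, 32, 5
features = rng.normal(size=(n_images, n_features))
labels = rng.integers(0, n_classes, size=n_images)
features[np.arange(n_images), labels] += 3.0  # make the synthetic classes learnable

# The only trainable part: two fully connected layers.
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_classes))
b2 = np.zeros(n_classes)
onehot = np.eye(n_classes)[labels]


def forward(X):
    h = np.maximum(0.0, X @ W1 + b1)       # fully connected layer 1 + ReLU
    z = h @ W2 + b2                        # fully connected layer 2 (class scores)
    z = z - z.max(axis=1, keepdims=True)   # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return h, p


losses = []
for step in range(300):
    h, p = forward(features)
    losses.append(-np.mean(np.log(p[np.arange(n_images), labels] + 1e-12)))
    dz = (p - onehot) / n_images           # gradient of cross-entropy w.r.t. scores
    dW2, db2 = h.T @ dz, dz.sum(axis=0)
    dh = (dz @ W2.T) * (h > 0)             # backpropagate through the ReLU
    dW1, db1 = features.T @ dh, dh.sum(axis=0)
    for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        param -= 0.2 * grad                # in-place gradient descent update

_, p = forward(features)
train_accuracy = np.mean(p.argmax(axis=1) == labels)
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, train accuracy {train_accuracy:.2f}")
```

Because the convolutional layers are never updated, the expensive forward pass runs once per image and the small head trains quickly.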
  
Post-processing & Results
(Or how we transfer the labelling information)
Using image information for BI on steroids
Labels post-processing
Issue with our approach:
Complementary information
Redundant information
Solution : NMF (Non-negative Matrix Factorization)
Dimension reduction
Explainability
Sparsity
Balancedness
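To illustrate how NMF groups redundant labels into a few topics, here is a small sketch using the classic multiplicative updates for the Frobenius loss. The slides do not state which NMF implementation or settings were used; the 4 x 4 matrix of label probabilities and the choice of two topics are hypothetical.

```python
import numpy as np


def nmf(V, n_topics, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (images x labels) as W @ H.

    W (images x topics) holds each image's topic weights,
    H (topics x labels) describes each topic as a mix of labels.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], n_topics))
    H = rng.random((n_topics, V.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical label probabilities for 4 images over 4 labels.
label_names = ["tower", "skyscraper", "swimming_pool/outdoor", "inn/outdoor"]
V = np.array([
    [0.53, 0.26, 0.00, 0.01],
    [0.40, 0.35, 0.02, 0.00],
    [0.00, 0.01, 0.65, 0.06],
    [0.02, 0.00, 0.50, 0.10],
])

W, H = nmf(V, n_topics=2)
error = np.linalg.norm(V - W @ H)
print("reconstruction error:", round(float(error), 4))
for topic in H:
    print([label_names[i] for i in np.argsort(-topic)[:2]])
```

Both factors stay non-negative, which is what makes each topic readable as a weighted list of labels.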
  
Image content detection
Topic scores determine the importance of topics in an image
TOPIC                                          TOPIC SCORE (%)
Golf course – Fairway – Putting green          31
Hotel – Inn – Apartment building outdoor       30
Swimming pool – Lido deck – Hot tub outdoor    22
Beach – Coast – Harbor                         17

TOPIC                                          TOPIC SCORE (%)
Tower – Skyscraper – Office building           62
Bridge – River – Viaduct                       38
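The slides do not say how the topic scores are computed. One plausible reading, sketched here as an assumption, is that an image's non-negative topic weights are normalized so that they sum to 100. The raw weights below are hypothetical.

```python
def topic_scores(weights, topic_names):
    """Normalize non-negative topic weights to percentages summing to 100."""
    total = sum(weights)
    if total == 0:
        return []
    scored = [(name, 100.0 * w / total) for name, w in zip(topic_names, weights)]
    # Drop topics absent from the image and list the strongest first.
    return sorted((s for s in scored if s[1] > 0), key=lambda s: s[1], reverse=True)


# Hypothetical raw topic weights for one image.
names = [
    "Tower – Skyscraper – Office building",
    "Beach – Coast – Harbor",
    "Bridge – River – Viaduct",
]
weights = [1.24, 0.0, 0.76]

for name, score in topic_scores(weights, names):
    print(f"{name}: {score:.0f}")
```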
  
Results ?
1) Visits :
•  France and Morocco
•  Pool displayed
2) First Recommendation
•  Mostly France & Mediterranean
•  Fails to display pools
3) Image-only recommendation
•  Pools all around the world
•  Does not respect budget
4) Third column = Right Mix
Conclusion
Do iterative data science !
Start simple and grow
Evaluate at each step
Image labelling = BI on steroids
Transfer Learning
Kick-start your project
Save time and money
Any Data Scientist can do it
Deep Learning
Don’t start from scratch !
Is there existing data ?
Is there a pre-trained model ?
Learned along the way
What’s next ?
Attractiveness = % visits with tag / % sales with tag
For ski sales, indoor pictures perform better
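A small worked example of the attractiveness ratio, reading "% sales with tag" as the share of sales whose image carries the tag and "% visits with tag" as the share of visits that land on those sales. That reading, the data layout and the tiny catalogue are assumptions.

```python
def attractiveness(visits, sales, tag):
    """Share of visits on sales carrying `tag`, divided by the share of sales carrying `tag`.

    `sales` maps sale id -> set of image tags; `visits` is a list of visited sale ids.
    A value above 1 means the tag draws more visits than its share of the catalogue.
    """
    tagged = {sale_id for sale_id, tags in sales.items() if tag in tags}
    if not tagged or not visits:
        return 0.0
    visit_share = sum(1 for sale_id in visits if sale_id in tagged) / len(visits)
    sale_share = len(tagged) / len(sales)
    return visit_share / sale_share


# Hypothetical catalogue of four sales and a log of eight visits.
sales = {
    "s1": {"swimming_pool", "hotel"},
    "s2": {"beach"},
    "s3": {"ski_slope"},
    "s4": {"hotel"},
}
visits = ["s1", "s1", "s1", "s2", "s1", "s3", "s1", "s4"]

print(attractiveness(visits, sales, "swimming_pool"))
```

Here the pool tag appears on 1 of 4 sales but attracts 5 of 8 visits, so its ratio is well above 1.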
  
	
  
What’s Next ?
Kenya
Prague
Berlin
Cambodia
What’s Next ? Customize the Image !
Kenya
Prague
Berlin
Cambodia
Thank you for your attention !
Solution 3 : What about APIs ?
What about APIs ? Use for generating labels !
How to steal a model:
•  1) Score part of the database for training
•  2) Train a model
•  3) Score your entire database !
(Or don’t, it’s illegal)
But I only have 5000 requests?
-> Use Transfer Learning !
What about APIs ? Use for generating labels !
Experiment:
•  5000 requests on API
-> 4500 for training, 500 for validation
-> 180 classes to predict
•  Transfer learning with MIT Places pre-trained model
•  scikit-learn multilabel model
•  One-vs-the-rest
•  Untuned logistic regression
(demo, not used in any real project)
(Or don’t, it’s illegal)
What about APIs ? Results
Accuracy    95
Recall      80
Precision   75
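The slides do not say how these three figures were averaged. Assuming micro-averaging over every (image, label) cell, the sketch below shows why accuracy can sit well above precision and recall in a multilabel setting: most labels are absent from most images, so true negatives dominate. The indicator rows are hypothetical.

```python
def multilabel_metrics(y_true, y_pred):
    """Micro-averaged accuracy, precision and recall over all (image, label) cells."""
    tp = fp = fn = tn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical indicator rows: 2 images x 10 labels (the experiment had 180 labels).
y_true = [[1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
          [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]]
y_pred = [[1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
          [0, 0, 1, 1, 0, 0, 0, 0, 0, 0]]

print(multilabel_metrics(y_true, y_pred))
```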
  
Label        Probability    Label         Probability
landscape    1.0000         sunset        0.9998
sky          1.0000         no person     0.9996
outdoors     1.0000         water         0.9990
nature       1.0000         park          0.9849
rock         1.0000         river         0.9678
travel       1.0000         scenic        0.8031

Label        Probability    Label         Probability
beach        1.0000         ocean         1.0000
summer       1.0000         relaxation    1.0000
sand         1.0000         island        1.0000
tropical     1.0000         idyllic       1.0000
travel       1.0000         seashore      0.9998
seascape     1.0000         water         0.9997
(demo, not used in any real project)
