Immoviz - #WeAreAnts
IMMOVIZ
BORDEAUX
Emeline Gaulard - Du Phan
WHO ARE WE
EMELINE GAULARD
BACKEND DEVELOPER
EPITECH ’18
DU PHAN
DATA SCIENTIST
ENSC ’17
WHERE DO WE WORK
WE ARE ANTS
Prototyping, Data Science, Internet of Things, Fun
Immoviz?
TOOLBOX
Search: Elasticsearch, Python
Machine Learning: IPython notebook, pandas/numpy, seaborn/folium, scikit-learn, hyperopt
Backend: Node.js, Express, CasperJS, PostgreSQL, Slack bot (node-slackr)
INFRASTRUCTURE (before)
INFRASTRUCTURE (now)
OUTLINE
1 SCRAPERS
2 ELASTICSEARCH
3 DUPLICATE AGGREGATION
4 PRICE PREDICTION
Scrapers
Real estate sites (seloger, leboncoin, bienici, sudouest,…)
CasperJS
PostgreSQL
Elasticsearch
Scrapers
Simple to use
Lightweight
Debugging is non-trivial
Elasticsearch
Indexing
Mapping
Analyzing
Querying
Analyzer example
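The analyzer shown on this slide did not survive in the transcript. As a stand-in, here is a minimal sketch of what index settings with a custom analyzer could look like, written as a Python dict ready to be sent as the index-creation body. The analyzer name `ad_text`, the field names `description` and `price`, and the choice of filters are all assumptions for illustration, not the deck's actual configuration. The mapping uses the typeless form of recent Elasticsearch versions.

```python
import json

# Hypothetical index body: names and filter choices are illustrative only.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "filter": {
                # Built-in "stemmer" token filter, configured for French.
                "french_stemmer": {"type": "stemmer", "language": "light_french"},
            },
            "analyzer": {
                # Custom analyzer: split into words, lowercase, strip accents, stem.
                "ad_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding", "french_stemmer"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            # The ad description is analyzed with the custom analyzer above.
            "description": {"type": "text", "analyzer": "ad_text"},
            "price": {"type": "integer"},
        }
    },
}

print(json.dumps(INDEX_BODY, indent=2))
```

Lowercasing and accent folding are a common choice for French ad text, where the same word can appear as "rénové" or "renove" depending on the source site.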
Query example
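The query from this slide is also missing from the transcript. The sketch below shows one plausible shape for it: a `bool` query that matches on the ad text and filters on a price window. The function name, the field names `description` and `price`, and the percentage tolerance are assumptions made for the example.

```python
def build_similar_ads_query(text, price, tolerance_pct=5):
    """Build a query body for ads with similar text and a close price.

    Field names and the tolerance are hypothetical, not taken from the deck.
    """
    # Integer arithmetic keeps the price bounds exact.
    low = price * (100 - tolerance_pct) // 100
    high = price * (100 + tolerance_pct) // 100
    return {
        "query": {
            "bool": {
                # Scored full-text match on the analyzed description.
                "must": [{"match": {"description": text}}],
                # Non-scoring filter that keeps only ads in the price window.
                "filter": [{"range": {"price": {"gte": low, "lte": high}}}],
            }
        }
    }

body = build_similar_ads_query("appartement T3 centre Bordeaux", 200000)
print(body["query"]["bool"]["filter"])
```

Putting the price condition in `filter` rather than `must` means it narrows the candidates without affecting the relevance score.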
Duplicate aggregation
PostgreSQL: ID comparison, Price comparison
Elasticsearch: Text analysis, Price comparison
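The two-stage duplicate check on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the field names `ref`, `price` and `description`, the 2% price tolerance and the 0.8 similarity threshold are hypothetical, and a token-level Jaccard similarity stands in for the Elasticsearch text analysis used in the deck.

```python
import re

def tokens(text):
    """Lowercased word tokens of an ad description."""
    return set(re.findall(r"\w+", text.lower()))

def jaccard(text_a, text_b):
    """Share of tokens the two descriptions have in common."""
    a, b = tokens(text_a), tokens(text_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def same_price(p1, p2, tolerance=0.02):
    """True when the prices differ by at most `tolerance` (relative)."""
    return abs(p1 - p2) <= tolerance * max(p1, p2)

def is_duplicate(ad1, ad2, min_similarity=0.8):
    close_price = same_price(ad1["price"], ad2["price"])
    # Stage 1 (PostgreSQL in the deck): same ID and close price.
    if ad1["ref"] == ad2["ref"] and close_price:
        return True
    # Stage 2 (Elasticsearch in the deck): similar text and close price.
    return close_price and jaccard(ad1["description"], ad2["description"]) >= min_similarity

a = {"ref": "A1", "price": 250000, "description": "Bel appartement T3 centre Bordeaux"}
b = {"ref": "Z9", "price": 250000, "description": "Bel appartement T3 centre Bordeaux lumineux"}
print(is_duplicate(a, b))
```

Running the cheap ID check first means the text comparison is only needed for ads that the ID check could not pair up.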
MACHINE LEARNING WORKFLOW
Data Cleaning
Check input format
Split data and hide holdout
Drop/impute null values
Filter outliers
…
Feature Engineering
Extract features
Scale/normalize data
Test contextual data
…
Data Modeling
Cross-validate for model selection
Optimize hyper-parameters
…
Error analysis
Cross-validate for testing error
Locate sensitive zones
Visualize error
…
Data snooping
If a data set has affected any step in the learning process, its ability to assess the outcome has been compromised.
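One way to read this rule in practice: anything learned from the data, even an imputation value, should be fitted on the training part only and then applied to the holdout. The sketch below illustrates that with a mean imputer. The helper names, the `surface` field and the 20% holdout size are assumptions for the example.

```python
import random

def split_holdout(rows, holdout_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and set a holdout part aside."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]

def fit_mean_imputer(train_rows, key):
    """Learn the fill value from the training rows only."""
    values = [row[key] for row in train_rows if row[key] is not None]
    return sum(values) / len(values)

def impute(rows, key, fill_value):
    """Return copies of the rows with missing values replaced."""
    return [dict(row, **{key: fill_value if row[key] is None else row[key]})
            for row in rows]

rows = [{"surface": s} for s in [50, None, 70, 90, None, 110, 30, 60, 80, 100]]
train, holdout = split_holdout(rows)
fill = fit_mean_imputer(train, "surface")   # the holdout never influences this value
train, holdout = impute(train, "surface", fill), impute(holdout, "surface", fill)
print(len(train), len(holdout), fill)
```

Computing the mean over all rows before splitting would let the holdout leak into training, which is exactly the compromise the slide warns about.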
k-fold Cross Validation
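A minimal sketch of k-fold cross-validation in plain Python: the data is cut into k folds, each fold serves once as the test set, and the k test errors are averaged. The baseline model (predict the mean of the training prices) and the sample prices are placeholders chosen for the example. Folds are contiguous and unshuffled here, so real data would usually be shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous folds, without shuffling."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one more sample each.
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def cross_val_mean_error(ys, k):
    """Average absolute error of a mean-predictor baseline over k folds."""
    fold_errors = []
    for train_idx, test_idx in k_fold_indices(len(ys), k):
        prediction = sum(ys[i] for i in train_idx) / len(train_idx)
        errors = [abs(ys[i] - prediction) for i in test_idx]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k

prices = [100, 120, 90, 150, 130, 110, 95, 105, 140, 125]
print(cross_val_mean_error(prices, k=5))
```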
MACHINE LEARNING WORKFLOW
Data Cleaning
Check input format
Drop/impute null values
Filter outliers
Split data and hide holdout
…
Feature Engineering
Extract features
Scale/normalize data
Test contextual data
…
Data Modeling
Cross-validate for model selection
Optimize hyper-parameters
…
Error analysis
Cross-validate for testing error
Locate sensitive zones
Visualize error
…
Source: Professor Yaser Abu-Mostafa, Caltech
Data snooping
If you torture the data long enough, it will confess.
Some key numbers
60 000 adverts, including 20 432 selling ads
12 839 unique selling ads with 61 features
10 883 selling ads remaining with 52 features after filtering
8 months of data
Allocation of time
Feature Engineering: 50%
Data Cleaning & EDA, Data Modeling, Error Analysis: remaining 50% (chart labels 20%, 10%, 20%)
Feature engineering: what works?
Location features
Contextual data (Open Moulinette)
Imputing room features
Removing contextual outliers
Improving ES queries
Time series features
NLP on text data
Dimensionality reduction
Transforming/scaling numerical values
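As an illustration of the "transforming/scaling numerical values" item, here is a minimal sketch of a log transform followed by standardization, with the statistics fitted on training values only. The deck does not say which transform was used or how much it helped, so the choice of `log1p` plus z-scoring and the sample surfaces are assumptions.

```python
import math

def fit_log_standardizer(train_values):
    """Learn mean and standard deviation of log1p(values) on the training set."""
    logs = [math.log1p(v) for v in train_values]
    mean = sum(logs) / len(logs)
    variance = sum((x - mean) ** 2 for x in logs) / len(logs)
    std = math.sqrt(variance) or 1.0  # guard against constant features
    return mean, std

def transform(values, mean, std):
    """Apply the learned transform to any split (train, CV fold or holdout)."""
    return [(math.log1p(v) - mean) / std for v in values]

train_surfaces = [25, 40, 55, 70, 90, 120, 300]   # skewed, like many housing features
mean, std = fit_log_standardizer(train_surfaces)
print(transform([60, 150], mean, std))
```

The log step compresses the long right tail before scaling, which tends to matter more for linear models than for tree-based ones.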
Data Modeling: what algorithms to use?
Linear model
Tree-based model
Average ensemble method
Metamodel ensemble method
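The average ensemble is the simpler of the two ensemble methods listed: each model predicts a price for every ad and the predictions are averaged per ad. A minimal sketch, where the three prediction lists are made-up numbers standing in for the outputs of already trained models:

```python
def average_ensemble(predictions_per_model):
    """Average the predictions of several models, ad by ad.

    `predictions_per_model` holds one list of predictions per model,
    all of the same length and in the same ad order.
    """
    n_models = len(predictions_per_model)
    return [sum(ad_predictions) / n_models
            for ad_predictions in zip(*predictions_per_model)]

# Hypothetical outputs of three models for the same two ads.
linear_preds = [100, 200]
forest_preds = [110, 190]
boosting_preds = [120, 210]
print(average_ensemble([linear_preds, forest_preds, boosting_preds]))  # [110.0, 200.0]
```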
“This is how you win ML competitions: you take other people’s work and ensemble them together.”
Vitaly Kuznetsov - NIPS 2014
Meta-model ensemble method: explanation
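The explanatory diagram is not in the transcript, so here is a small sketch of the general idea (often called stacking): base models produce out-of-fold predictions, and a meta-model is trained on those predictions instead of on the raw features. Everything concrete below is an assumption for illustration: two toy base models (a mean predictor and a one-feature line), interleaved folds, and a meta-model reduced to a single blend weight fitted by least squares. The deck's actual base models and meta-model are not specified.

```python
def fit_mean(xs, ys):
    """Base model 1: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Base model 2: one-feature least-squares line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def out_of_fold_predictions(fit, xs, ys, k):
    """Predict every sample with a model that never saw it during training."""
    oof = [None] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), k):
            oof[i] = model(xs[i])
    return oof

def fit_blend_weight(p_a, p_b, ys):
    """Meta-model: w minimizing the squared error of w*p_a + (1-w)*p_b."""
    num = sum((a - b) * (y - b) for a, b, y in zip(p_a, p_b, ys))
    den = sum((a - b) ** 2 for a, b in zip(p_a, p_b))
    return num / den if den else 0.5

def fit_stack(xs, ys, k=5):
    oof_line = out_of_fold_predictions(fit_line, xs, ys, k)
    oof_mean = out_of_fold_predictions(fit_mean, xs, ys, k)
    w = fit_blend_weight(oof_line, oof_mean, ys)
    # Refit the base models on all training data for final predictions.
    line, mean = fit_line(xs, ys), fit_mean(xs, ys)
    return (lambda x: w * line(x) + (1 - w) * mean(x)), w

surfaces = list(range(20, 120, 5))
prices = [3000 * s + 50000 for s in surfaces]
model, weight = fit_stack(surfaces, prices)
print(weight, model(100))
```

Training the meta-model on out-of-fold predictions rather than in-sample ones keeps it from simply rewarding whichever base model overfits the most.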
Kaggle Homesite winner
Source: Homesite Quote Conversion, Winners' Write-Up, 1st Place: KazAnova
Error analysis: visualization is key
Result
10-fold CV mean error (%), bar chart comparing: Linear Regression, Lasso, Random Forest, Gradient Boosting, Average Ensemble Method, Metamodel Ensemble Method (axis from 0 to 26)
Result
CV mean error: 12.3%
Holdout mean error: 13.1%
CV median error: 8.8%
Holdout median error: 9.3%
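The deck does not define "error". A common reading for price prediction is the absolute percentage error per ad, summarized by its mean and median, and the sketch below assumes exactly that. The prices are made-up numbers.

```python
from statistics import mean, median

def absolute_percentage_errors(true_prices, predicted_prices):
    """Per-ad error as a percentage of the true price (assumed metric)."""
    return [100 * abs(pred - true) / true
            for true, pred in zip(true_prices, predicted_prices)]

def summarize_errors(true_prices, predicted_prices):
    errors = absolute_percentage_errors(true_prices, predicted_prices)
    return {"mean": mean(errors), "median": median(errors)}

true_prices = [100000, 200000, 400000]
predicted = [110000, 190000, 300000]
print(summarize_errors(true_prices, predicted))
```

A median well below the mean, as in the figures above, usually indicates that a minority of ads with large errors pulls the mean up.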
Feature importance
How to improve the model
More data
Improve ES queries (sector, type, …)
Leverage time series data
More data
What’s next?
Metrics
Recommendation System
User Experience
Speed
Conclusion
Better data beats cleverer algorithms
System monitoring is vital
There needs to be a coherent data flow between backend and ML engine
Thank you for your attention.
Any questions?

Immoviz.io - real estate search engine