Real-estate analytics: A Vietnam case study
Viet-Trung Tran
School of Communication and Information Technology
Hanoi University of Science and Technology
Outline
• Problem
• Where big data analytics can help
• Geographically weighted regression for property appraisal
• Conclusion
Problem
• A national database is needed to support investors and home buyers.
– "After more than 20 years of establishment and development, information on Vietnam’s real estate market is still ranked low on transparency"
Where is my data?
• The good
– Most property listings are publicly available on the Internet
• The bad
– Thousands of sites
– Semi-structured text that requires NLP
• The ugly
– Spam and duplicate listings
– Fake or incorrect listings, low data quality
There is a boom in trading floors, and many use tricks similar to those adopted by multi-level marketing companies, such as sending messages to customers and providing misleading information about real estate products, causing price bubbles.
News site ABC
News site XYZ
Vietnam real-estate vs. stock market
• Real estate
– 300 billion USD (FPT Securities, 2015)
– Lack of high-quality data, tons of scams
– Weak governmental control
– No national databases
• Stock market
– 33 billion USD (quandl.com)
– Clear reports and plots, curated data
– Strong governmental control
– Centralized, real-time monitoring
Vietnam real-estate vs. e-commerce goods
• Real estate
– High value, high ROI
– Immobile
• E-commerce goods
– Low value, no ROI
– Mobile, disappear over time
Vietnam property listings are advertised in the same manner as fridges and TVs
Where big data analytics can help
• Index the entire real estate market
– 8.5 million listings to date (02/2017)
• Deliver real-time market insights
– Powered by machine learning and Vietnamese language processing
• Market data transparency, for all
• Save time and avoid overpaying, for buyers
Big data processing
• Crawlers
• QC: filters/deduplication
• Natural language processing
• Distributed database
• Big data processing
• Report
• Chatbot
• Website
Vietnamese language processing
• Tasks
– Named Entity Recognition (NER)
– Vietnamese address normalization (Critical!)
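The slides flag address normalization as critical but do not show how it is done. The following is only a minimal sketch, assuming that normalization means folding diacritics, expanding a few common dotted abbreviations, and cleaning up separators. The abbreviation table and the function names `strip_diacritics` and `normalize_address` are hypothetical, not taken from the presented system.

```python
import re
import unicodedata

# Hypothetical abbreviation table (ASCII-folded); real listings need many more rules.
ABBREVIATIONS = {
    "tp": "thanh pho",  # thành phố (city)
    "q": "quan",        # quận (urban district)
    "p": "phuong",      # phường (ward)
    "d": "duong",       # đường (street)
}


def strip_diacritics(text):
    """Fold Vietnamese letters to plain ASCII, e.g. 'Bến Nghé' -> 'Ben Nghe'."""
    # The letter đ has no canonical decomposition, so it is mapped by hand.
    text = text.replace("đ", "d").replace("Đ", "D")
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (tone and vowel marks) left by the decomposition.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def normalize_address(raw):
    """Return a lowercase, ASCII-folded address with abbreviations expanded."""
    text = strip_diacritics(raw).lower()
    # Expand dotted abbreviations such as "q.1" or "tp. ho chi minh".
    text = re.sub(r"\b(tp|q|p|d)\.\s*",
                  lambda m: ABBREVIATIONS[m.group(1)] + " ", text)
    text = re.sub(r"\s*,\s*", ", ", text)  # uniform comma spacing
    return re.sub(r"\s+", " ", text).strip()


short_form = normalize_address("Đ. Lê Lợi, P.Bến Nghé, Q.1, TP. Hồ Chí Minh")
long_form = normalize_address(
    "đường Lê Lợi,  phường Bến Nghé , quận 1, Thành phố Hồ Chí Minh")
```

Both spellings collapse to the same key, which is what makes listings from different sites comparable. Folding diacritics loses information (distinct names can collide), so a production system would likely keep the original text alongside the key.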
Big data processing
• Tasks
– Price timelines for every road, ward, district, and city
– Automatic property appraisal
– More analytics to come
• About our data
– 8.5 million listings (to date)
– Stored on HBase
– Processed on Spark
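As an illustration of the price-timeline task, here is a small sketch that groups listings by area and month and takes the median unit price. The record fields (`ward`, `district`, `posted`, `price_vnd`, `area_m2`), the sample values, and the choice of the median are assumptions made for this example. The slides do not describe the actual schema or aggregation, and the real pipeline runs on Spark rather than in plain Python.

```python
from collections import defaultdict
from statistics import median


def price_timeline(listings, level="ward"):
    """Median price per square metre for each (area name, month) pair."""
    buckets = defaultdict(list)
    for item in listings:
        month = item["posted"][:7]  # "YYYY-MM" prefix of an ISO date string
        unit_price = item["price_vnd"] / item["area_m2"]
        buckets[(item[level], month)].append(unit_price)
    return {key: median(values) for key, values in sorted(buckets.items())}


# Made-up sample records, for illustration only.
listings = [
    {"ward": "Ben Nghe", "district": "Quan 1", "posted": "2017-01-05",
     "price_vnd": 9.0e9, "area_m2": 60.0},
    {"ward": "Ben Nghe", "district": "Quan 1", "posted": "2017-01-20",
     "price_vnd": 6.0e9, "area_m2": 50.0},
    {"ward": "Ben Nghe", "district": "Quan 1", "posted": "2017-02-02",
     "price_vnd": 7.0e9, "area_m2": 50.0},
]
ward_timeline = price_timeline(listings)
district_timeline = price_timeline(listings, level="district")
```

The median is used here because listing prices contain spam and outliers; a mean would be pulled around by a single mispriced listing.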
Prototype (to date)
Automatic property appraisal
• Tran, Hung Tien, Hiep Tuan Nguyen, and Viet-Trung Tran. "Large-scale
geographically weighted regression on Spark." Knowledge and Systems
Engineering (KSE), 2016 Eighth International Conference on. IEEE, 2016.
GWR + Spark =
– Large-scale spatial data
– Improved performance
– Distributed
Background
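• First Law of Geography (Waldo Tobler):
“Everything is related to everything else, but near things are more related than distant things.”
• GWR model
– The local model at location u takes the form
$$y_i(u) = \beta_{0i}(u) + \beta_{1i}(u)\,x_{1i} + \beta_{2i}(u)\,x_{2i} + \dots + \beta_{mi}(u)\,x_{mi}$$
– The weighted least squares estimator of the coefficients is
$$\hat{\beta}(u) = (X^T W(u) X)^{-1} X^T W(u) Y$$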
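For readers new to GWR, the estimator can be read as an ordinary weighted least squares fit repeated at every location u. The symbols $w_j(u)$, $x_j$, and $n$ below are notation introduced here for illustration and do not appear on the slides: $w_j(u)$ is the kernel weight of observation $j$ seen from location $u$, $x_j$ is row $j$ of $X$ (with a leading 1 for the intercept), and $n$ is the number of observations.

$$W(u) = \mathrm{diag}\big(w_1(u), \dots, w_n(u)\big), \qquad \hat{\beta}(u) = \arg\min_{\beta} \sum_{j=1}^{n} w_j(u)\,\big(y_j - x_j^{T}\beta\big)^2$$

Setting the gradient of this weighted sum of squares to zero gives the normal equations $X^T W(u) X\,\beta = X^T W(u) Y$, whose solution is the closed form for $\hat{\beta}(u)$.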
Background
• Kernel function
– Gaussian function
• Bandwidth
– Fixed bandwidth
– Adaptive bandwidth
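The kernel formula shown on the slide did not survive extraction. The sketch below uses the Gaussian kernel in its common GWR form, w = exp(-0.5 (d / h)^2), and assumes that "adaptive bandwidth" means setting h to the distance of the k-th nearest observation. The function names and the parameter `k` are hypothetical.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel: weight 1 at distance 0, decaying smoothly with distance."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def fixed_weights(distances, bandwidth):
    """Fixed bandwidth: the same h is used at every regression point."""
    return [gaussian_weight(d, bandwidth) for d in distances]


def adaptive_weights(distances, k):
    """Adaptive bandwidth (assumed variant): h is the distance to the k-th
    nearest observation, so h shrinks in dense areas and grows in sparse ones."""
    bandwidth = sorted(distances)[k - 1]
    if bandwidth == 0:
        raise ValueError("k-th nearest distance is zero; choose a larger k")
    return [gaussian_weight(d, bandwidth) for d in distances]


fixed = fixed_weights([0.0, 1.0, 2.0], bandwidth=1.0)
adaptive = adaptive_weights([3.0, 1.0, 2.0, 4.0], k=2)
```

A fixed bandwidth is simpler, but in a listing dataset with dense city centres and sparse outskirts, an adaptive bandwidth keeps roughly the same number of influential neighbours everywhere.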
Problem
• Estimating a local model
• Bandwidth selection
– Which bandwidth is good?
• Model evaluation
– Choose kernel function
$$\hat{\beta}(u) = (X^T W(u) X)^{-1} X^T W(u) Y$$
Source: http://rose.bris.ac.uk
$O(n^3)$
Problem
• How to apply the model to large-scale data?
– Data points
– Features
– Regression points
Large-Scale GWR on Spark
• Why Spark?
– In-memory cluster-computing platform
– Parallel programming
– Resilient distributed datasets
Large-Scale GWR on Spark
• We propose three approaches to scaling GWR
– Scaling Weighted Linear Regression
– Parallel Multiple WLR models
– Parallel Geographically Weighted Regression (combines the first two approaches)
Scalable GWR on Spark
• Naïve approach: Scaling Weighted Linear Regression
– For each regression point (regPoint)
• Compute the weights (in parallel)
• Fit the Weighted Linear Regression (WLR) model (in parallel)
– Summarize the models
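The loop described on this slide can be sketched sequentially in plain Python. This is not the authors' Spark implementation: in the naïve approach the weight computation and the WLR fit inside the loop would each run as a distributed job, whereas here everything runs on one machine. The function names, the Gaussian kernel, and the toy data are assumptions made for illustration.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_wlr(rows, targets, weights):
    """Weighted least squares via the normal equations (X^T W X) beta = X^T W y."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(design[0])
    xtwx = [[sum(w * x[i] * x[j] for x, w in zip(design, weights))
             for j in range(p)] for i in range(p)]
    xtwy = [sum(w * x[i] * y for x, y, w in zip(design, targets, weights))
            for i in range(p)]
    return solve(xtwx, xtwy)


def gwr(train_xy, rows, targets, reg_points, bandwidth):
    """One local WLR model per regression point (the 'for each regPoint' loop)."""
    models = []
    for u in reg_points:
        weights = [math.exp(-0.5 * (math.dist(u, p) / bandwidth) ** 2)
                   for p in train_xy]
        models.append(fit_wlr(rows, targets, weights))
    return models


# Toy data: the target is exactly 2 + 3 * feature at every location.
train_xy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 2.0)]
rows = [[1.0], [2.0], [3.0], [4.0], [5.0]]
targets = [2.0 + 3.0 * r[0] for r in rows]
models = gwr(train_xy, rows, targets, [(0.0, 0.0), (2.0, 2.0)], bandwidth=1.0)
```

Because the toy relationship does not vary in space, every local model recovers the same coefficients; with real listings the coefficients would differ from one regression point to the next.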
Scalable GWR on Spark
• Parallel Multiple WLR models
– Inputs: regression dataset and training dataset
– Compute the weights and fit multiple WLR models in parallel
– Summarize the models
Scalable GWR on Spark
• Parallel Geographically Weighted Regression
– The regression dataset (R) and the training dataset (T) are combined into one dataset (RT)
– Distributed GWR computation on the combined dataset
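The slides do not say how the combined dataset is processed. One plausible reading, sketched below as an assumption rather than the authors' method, is that every (regression point, training row) pair contributes one term to X^T W X and X^T W y, and the terms are summed per regression point. Because that sum is associative, it could serve as the reducer in a distributed reduce-by-key step. Here `itertools.product` stands in for the combination of R and T, and all names are hypothetical.

```python
import math
from itertools import product


def pair_contribution(reg_point, train_row, bandwidth):
    """Contribution of one (R, T) pair to X^T W X and X^T W y."""
    xy, features, target = train_row
    w = math.exp(-0.5 * (math.dist(reg_point, xy) / bandwidth) ** 2)
    x = [1.0] + list(features)  # prepend the intercept term
    xtwx = [[w * a * b for b in x] for a in x]
    xtwy = [w * a * target for a in x]
    return xtwx, xtwy


def add_stats(left, right):
    """Element-wise sum of two (xtwx, xtwy) pairs; associative and commutative."""
    xtwx = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(left[0], right[0])]
    xtwy = [a + b for a, b in zip(left[1], right[1])]
    return xtwx, xtwy


def combined_stats(reg_points, train_rows, bandwidth):
    """Accumulate the normal-equation statistics for each regression point."""
    stats = {}
    for r, t in product(reg_points, train_rows):  # the combined "RT" dataset
        term = pair_contribution(r, t, bandwidth)
        stats[r] = add_stats(stats[r], term) if r in stats else term
    return stats


# Each training row: (location, feature list, target). Made-up values.
train_rows = [((0.0, 0.0), [2.0], 1.0), ((1.0, 0.0), [3.0], 2.0)]
stats = combined_stats([(0.0, 0.0)], train_rows, bandwidth=1.0)
xtwx, xtwy = stats[(0.0, 0.0)]
```

Solving `xtwx @ beta = xtwy` for each regression point then yields its local model; only the small per-point matrices need to be collected, not the training data itself.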
Experiments
• Environment
– Cluster: 8 nodes on Amazon Web Services
• 4 cores, Intel Xeon E5-2670 v2, 2.5 GHz
• 16 GB RAM, 2 x 40 GB SSD
• Hadoop 2.7.2 and Spark 1.6.1
– Dataset schema
 |-- x: double (nullable = false)
 |-- y: double (nullable = false)
 |-- label: double (nullable = false)
 |-- features: vector (nullable = false)
Large training dataset
[Figure: running time (sec), 0 to 1200, vs. number of training points (10,000; 100,000; 1,000,000; 2,000,000; 5,000,000) for Distributed WLR computation, Parallel WLR, Distributed GWR NE, and Distributed GWR GD]
Large regression dataset
[Figure: running time (sec), 0 to 1200, vs. number of regression points (1,000; 5,000; 10,000; 20,000; 50,000) for the same four methods]
Cluster performance
[Figure: running time (sec), 0 to 2000, on 2-node, 4-node, and 8-node clusters for the same four methods]
Land value prediction (GWR)
Land value heat map
Conclusion
• Vietnam real-estate analytics just works!
– Large-scale crawlers
– Big data processing
– Specialized NLP for the listing corpus
• However
– A lot of value in the data is still undiscovered
– A lot of room for improvement and further research
Call for collaboration!
Thanks for your attention!
trungtv@soict.hust.edu.vn
32

More Related Content

What's hot

Streaming Weather Data from Web APIs to Jupyter through Kafka
Streaming Weather Data from Web APIs to Jupyter through KafkaStreaming Weather Data from Web APIs to Jupyter through Kafka
Streaming Weather Data from Web APIs to Jupyter through KafkaLeo Salemann
 
"Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler...
"Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler..."Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler...
"Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler...Dataconomy Media
 
An Introduction to Mapping, GIS and Spatial Modelling in R (presentation)
An Introduction to Mapping, GIS and Spatial Modelling in R (presentation)An Introduction to Mapping, GIS and Spatial Modelling in R (presentation)
An Introduction to Mapping, GIS and Spatial Modelling in R (presentation)Rich Harris
 
Using R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy LansleyUsing R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy LansleyGuy Lansley
 
Dr Richard Fry - Using R as a GIS
Dr Richard Fry - Using R as a GISDr Richard Fry - Using R as a GIS
Dr Richard Fry - Using R as a GISShaun Lewis
 
Map-Side Merge Joins for Scalable SPARQL BGP Processing
Map-Side Merge Joins for Scalable SPARQL BGP ProcessingMap-Side Merge Joins for Scalable SPARQL BGP Processing
Map-Side Merge Joins for Scalable SPARQL BGP ProcessingAlexander Schätzle
 
Albert Bifet – Apache Samoa: Mining Big Data Streams with Apache Flink
Albert Bifet – Apache Samoa: Mining Big Data Streams with Apache FlinkAlbert Bifet – Apache Samoa: Mining Big Data Streams with Apache Flink
Albert Bifet – Apache Samoa: Mining Big Data Streams with Apache FlinkFlink Forward
 
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATLParikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATLMLconf
 
Using Deep Learning in Production Pipelines to Predict Consumers’ Interest wi...
Using Deep Learning in Production Pipelines to Predict Consumers’ Interest wi...Using Deep Learning in Production Pipelines to Predict Consumers’ Interest wi...
Using Deep Learning in Production Pipelines to Predict Consumers’ Interest wi...Databricks
 
A seminar on neo4 j
A seminar on neo4 jA seminar on neo4 j
A seminar on neo4 jRishikese MR
 
Asymmetry in Large-Scale Graph Analysis, Explained
Asymmetry in Large-Scale Graph Analysis, ExplainedAsymmetry in Large-Scale Graph Analysis, Explained
Asymmetry in Large-Scale Graph Analysis, ExplainedVasia Kalavri
 
GeoMesa LocationTech DC
GeoMesa LocationTech DCGeoMesa LocationTech DC
GeoMesa LocationTech DCCCRinc
 
Prediction of taxi rides ETA
Prediction of taxi rides ETAPrediction of taxi rides ETA
Prediction of taxi rides ETADaniel Marcous
 
SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud
SchemEX - Creating the Yellow Pages for the Linked Open Data CloudSchemEX - Creating the Yellow Pages for the Linked Open Data Cloud
SchemEX - Creating the Yellow Pages for the Linked Open Data CloudAnsgar Scherp
 
ESTA-LD exploring spatio-temporal linked statistical data
ESTA-LD exploring spatio-temporal linked statistical dataESTA-LD exploring spatio-temporal linked statistical data
ESTA-LD exploring spatio-temporal linked statistical datageoknow
 
Artificial intelligence and data stream mining
Artificial intelligence and data stream miningArtificial intelligence and data stream mining
Artificial intelligence and data stream miningAlbert Bifet
 

What's hot (20)

Streaming Weather Data from Web APIs to Jupyter through Kafka
Streaming Weather Data from Web APIs to Jupyter through KafkaStreaming Weather Data from Web APIs to Jupyter through Kafka
Streaming Weather Data from Web APIs to Jupyter through Kafka
 
"Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler...
"Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler..."Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler...
"Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler...
 
An Introduction to Mapping, GIS and Spatial Modelling in R (presentation)
An Introduction to Mapping, GIS and Spatial Modelling in R (presentation)An Introduction to Mapping, GIS and Spatial Modelling in R (presentation)
An Introduction to Mapping, GIS and Spatial Modelling in R (presentation)
 
Using R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy LansleyUsing R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy Lansley
 
Dr Richard Fry - Using R as a GIS
Dr Richard Fry - Using R as a GISDr Richard Fry - Using R as a GIS
Dr Richard Fry - Using R as a GIS
 
Map-Side Merge Joins for Scalable SPARQL BGP Processing
Map-Side Merge Joins for Scalable SPARQL BGP ProcessingMap-Side Merge Joins for Scalable SPARQL BGP Processing
Map-Side Merge Joins for Scalable SPARQL BGP Processing
 
Tutorial5
Tutorial5Tutorial5
Tutorial5
 
Albert Bifet – Apache Samoa: Mining Big Data Streams with Apache Flink
Albert Bifet – Apache Samoa: Mining Big Data Streams with Apache FlinkAlbert Bifet – Apache Samoa: Mining Big Data Streams with Apache Flink
Albert Bifet – Apache Samoa: Mining Big Data Streams with Apache Flink
 
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATLParikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
 
Using Deep Learning in Production Pipelines to Predict Consumers’ Interest wi...
Using Deep Learning in Production Pipelines to Predict Consumers’ Interest wi...Using Deep Learning in Production Pipelines to Predict Consumers’ Interest wi...
Using Deep Learning in Production Pipelines to Predict Consumers’ Interest wi...
 
A seminar on neo4 j
A seminar on neo4 jA seminar on neo4 j
A seminar on neo4 j
 
Os Percy
Os PercyOs Percy
Os Percy
 
Introduction to GIS
Introduction to GISIntroduction to GIS
Introduction to GIS
 
GIS file types
GIS file typesGIS file types
GIS file types
 
Asymmetry in Large-Scale Graph Analysis, Explained
Asymmetry in Large-Scale Graph Analysis, ExplainedAsymmetry in Large-Scale Graph Analysis, Explained
Asymmetry in Large-Scale Graph Analysis, Explained
 
GeoMesa LocationTech DC
GeoMesa LocationTech DCGeoMesa LocationTech DC
GeoMesa LocationTech DC
 
Prediction of taxi rides ETA
Prediction of taxi rides ETAPrediction of taxi rides ETA
Prediction of taxi rides ETA
 
SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud
SchemEX - Creating the Yellow Pages for the Linked Open Data CloudSchemEX - Creating the Yellow Pages for the Linked Open Data Cloud
SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud
 
ESTA-LD exploring spatio-temporal linked statistical data
ESTA-LD exploring spatio-temporal linked statistical dataESTA-LD exploring spatio-temporal linked statistical data
ESTA-LD exploring spatio-temporal linked statistical data
 
Artificial intelligence and data stream mining
Artificial intelligence and data stream miningArtificial intelligence and data stream mining
Artificial intelligence and data stream mining
 

Viewers also liked

Vietnam Real Estate Surges by Anthony S Casey
Vietnam Real Estate Surges by Anthony S CaseyVietnam Real Estate Surges by Anthony S Casey
Vietnam Real Estate Surges by Anthony S CaseyAnthony S Casey Singapore
 
Dimensionality reduction: SVD and its applications
Dimensionality reduction: SVD and its applicationsDimensionality reduction: SVD and its applications
Dimensionality reduction: SVD and its applicationsViet-Trung TRAN
 
From neural networks to deep learning
From neural networks to deep learningFrom neural networks to deep learning
From neural networks to deep learningViet-Trung TRAN
 
How to business in vietnam_2014 All you need know
How to business in vietnam_2014 All you need knowHow to business in vietnam_2014 All you need know
How to business in vietnam_2014 All you need knowduynguyentt
 
Giới thiệu tổng quan về dự án Gamuda Gardens - Gamuda City
Giới thiệu tổng quan về dự án Gamuda Gardens - Gamuda CityGiới thiệu tổng quan về dự án Gamuda Gardens - Gamuda City
Giới thiệu tổng quan về dự án Gamuda Gardens - Gamuda CityQuy Lee
 
A Short PMML Tutorial by LatentView
A Short PMML Tutorial by LatentViewA Short PMML Tutorial by LatentView
A Short PMML Tutorial by LatentViewramesh.latentview
 
E commerce landscape 2012
E commerce landscape 2012E commerce landscape 2012
E commerce landscape 2012we20
 
Doing Business in Vietnam-Presentation
Doing Business in Vietnam-PresentationDoing Business in Vietnam-Presentation
Doing Business in Vietnam-PresentationEran Harish
 
A Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkA Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkViet-Trung TRAN
 
Case Study: Analytics at CMC Markets: from measuring clicks to driving business
Case Study: Analytics at CMC Markets: from measuring clicks to driving businessCase Study: Analytics at CMC Markets: from measuring clicks to driving business
Case Study: Analytics at CMC Markets: from measuring clicks to driving businessJohn Sinke
 
Neural Networks for OCR
Neural Networks for OCRNeural Networks for OCR
Neural Networks for OCRDavid Stark
 
Business Intelligence for kids (example project)
Business Intelligence for kids (example project)Business Intelligence for kids (example project)
Business Intelligence for kids (example project)Enrique Benito
 
PMML - Predictive Model Markup Language
PMML - Predictive Model Markup LanguagePMML - Predictive Model Markup Language
PMML - Predictive Model Markup Languageaguazzel
 

Viewers also liked (20)

Giasan.vn @rstars
Giasan.vn @rstarsGiasan.vn @rstars
Giasan.vn @rstars
 
Vietnam Real Estate Surges by Anthony S Casey
Vietnam Real Estate Surges by Anthony S CaseyVietnam Real Estate Surges by Anthony S Casey
Vietnam Real Estate Surges by Anthony S Casey
 
Dimensionality reduction: SVD and its applications
Dimensionality reduction: SVD and its applicationsDimensionality reduction: SVD and its applications
Dimensionality reduction: SVD and its applications
 
From neural networks to deep learning
From neural networks to deep learningFrom neural networks to deep learning
From neural networks to deep learning
 
Vienam real estate report 2014
Vienam real estate report 2014Vienam real estate report 2014
Vienam real estate report 2014
 
How to business in vietnam_2014 All you need know
How to business in vietnam_2014 All you need knowHow to business in vietnam_2014 All you need know
How to business in vietnam_2014 All you need know
 
Giới thiệu tổng quan về dự án Gamuda Gardens - Gamuda City
Giới thiệu tổng quan về dự án Gamuda Gardens - Gamuda CityGiới thiệu tổng quan về dự án Gamuda Gardens - Gamuda City
Giới thiệu tổng quan về dự án Gamuda Gardens - Gamuda City
 
Amazon EMR
Amazon EMRAmazon EMR
Amazon EMR
 
Mrkt quoc te
Mrkt quoc teMrkt quoc te
Mrkt quoc te
 
A Short PMML Tutorial by LatentView
A Short PMML Tutorial by LatentViewA Short PMML Tutorial by LatentView
A Short PMML Tutorial by LatentView
 
Vietnam Investment Report Q4 2015 (EN)
Vietnam Investment Report Q4 2015 (EN)Vietnam Investment Report Q4 2015 (EN)
Vietnam Investment Report Q4 2015 (EN)
 
E commerce landscape 2012
E commerce landscape 2012E commerce landscape 2012
E commerce landscape 2012
 
Doing Business in Vietnam-Presentation
Doing Business in Vietnam-PresentationDoing Business in Vietnam-Presentation
Doing Business in Vietnam-Presentation
 
A Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkA Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural Network
 
IMC Plan
IMC PlanIMC Plan
IMC Plan
 
Case Study: Analytics at CMC Markets: from measuring clicks to driving business
Case Study: Analytics at CMC Markets: from measuring clicks to driving businessCase Study: Analytics at CMC Markets: from measuring clicks to driving business
Case Study: Analytics at CMC Markets: from measuring clicks to driving business
 
Neural Networks for OCR
Neural Networks for OCRNeural Networks for OCR
Neural Networks for OCR
 
Business Intelligence for kids (example project)
Business Intelligence for kids (example project)Business Intelligence for kids (example project)
Business Intelligence for kids (example project)
 
PMML - Predictive Model Markup Language
PMML - Predictive Model Markup LanguagePMML - Predictive Model Markup Language
PMML - Predictive Model Markup Language
 
HCMC CBD Market Report | May 2014
HCMC CBD Market Report | May 2014 HCMC CBD Market Report | May 2014
HCMC CBD Market Report | May 2014
 

Similar to giasan.vn real-estate analytics: a Vietnam case study

Big Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreBig Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreSoftweb Solutions
 
Geospatial Intelligence Middle East 2013_Big Data_Steven Ramage
Geospatial Intelligence Middle East 2013_Big Data_Steven RamageGeospatial Intelligence Middle East 2013_Big Data_Steven Ramage
Geospatial Intelligence Middle East 2013_Big Data_Steven RamageSteven Ramage
 
Distributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewDistributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewKonstantin V. Shvachko
 
Follow the money with graphs
Follow the money with graphsFollow the money with graphs
Follow the money with graphsStanka Dalekova
 
MachineLearning_Seminar_final.pptx
MachineLearning_Seminar_final.pptxMachineLearning_Seminar_final.pptx
MachineLearning_Seminar_final.pptxEhsanUllah221132
 
Big Data with IOT approach and trends with case study
Big Data with IOT approach and trends with case studyBig Data with IOT approach and trends with case study
Big Data with IOT approach and trends with case studySharjeel Imtiaz
 
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...Geoffrey Fox
 
Nye forskninsgresultater inden for geo-spatiale data af Christian S. Jensen, AAU
Nye forskninsgresultater inden for geo-spatiale data af Christian S. Jensen, AAUNye forskninsgresultater inden for geo-spatiale data af Christian S. Jensen, AAU
Nye forskninsgresultater inden for geo-spatiale data af Christian S. Jensen, AAUInfinIT - Innovationsnetværket for it
 
Big data Intro - Presentation to OCHackerz Meetup Group
Big data Intro - Presentation to OCHackerz Meetup GroupBig data Intro - Presentation to OCHackerz Meetup Group
Big data Intro - Presentation to OCHackerz Meetup GroupSri Kanajan
 
Big Data : Bits of History, Words of Advice
Big Data : Bits of History, Words of AdviceBig Data : Bits of History, Words of Advice
Big Data : Bits of History, Words of AdviceVenu Vasudevan
 
Thinking spatially with your open data
Thinking spatially with your open dataThinking spatially with your open data
Thinking spatially with your open dataTwinbit
 
Drupal Day 2011 - Thinking spatially with your open data
Drupal Day 2011 - Thinking spatially with your open dataDrupal Day 2011 - Thinking spatially with your open data
Drupal Day 2011 - Thinking spatially with your open dataDrupalDay
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewAbhishek Roy
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksDataWorks Summit
 

Similar to giasan.vn real-estate analytics: a Vietnam case study (20)

Data Science At Zillow
Data Science At ZillowData Science At Zillow
Data Science At Zillow
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
Lecture1
Lecture1Lecture1
Lecture1
 
Big Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreBig Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and more
 
Geospatial Intelligence Middle East 2013_Big Data_Steven Ramage
Geospatial Intelligence Middle East 2013_Big Data_Steven RamageGeospatial Intelligence Middle East 2013_Big Data_Steven Ramage
Geospatial Intelligence Middle East 2013_Big Data_Steven Ramage
 
Distributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewDistributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology Overview
 
Follow the money with graphs
Follow the money with graphsFollow the money with graphs
Follow the money with graphs
 
MachineLearning_Seminar_final.pptx
MachineLearning_Seminar_final.pptxMachineLearning_Seminar_final.pptx
MachineLearning_Seminar_final.pptx
 
Big Data with IOT approach and trends with case study
Big Data with IOT approach and trends with case studyBig Data with IOT approach and trends with case study
Big Data with IOT approach and trends with case study
 
Opportunities for alternative data sources
Opportunities for alternative data sourcesOpportunities for alternative data sources
Opportunities for alternative data sources
 
M7 and Apache Drill, Micheal Hausenblas
M7 and Apache Drill, Micheal HausenblasM7 and Apache Drill, Micheal Hausenblas
M7 and Apache Drill, Micheal Hausenblas
 
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
 
Nye forskninsgresultater inden for geo-spatiale data af Christian S. Jensen, AAU
Nye forskninsgresultater inden for geo-spatiale data af Christian S. Jensen, AAUNye forskninsgresultater inden for geo-spatiale data af Christian S. Jensen, AAU
Nye forskninsgresultater inden for geo-spatiale data af Christian S. Jensen, AAU
 
Big data Intro - Presentation to OCHackerz Meetup Group
Big data Intro - Presentation to OCHackerz Meetup GroupBig data Intro - Presentation to OCHackerz Meetup Group
Big data Intro - Presentation to OCHackerz Meetup Group
 
Bigdata analytics
Bigdata analyticsBigdata analytics
Bigdata analytics
 
Big Data : Bits of History, Words of Advice
Big Data : Bits of History, Words of AdviceBig Data : Bits of History, Words of Advice
Big Data : Bits of History, Words of Advice
 
Thinking spatially with your open data
Thinking spatially with your open dataThinking spatially with your open data
Thinking spatially with your open data
 
Drupal Day 2011 - Thinking spatially with your open data
Drupal Day 2011 - Thinking spatially with your open dataDrupal Day 2011 - Thinking spatially with your open data
Drupal Day 2011 - Thinking spatially with your open data
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overview
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 

More from Viet-Trung TRAN

Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017
Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017
Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017Viet-Trung TRAN
 
Dynamo: Amazon’s Highly Available Key-value Store
Dynamo: Amazon’s Highly Available Key-value StoreDynamo: Amazon’s Highly Available Key-value Store
Dynamo: Amazon’s Highly Available Key-value StoreViet-Trung TRAN
 
Pregel: Hệ thống xử lý đồ thị lớn
Pregel: Hệ thống xử lý đồ thị lớnPregel: Hệ thống xử lý đồ thị lớn
Pregel: Hệ thống xử lý đồ thị lớnViet-Trung TRAN
 
Mapreduce simplified-data-processing
Mapreduce simplified-data-processingMapreduce simplified-data-processing
Mapreduce simplified-data-processingViet-Trung TRAN
 
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của FacebookTìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của FacebookViet-Trung TRAN
 
A Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkA Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkViet-Trung TRAN
 
Recent progress on distributing deep learning
Recent progress on distributing deep learningRecent progress on distributing deep learning
Recent progress on distributing deep learningViet-Trung TRAN
 
success factors for project proposals
success factors for project proposalssuccess factors for project proposals
success factors for project proposalsViet-Trung TRAN
 
OCR processing with deep learning: Apply to Vietnamese documents
OCR processing with deep learning: Apply to Vietnamese documents OCR processing with deep learning: Apply to Vietnamese documents
OCR processing with deep learning: Apply to Vietnamese documents Viet-Trung TRAN
 
Introduction to BigData @TCTK2015
Introduction to BigData @TCTK2015Introduction to BigData @TCTK2015
Introduction to BigData @TCTK2015Viet-Trung TRAN
 
From decision trees to random forests
From decision trees to random forestsFrom decision trees to random forests
From decision trees to random forestsViet-Trung TRAN
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringViet-Trung TRAN
 
3 - Finding similar items
3 - Finding similar items3 - Finding similar items
3 - Finding similar itemsViet-Trung TRAN
 
Introduction to mining massive datasets
Introduction to mining massive datasetsIntroduction to mining massive datasets
Introduction to mining massive datasetsViet-Trung TRAN
 
Tachyon memory centric, fault tolerance storage for cluster framworks
Tachyon  memory centric, fault tolerance storage for cluster framworksTachyon  memory centric, fault tolerance storage for cluster framworks
Tachyon memory centric, fault tolerance storage for cluster framworksViet-Trung TRAN
 
Interactive big data analytics
Interactive big data analyticsInteractive big data analytics
Interactive big data analyticsViet-Trung TRAN
 

More from Viet-Trung TRAN (20)

Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017
Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017
Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017
 
Dynamo: Amazon’s Highly Available Key-value Store
Dynamo: Amazon’s Highly Available Key-value StoreDynamo: Amazon’s Highly Available Key-value Store
Dynamo: Amazon’s Highly Available Key-value Store
 
Pregel: Hệ thống xử lý đồ thị lớn
Pregel: Hệ thống xử lý đồ thị lớnPregel: Hệ thống xử lý đồ thị lớn
Pregel: Hệ thống xử lý đồ thị lớn
 
Mapreduce simplified-data-processing
Mapreduce simplified-data-processingMapreduce simplified-data-processing
Mapreduce simplified-data-processing
 
giasan.vn real-estate analytics: a Vietnam case study

  • 1. Real-estate analytics: a Vietnam case study. Viet-Trung Tran, School of Communication and Information Technology, Hanoi University of Science and Technology
  • 2. Outline • Problem • Where big data analytics can help • Geographically weighted regression for property appraisal • Conclusion
  • 3. Problem • A national database is needed to support investors and home buyers. – "After more than 20 years of establishment and development, information on Vietnam’s real estate market is still ranked low on transparency"
  • 4. Where is my data? • The good – Property listings are mostly public on the Internet • The bad – Thousands of sites – Semi-structured text that needs NLP • The ugly – Spam and duplication – Fake or incorrect listings, low data quality
  • 5. There is a boom in trading floors, and many use tricks similar to those adopted by multi-level marketing companies, such as sending messages to customers and providing misleading information about real estate products, causing price bubbles.
  • 7. Vietnam real estate vs. stock market • Real estate: – 300 billion USD (FPT Securities, 2015) – Lack of high-quality data, tons of scams – Weak governmental control – No national database • Stock market: – 33 billion USD (quandl.com) – Clear reports and plots, curated data – Strong governmental control – Centralized, real-time monitoring
  • 8. Vietnam real estate vs. e-commerce goods • Real estate: – High value, high ROI – Immobile • E-commerce goods: – Low value, no ROI – Mobile, disappear over time • Vietnam property listings are advertised in the same manner as fridges and TVs
  • 9. Where big data analytics can help • Index the entire real estate market – 8.5 million listings to date (02/2017) • Deliver real-time market insights – powered by machine learning and Vietnamese language processing • Market data transparency for all • Save time and avoid overpaying, for buyers
  • 10. Big data processing • Diagram labels: Crawlers, QC (filters/deduplication), Natural language processing, Big data processing, Distributed database, Report, Chatbot, Website
  • 11. Vietnamese language processing • Tasks – Named Entity Recognition (NER) – Vietnamese address normalization (critical!)
  • 12. Big data processing • Tasks – Price timelines for every road, ward, district, and city – Automatic property appraisal – More analytics to come • About our data – 8.5 million listings (to date) – Stored on HBase – Processed on Spark
  • 14. Automatic property appraisal • Tran, Hung Tien, Hiep Tuan Nguyen, and Viet-Trung Tran. "Large-scale geographically weighted regression on Spark." 2016 Eighth International Conference on Knowledge and Systems Engineering (KSE). IEEE, 2016. • GWR on Spark – Large-scale spatial data – Improved performance – Distributed • First Law of Geography (Waldo Tobler): “Everything is related with everything else, but closer things are more related”.
  • 15. Background • First Law of Geography (Waldo Tobler): “Everything is related with everything else, but closer things are more related”. • GWR model: $y_i(u) = \beta_{0i}(u) + \beta_{1i}(u)\,x_{1i} + \beta_{2i}(u)\,x_{2i} + \dots + \beta_{mi}(u)\,x_{mi}$ – The estimator at location $u$ takes the weighted least-squares form $\hat{\beta}(u) = (X^T W(u) X)^{-1} X^T W(u) Y$
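The estimator on slide 15 can be tried out on a small synthetic dataset. The following is a minimal NumPy sketch, not the authors' Spark implementation: the function names, the Gaussian weighting formula, the bandwidth value, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def gaussian_weights(coords, u, bandwidth):
    """Gaussian kernel weights w_i = exp(-0.5 * (d_i / h)^2), d_i = distance to u."""
    d = np.linalg.norm(coords - u, axis=1)
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def gwr_local_fit(coords, X, y, u, bandwidth):
    """Return beta_hat(u) = (X^T W(u) X)^{-1} X^T W(u) y, intercept first."""
    Xd = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
    w = gaussian_weights(coords, u, bandwidth)
    XtW = Xd.T * w                               # X^T W(u) without building the n x n matrix
    return np.linalg.solve(XtW @ Xd, XtW @ y)

# Hypothetical data: the slope on the single feature grows with the first coordinate.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(500, 2))
X = rng.normal(size=(500, 1))
y = 1.0 + (0.5 + 0.2 * coords[:, 0]) * X[:, 0]

beta_west = gwr_local_fit(coords, X, y, np.array([1.0, 5.0]), bandwidth=1.0)
beta_east = gwr_local_fit(coords, X, y, np.array([9.0, 5.0]), bandwidth=1.0)
print("slope near x=1:", beta_west[1], "slope near x=9:", beta_east[1])
```

Fitting at two regression points shows the defining property of GWR: the coefficients differ by location because each fit weights nearby observations more heavily.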
  • 16. Background • Kernel function – Gaussian function • Bandwidth – Fixed bandwidth – Adaptive bandwidth
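The difference between a fixed and an adaptive bandwidth is easiest to see where data is sparse. The sketch below assumes one common definition of an adaptive bandwidth, the distance to the k-th nearest data point; the slides do not state which definition was used, and the coordinates are made up.

```python
import numpy as np

def gaussian_kernel(d, h):
    """Gaussian kernel: weight decays with distance d relative to bandwidth h."""
    return np.exp(-0.5 * (d / h) ** 2)

def fixed_bandwidth_weights(coords, u, h):
    """Same bandwidth h everywhere, regardless of local data density."""
    d = np.linalg.norm(coords - u, axis=1)
    return gaussian_kernel(d, h)

def adaptive_bandwidth_weights(coords, u, k):
    """Bandwidth = distance from u to its k-th nearest data point (assumed definition)."""
    d = np.linalg.norm(coords - u, axis=1)
    h = np.sort(d)[k - 1]
    return gaussian_kernel(d, h)

# Hypothetical layout: a dense 10 x 10 grid near the origin plus five scattered points.
gx, gy = np.meshgrid(np.arange(10) * 0.1, np.arange(10) * 0.1)
dense = np.column_stack([gx.ravel(), gy.ravel()])
sparse = np.array([[10.0, 10.0], [12.0, 10.0], [10.0, 13.0], [14.0, 14.0], [8.0, 11.0]])
coords = np.vstack([dense, sparse])

u = np.array([11.0, 11.0])                       # regression point in the sparse area
w_fixed = fixed_bandwidth_weights(coords, u, h=0.5)
w_adaptive = adaptive_bandwidth_weights(coords, u, k=3)
print("total weight, fixed:", w_fixed.sum(), "adaptive:", w_adaptive.sum())
```

With the fixed bandwidth, almost no observation carries weight at the sparse location, so the local fit there is poorly determined; the adaptive bandwidth widens until k neighbours contribute.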
  • 17. Problem • Estimating a local model: $\hat{\beta}(u) = (X^T W(u) X)^{-1} X^T W(u) Y$, $O(n^3)$ • Bandwidth selection – Which bandwidth is good? • Model evaluation – Choosing the kernel function • Source: http://rose.bris.ac.uk
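One widely used answer to "which bandwidth is good" is leave-one-out cross-validation: for each candidate bandwidth, refit the local model at every data point with that point excluded and sum the squared prediction errors. The slides do not say which criterion the authors used, so the sketch below is an assumption, as are the candidate values and the synthetic data.

```python
import numpy as np

def loo_cv_score(coords, X, y, h):
    """Sum of squared leave-one-out prediction errors for bandwidth h."""
    Xd = np.column_stack([np.ones(len(X)), X])
    sse = 0.0
    for i in range(len(y)):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / h) ** 2)
        w[i] = 0.0                               # leave observation i out of its own fit
        XtW = Xd.T * w
        beta = np.linalg.solve(XtW @ Xd, XtW @ y)
        sse += (y[i] - Xd[i] @ beta) ** 2
    return sse

def select_bandwidth(coords, X, y, candidates):
    """Return the candidate with the lowest score, plus all scores."""
    scores = {h: loo_cv_score(coords, X, y, h) for h in candidates}
    return min(scores, key=scores.get), scores

# Hypothetical data with a spatially varying slope and a little noise.
rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(300, 2))
X = rng.normal(size=(300, 1))
y = 1.0 + (0.5 + 0.2 * coords[:, 0]) * X[:, 0] + rng.normal(scale=0.1, size=300)

best_h, scores = select_bandwidth(coords, X, y, candidates=[1.0, 2.0, 5.0, 50.0])
print("selected bandwidth:", best_h)
```

Each candidate requires one local fit per data point, which is part of why bandwidth selection becomes expensive on large datasets.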
  • 18. Problem • How can the model be applied to large-scale data? – Data points – Features – Regression points
  • 19. Large-Scale GWR on Spark • Why Spark? – In-memory cluster-computing platform – Parallel programming – Resilient distributed datasets
  • 20. Large-Scale GWR on Spark • We propose three approaches to scaling GWR – Scaling Weighted Linear Regression – Parallel Multiple WLR models – Parallel Geographically Weighted Regression (combines the first two approaches)
  • 21. Scalable GWR on Spark • Naïve approach – Scaling Weighted Linear Regression • For each regPoint: – Compute weights (in parallel) – Fit Weighted Linear Regression (WLR model computed in parallel) – Summarize the model
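A single weighted linear regression can be spread over a cluster because $X^T W X$ and $X^T W y$ are sums over observations: each partition computes its partial sums and only small matrices are combined. The sketch below simulates partitions with plain NumPy arrays instead of Spark RDDs. It illustrates the general idea under that assumption and is not the authors' code; all function names and data are hypothetical.

```python
import numpy as np
from functools import reduce

def partial_normal_terms(part, u, h):
    """Per-partition contribution to X^T W X and X^T W y (the 'map' step)."""
    coords, X, y = part
    Xd = np.column_stack([np.ones(len(X)), X])
    d = np.linalg.norm(coords - u, axis=1)
    w = np.exp(-0.5 * (d / h) ** 2)              # Gaussian weights, computed per partition
    XtW = Xd.T * w
    return XtW @ Xd, XtW @ y

def add_terms(a, b):
    """Combine two partial results (the 'reduce' step)."""
    return a[0] + b[0], a[1] + b[1]

def distributed_wlr(partitions, u, h):
    """Solve the normal equations from the summed per-partition terms."""
    A, b = reduce(add_terms, (partial_normal_terms(p, u, h) for p in partitions))
    return np.linalg.solve(A, b)

# Hypothetical data split into 4 simulated partitions.
rng = np.random.default_rng(2)
coords = rng.uniform(0, 10, size=(1000, 2))
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=1000)
partitions = list(zip(np.array_split(coords, 4), np.array_split(X, 4), np.array_split(y, 4)))

u = np.array([5.0, 5.0])
beta_distributed = distributed_wlr(partitions, u, h=3.0)

# Reference: the same fit computed on the full dataset in one piece.
A_full, b_full = partial_normal_terms((coords, X, y), u, 3.0)
beta_single = np.linalg.solve(A_full, b_full)
print(beta_distributed)
```

The combined matrices are only (m+1) × (m+1) for m features, so the data shuffled between workers stays small no matter how many training points there are.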
  • 22. Scalable GWR on Spark • Parallel Multiple WLR models – Diagram labels: Regression dataset, Training dataset, Compute weight, WLR, Compute multiple WLR models in parallel, Summary
  • 23. Scalable GWR on Spark • Parallel Geographically Weighted Regression – Diagram labels: Regression dataset (R), Training dataset (T), Combined dataset (RT), Distributed GWR computation
  • 24. Experiments • Environment – Cluster: 8 nodes on Amazon Web Services – 4 cores, Intel Xeon E5-2670 v2, 2.5 GHz – 16 GB RAM, 2 × 40 GB SSD – Hadoop 2.7.2 and Spark 1.6.1 • Dataset schema: |-- x: double (nullable = false) |-- y: double (nullable = false) |-- label: double (nullable = false) |-- features: vector (nullable = false)
  • 25. Large training dataset • [Figure: time (sec) vs. number of training points, 10,000 to 5,000,000, for Distributed WLR computation, Parallel WLR, Distributed GWR NE, and Distributed GWR GD]
  • 26. Large regression dataset • [Figure: time (sec) vs. number of regression points, 1,000 to 50,000, for Distributed WLR computation, Parallel WLR, Distributed GWR NE, and Distributed GWR GD]
  • 27. Cluster performance • [Figure: time (sec) on 2-node, 4-node, and 8-node clusters for Distributed WLR computation, Parallel WLR, Distributed GWR NE, and Distributed GWR GD]
  • 29. Land value heat map
  • 31. Conclusion • Vietnam real-estate analytics just works! – Large-scale crawlers – Big data processing – Specialized NLP for the listing corpus • However – A lot of undiscovered value in the data – A lot of room for improvement and further research • Call for collaboration!
  • 32. Thanks for your attention! trungtv@soict.hust.edu.vn

Editor's Notes

  1. Scalability, performance, user-friendly APIs