giasan.vn real-estate analytics: a Vietnam case study
1. Real-estate analytics: A Vietnam case study
Viet-Trung Tran
School of Information and Communication Technology
Hanoi University of Science and Technology
2. Outline
• Problem
• Where big data analytics can help
• Geographically weighted regression for property appraisal
• Conclusion
3. Problem
• A national database is needed to support investors and home
buyers.
– "After more than 20 years of establishment and development, information on
Vietnam’s real estate market is still ranked low on transparency"
4. Where is my data?
• The good
– Most property listings are public on the Internet
• The bad
– Thousands of sites
– Semi-structured text that needs NLP
• The ugly
– Spam and duplication
– Fake or incorrect listings, low data quality
5.
There is a boom in trading floors, and many use tricks similar
to those adopted by multi-level marketing companies, such
as sending messages to customers and providing misleading
information about real estate products, causing price
bubbles.
7. Vietnam real-estate vs. stock market
Real estate:
• 300 billion USD (FPT Securities/2015)
• Lack of high-quality data, tons of scams
• Weak governmental control
• No national databases
Stock market:
• 33 billion USD (quandl.com)
• Clear reports & plots, curated data
• Strong governmental control
• Centralized, real-time monitoring
8. Vietnam real-estate vs. e-commerce goods
Real estate:
• High value, high ROI
• Immobile
E-commerce goods:
• Low value, no ROI
• Mobile, disappear over time
Vietnam property listings are advertised in the same
manner as fridges and TVs.
9. Where big data analytics can help
• Index the entire real estate market
– 8.5 million listings to date (02/2017)
• Deliver real time market insights
– powered by machine learning and Vietnamese
language processing
Market data transparency for all
Save time and avoid overpaying, for buyers
10. Big data processing
[Figure: big data processing pipeline. Components: crawlers, natural language processing, QC (filters/deduplication), distributed database. Outputs: report, chatbot, website.]
12. Big data processing
• Tasks
– Price timelines for every road, ward, district, and city
– Automatic property appraisal
– More analytics to come
• About our data
– 8.5 million listings (to date)
– Stored on HBase
– Processed on Spark
14. Automatic property appraisal
• Tran, Hung Tien, Hiep Tuan Nguyen, and Viet-Trung Tran. "Large-scale
geographically weighted regression on Spark." Knowledge and Systems
Engineering (KSE), 2016 Eighth International Conference on. IEEE, 2016.
GWR + Spark =
- Large-scale spatial data
- Improved performance
- Distributed
First Law of Geography (Waldo Tobler):
“Everything is related to everything else, but near things
are more related than distant things.”
15. Background
• First Law of Geography (Waldo Tobler):
“Everything is related to everything else, but near things
are more related than distant things.”
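• GWR model
– At location u, the model and its weighted least-squares estimator take the form
$$y_i(u) = \beta_{0i}(u) + \beta_{1i}(u)\,x_{1i} + \beta_{2i}(u)\,x_{2i} + \dots + \beta_{mi}(u)\,x_{mi}$$
$$\hat{\beta}(u) = \left(X^{T} W(u)\, X\right)^{-1} X^{T} W(u)\, Y$$
– W(u) is the diagonal matrix of spatial weights around u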
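To make the estimator concrete, here is a minimal pure-Python sketch of a single local fit at a location u. The Gaussian kernel, the `bandwidth` parameter, and the function names (`gaussian_weights`, `solve`, `local_beta`) are illustrative assumptions; they are not taken from the slides or the cited paper.

```python
import math


def gaussian_weights(u, coords, bandwidth):
    """Diagonal of W(u): nearer observations get larger weights (assumed Gaussian kernel)."""
    return [math.exp(-0.5 * (math.dist(u, c) / bandwidth) ** 2) for c in coords]


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def local_beta(u, coords, X, y, bandwidth):
    """Compute beta_hat(u) = (X^T W(u) X)^{-1} X^T W(u) y for one location u."""
    w = gaussian_weights(u, coords, bandwidth)
    k = len(X[0])
    xtwx = [[sum(wi * row[a] * row[b] for wi, row in zip(w, X)) for b in range(k)]
            for a in range(k)]
    xtwy = [sum(wi * row[a] * yi for wi, row, yi in zip(w, X, y)) for a in range(k)]
    return solve(xtwx, xtwy)
```

Each row of `X` starts with a constant 1.0 so that the first coefficient plays the role of the intercept. Solving the normal equations directly, rather than forming the inverse, is a design choice of this sketch.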
17. Problem
• Estimating a local model
• Bandwidth selection
– Which bandwidth is good?
• Model evaluation
– Choose a kernel function
$$\hat{\beta}(u) = \left(X^{T} W(u)\, X\right)^{-1} X^{T} W(u)\, Y$$
Source: http://rose.bris.ac.uk
$O(n^3)$
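One common way to answer the bandwidth question is leave-one-out cross-validation: fit each local model without the point being predicted and keep the bandwidth with the smallest squared error. The sketch below assumes this criterion, a single feature plus intercept, and two standard kernels; the slides do not say which criterion or kernel was used, and all function names are hypothetical.

```python
import math


def gaussian_kernel(d, h):
    """Weight decays smoothly with distance d; h is the bandwidth."""
    return math.exp(-0.5 * (d / h) ** 2)


def bisquare_kernel(d, h):
    """Weight falls to exactly zero at distance h and beyond."""
    return (1 - (d / h) ** 2) ** 2 if d < h else 0.0


def local_fit(u, coords, x, y, h, kernel, skip=None):
    """Weighted simple regression at u; the observation at index `skip` is left out."""
    w = [0.0 if i == skip else kernel(math.dist(u, c), h)
         for i, c in enumerate(coords)]
    sw = sum(w)  # assumes at least two points with non-zero weight
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def cv_score(coords, x, y, h, kernel=gaussian_kernel):
    """Sum of squared leave-one-out prediction errors for bandwidth h."""
    total = 0.0
    for i, u in enumerate(coords):
        b0, b1 = local_fit(u, coords, x, y, h, kernel, skip=i)
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total


def select_bandwidth(coords, x, y, candidates, kernel=gaussian_kernel):
    """Return the candidate bandwidth with the lowest cross-validation score."""
    return min(candidates, key=lambda h: cv_score(coords, x, y, h, kernel))
```

With a compact kernel such as `bisquare_kernel`, a bandwidth that is too small can leave a point with no neighbours, so the candidate list should be chosen with the data spacing in mind.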
18. Problem
• How can the model be applied to large-scale data?
– Data points
– Features
– Regression points
19. Large-Scale GWR on Spark
• Why Spark?
– In-memory cluster-computing platform
– Parallel programming
– Resilient distributed datasets
20. Large-Scale GWR on Spark
• We propose three approaches to scaling GWR
– Scaling Weighted Linear Regression (WLR)
– Parallel multiple WLR models
– Parallel Geographically Weighted Regression
(combines the first two approaches)
21. Scalable GWR on Spark
• Naïve approach: scaling Weighted Linear Regression
Foreach regPoint
    Compute weights (in parallel)
    Fit Weighted Linear Regression (WLR model computed in parallel)
    Summarize model
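The loop above can be sketched in plain Python as a map step over data partitions followed by a reduce step, looping over regression points sequentially. This is an illustration under stated assumptions, not the implementation from the cited paper: threads stand in for Spark executors, the kernel is assumed Gaussian, the model has one feature plus an intercept, and the names `partial_sums`, `fit_wlr`, and `gwr` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def partial_sums(task):
    """Map step: weighted sufficient statistics for one data partition."""
    u, h, partition = task
    sw = swx = swy = swxx = swxy = 0.0
    for coord, x, y in partition:
        w = math.exp(-0.5 * (math.dist(u, coord) / h) ** 2)  # weight of this row at u
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    return sw, swx, swy, swxx, swxy


def fit_wlr(u, h, partitions):
    """Fit one weighted linear regression at u from per-partition partial sums."""
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(partial_sums, [(u, h, p) for p in partitions]))
    # Reduce step: add the partial sums of all partitions.
    sw, swx, swy, swxx, swxy = (sum(col) for col in zip(*parts))
    # Solve the 2x2 normal equations with Cramer's rule.
    det = sw * swxx - swx ** 2
    intercept = (swy * swxx - swx * swxy) / det
    slope = (sw * swxy - swx * swy) / det
    return intercept, slope


def gwr(reg_points, h, partitions):
    """Naive approach: one WLR per regression point, each computed in parallel over the data."""
    return [fit_wlr(u, h, partitions) for u in reg_points]
```

Because only five numbers per partition travel to the reduce step, the cost of combining results does not grow with the number of data points.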
23. Scalable GWR on Spark
• Parallel Geographically Weighted Regression
[Figure: partitions of the regression dataset (R) and of the training dataset (T) are combined into a combined dataset (RT), which feeds the distributed GWR computation.]
31. Conclusion
• Vietnam real-estate analytics just works!
– Large-scale crawlers
– Big data processing
– Specialized NLP for listing corpus
• However
– a lot of undiscovered value in the data
– a lot of room for improvement and further research
Call for collaboration!