SCOPE OF THE PROJECT:
The project focuses on creating a Data Warehouse application for the analysis of property sales in Brooklyn, one of the five boroughs of New York City. The project is split into 5 main phases:
Phase 1: Finding the dataset, understanding its structure, and identifying the meaningful business questions it could answer.
Phase 2: Extract-Transform-Load processes for the data warehouse, using R Studio.
Phase 3: Building of the Data Warehouse using Microsoft SQL Server.
Phase 4: Building the Multidimensional Cube using Microsoft Analysis Services and Visual Studio.
Phase 5: OLAP Report and Data Visualization (using Tableau).
Brooklyn Property Sales - DATA WAREHOUSE (DW)
1. Front page
DATA WAREHOUSE for Brooklyn Property Sales Data
Sotiris Baratsas
Nikos Spanos
Athens University of Economics and Business
MSc in Business Analytics
Project: Data Management & Business Intelligence
2. Who we are
Sotiris Baratsas
sotbaratsas@gmail.com
Nikos Spanos
nickosspan@gmail.com
5. The Business Challenge
Make better decisions
Which areas should we focus our marketing efforts and manpower on?
Which kind of properties would be the best use of our time & budget?
How can we take advantage of market changes quickly?
Data-driven pricing for increased commissions and shorter order fulfillment time
10. Dataset & Challenges
SOURCE
New York City Department of Finance, Annual Rolling Sales Data, aggregated for 2003-2017
*Dataset Link: https://www.kaggle.com/tianhwu/brooklynhomes2003to2017
VERY LARGE DATASET
390,883 observations
TAX-FOCUSED DATASET
111 columns with mostly tax-related, bureaucratic or duplicate variables
COMPLICATED TAX SYSTEM
We needed to read a lot about the legal terms of the NYC tax system and extract data from additional sources
11. Cleaning the data
We removed irrelevant and duplicate columns, and ended up with 20 highly valuable columns: 11 dimensions and 9 measures.
13. Let's take a look
14. Cleaning the data
Replaced missing measure values with NULL
Replaced missing dimension values with NA
Removed false values (years, ZIP Codes, districts)
Extracted data from additional data sources (bldg classes, lot type, land use)
Extracted "Month Sold" and "Year Sold" columns
Identified the correct data type for each column
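A few of the cleaning steps above can be sketched in code. The project itself did this work in R Studio; the Python below is only an illustrative sketch, and the column names (`sale_price`, `gross_sqft`, `neighborhood`, `lot_type`, `sale_date`), the ISO date format, and the helper `clean_row` are assumptions rather than the dataset's actual schema.

```python
from datetime import date

# Hypothetical column names; the real dataset's names may differ.
MEASURES = ("sale_price", "gross_sqft")
DIMENSIONS = ("neighborhood", "lot_type")
VALID_YEARS = range(2003, 2018)  # the dataset covers 2003-2017


def clean_row(row):
    """Return a cleaned copy of one raw record, or None if it must be dropped."""
    cleaned = dict(row)
    # Missing measures become None (NULL once loaded into the warehouse);
    # present measures are cast to a numeric type.
    for col in MEASURES:
        if cleaned.get(col) in ("", None):
            cleaned[col] = None
        else:
            cleaned[col] = float(cleaned[col])
    # Missing dimension values become the explicit label "NA".
    for col in DIMENSIONS:
        value = cleaned.get(col)
        cleaned[col] = value.strip() if value and value.strip() else "NA"
    # Derive "Year Sold" and "Month Sold" from the sale date (assumed ISO format).
    sold = date.fromisoformat(cleaned["sale_date"])
    if sold.year not in VALID_YEARS:
        return None  # an out-of-range year is treated as a false value
    cleaned["year_sold"] = sold.year
    cleaned["month_sold"] = sold.month
    return cleaned


raw = {"sale_price": "450000", "gross_sqft": "", "neighborhood": " East New York ",
       "lot_type": "", "sale_date": "2015-06-30"}
result = clean_row(raw)
```

Keeping measures as NULL and dimensions as an explicit "NA" label means aggregations can skip unknown measures, while unknown dimension members still appear as their own group in reports.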
40. Client Examples
A client wants to sell her property. Aware of the Assessed Value that the city's finance department has placed on her lot, she wants to set the starting price equal to the Assessed Value.
42. Client Examples
A client comes to us to sell his property. He tells us he has a 2-floor, inside lot in East New York. To get the job, our agent has to show he has a good understanding of the prices of this type of property.
43. Client Examples
Our agent can access valuable information in seconds, appear as an expert and make smarter pricing decisions.
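The lookup in the previous client example (a 2-floor, inside lot in East New York) amounts to slicing the sales data on three dimensions and summarizing the sale price. The sketch below illustrates that idea in plain Python; the rows are made-up illustration values, not figures from the dataset, and the field names and the helper `slice_prices` are hypothetical. In the project itself this kind of query would be answered by the cube and the Tableau reports.

```python
from statistics import median

# Made-up illustration rows, NOT taken from the dataset.
sales = [
    {"neighborhood": "East New York", "lot_type": "Inside", "floors": 2, "sale_price": 300000},
    {"neighborhood": "East New York", "lot_type": "Inside", "floors": 2, "sale_price": 340000},
    {"neighborhood": "East New York", "lot_type": "Corner", "floors": 2, "sale_price": 420000},
    {"neighborhood": "Park Slope", "lot_type": "Inside", "floors": 2, "sale_price": 1200000},
]


def slice_prices(rows, **filters):
    """Return the sale prices of rows that match every dimension filter."""
    return [r["sale_price"] for r in rows
            if all(r.get(key) == value for key, value in filters.items())]


# Slice on neighborhood, lot type and number of floors, then summarize.
prices = slice_prices(sales, neighborhood="East New York", lot_type="Inside", floors=2)
summary = {
    "count": len(prices),
    "median": median(prices),
    "min": min(prices),
    "max": max(prices),
}
```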
44. Front page
THANK YOU for your attention
Sotiris Baratsas
Nikos Spanos