Merging statistics and geospatial information - demography / commuting / spatial planning / registers
1. Merging statistics
and geospatial information
Demography / Commuting / Spatial planning / Registers
Mirosław Migacz
Chief GIS Specialist
Central Statistical Office of Poland
INSPIRE Conference 2014: Inspire for good governance
Aalborg, June 17th 2014
2. Agenda
The aim
The team
The tasks
• Spatial visualization of demographic data
• Enterprise address spatialization
• Commuting statistics
• Statistical indicators for spatial planning
The results
• Conclusions
3. The aim
Geospatial analysis with use of:
• Population and Housing Census 2011 results
• other statistical datasets possessed by CSO
Evaluation of reference materials in geostatistics production process:
• Spatial address databases (maintained within official statistics)
• Database of Topographic Objects (acquired from the mapping agency)
4. The team
Programming and Coordination of
Statistical Surveys Department @
CSO, Warsaw
• Amelia Wardzińska-Sharif
• Janusz Dygaszewicz
• Mirosław Migacz
• Magdalena Pączek-Borowska
• Agnieszka Nowakowska
Urban Statistics Centre @ SO Poznań
• Sylwia Filas-Przybył
• Maciej Kaźmierczak
• Dawid Pawlikowski
Regional and Environmental Surveys
Department @ CSO, Warsaw
• Marek Pieniążek
• Robert Buciak
Statistical Computing Centre, Łódź
• Radosław Jabłoński
6. Spatial visualization of demographic data
Source data
• attribute data
• spatial data
Methods of aggregation to various statistical units
• 1 km x 1 km grid
• Cadastral units
• Statistical regions
• Census enumeration areas
Cartographic presentation of the results
7. Source data
Attribute
• Tables with population distribution data acquired from the Population and Housing Census 2011:
– Person ID
– X, Y coordinates (acquired from spatial address databases created and maintained within official statistics)
• 39 tables (one for each million people)
Spatial
• Boundaries of statistical regions and census enumeration areas (spatial address databases)
• Cadastral units (mapping agency)
• Kilometer grid – Grid_ETRS89_LAEA_PL_1K (European Forum for Geography and Statistics)
8. 1 km x 1 km grid
• Grid_ETRS89_LAEA_PL_1K – the European INSPIRE grid
• Cell coordinates – lower left corner
• Aggregation of persons to specific grid cells is possible without GIS software (Visual Basic for Applications was used here as an example)
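The point that grid aggregation needs no GIS software can be illustrated with a minimal sketch. It is written in Python rather than the VBA mentioned on the slide, and it assumes the X, Y coordinates are already in metres in the grid's projected coordinate system. The function names are hypothetical. Each cell is identified by its lower left corner, as the slide describes.

```python
from collections import Counter

CELL_SIZE = 1000  # metres; edge length of a 1 km x 1 km cell


def cell_lower_left(x, y, cell_size=CELL_SIZE):
    """Return the lower left corner of the grid cell containing (x, y)."""
    return (int(x // cell_size) * cell_size, int(y // cell_size) * cell_size)


def count_persons_per_cell(points):
    """Count person points per grid cell, keyed by lower left corner."""
    counts = Counter()
    for x, y in points:
        counts[cell_lower_left(x, y)] += 1
    return counts


# Illustrative coordinates only, not census data.
sample_points = [
    (4700250.0, 3200900.5),
    (4700999.9, 3200000.0),
    (4701000.0, 3200500.0),
]
cell_counts = count_persons_per_cell(sample_points)
```

Because the cell is derived by integer division alone, the same logic carries over to a spreadsheet macro or a SQL GROUP BY.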
9. 1 km x 1 km grid
• Number of persons in each grid cell calculated with ArcGIS
(Dissolve tool), though any other database software could be
used
• The operation was conducted separately for each of the 39
tables
10. Cadastral units
• Aggregation to irregular division of space requires GIS
software
• Environment: ArcGIS file geodatabase
• Spatial operations on a feature class with 38.5 million objects exceed the RAM capabilities of workstations and servers
11. Cadastral units
Back to 39 separate tables >> need for automation
Use of Python scripting with the arcpy module, which contains all ArcGIS tools
• The script processed 39 datasets:
– Spatial join of the 1st dataset to the geometry of cadastral units (with calculation of total population), giving the initial dataset
– For each subsequent spatial join, the newly calculated population was added to the total population of each cadastral unit
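The running-total approach described above can be sketched without ArcGIS. The example below is not the arcpy script used in the project. It is a pure-Python stand-in in which a simple ray-casting test replaces the spatial join, and all names and sample geometries are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices, not closed."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def accumulate_population(units, batches):
    """Join each batch of person points to the units and keep a running total.

    units: dict {unit_id: ring}; batches: iterable of lists of (x, y).
    """
    totals = {unit_id: 0 for unit_id in units}
    for batch in batches:  # one batch stands in for one of the 39 tables
        for x, y in batch:
            for unit_id, ring in units.items():
                if point_in_polygon(x, y, ring):
                    totals[unit_id] += 1
                    break
    return totals


# Two illustrative square "cadastral units" and two small batches.
units = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
batches = [[(1, 1), (15, 5)], [(2, 2), (30, 30)]]
population = accumulate_population(units, batches)
```

Only one batch is held in memory at a time, which is the reason for going back to separate tables. A production version would also need a spatial index instead of the linear scan over units.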
12. Statistical regions
and census enumeration areas
The same tools that were used for cadastral units
(ArcGIS, Python)
A slightly different method of cyclic dataset processing:
• statistical regions / census enumeration areas were spatially joined to
datasets with persons 39 times >> 39 temporary feature classes
• 39 feature classes merged into one >> 1 feature class with 39
duplicate geometries for each statistical region / census enumeration
area
• deduplication of the geometries with total population calculation for
each geometry (Dissolve tool in ArcGIS)
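The merge-then-dissolve variant can be sketched in the same spirit. This is a simplified illustration, not the ArcGIS workflow itself. It assumes each batch has already been reduced to the region identifier of every person, and the function names are hypothetical.

```python
from collections import defaultdict


def join_batch(region_ids):
    """Stand-in for one spatial join: per-region counts for a single batch."""
    counts = defaultdict(int)
    for region_id in region_ids:
        counts[region_id] += 1
    return list(counts.items())


def merge_and_dissolve(batches):
    """Merge all per-batch results, then sum the duplicates per region."""
    merged = []  # plays the role of the single merged feature class
    for batch in batches:
        merged.extend(join_batch(batch))
    totals = defaultdict(int)
    for region_id, population in merged:  # the "dissolve" step
        totals[region_id] += population
    return dict(totals)


region_totals = merge_and_dissolve([["R1", "R1", "R2"], ["R2", "R3"]])
```

Compared with the running-total method, the intermediate results stay independent of each other, so the individual joins could be run in any order or in parallel.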
13. Data aggregation – conclusions
• Point data aggregation to grids can be done without GIS
software – any database software with e.g. VBA is sufficient
• Point data aggregation to an irregular division of space
requires GIS software
• Processing of huge datasets requires automation, which can
be achieved with Python scripting:
– requires script preparation and testing on a data sample
– all processes can be run on a separate machine / server and they do
not require the operator’s attention
14. Cartographic presentation of the results
• 1 km x 1 km grid – total population in each grid cell (=
population density)
• Cadastral units, statistical regions, census enumeration areas
– choropleth maps of population density
Classifications (5 classes)
• average value as the center of the middle class
• quantiles
Colour scales
• 2-color gradient
• monochromatic
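A quantile classification into 5 classes can be sketched with the Python standard library. The slides do not say which quantile definition the mapping software applied, so the default method of `statistics.quantiles` is an assumption here, and the function names are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_breaks(values, n_classes=5):
    """Return the n_classes - 1 cut points that split values into quantiles."""
    return quantiles(values, n=n_classes)


def classify(value, breaks):
    """Return the 0-based class index of value for the given cut points."""
    return bisect_right(breaks, value)


# Illustrative densities only, not project data.
densities = list(range(1, 101))
breaks = quantile_breaks(densities)
classes = [classify(v, breaks) for v in densities]
```

Each class receives roughly the same number of units, which is why the same density value can land in different classes depending on the spatial division being mapped.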
24. Quantiles – conclusions
• Significant differences between quantile presentations:
– For the 1 km x 1 km grid a separate class for „0” was created
– Huge differences in classification between cadastral units and census
enumeration areas due to these divisions having been created for
different purposes:
• Cadastral units for legal management of land ownership
• Census enumeration areas for the purpose of conducting censuses (size
dependent on the population count)
26. Source data
Attribute
• Social insurance registers
• Taxpayers register
• Inland revenues database
• Statistical register of enterprises
Spatial
• Spatial address databases (maintained within official statistics)
• Database of Topographic Objects (acquired from the mapping agency)
27. Enterprise address spatialization
• Pairing "as is" (62%)
• Address number simplification, e.g. 3A -> 3 (5.9%)
• No address point: nearest address number (18.6%)
• No address number: address point on same street or locality centroid (1.8%)
• No street ID: locality centroid (3.6%)
• Other cases: locality centroid (8.1%)
Address descriptive information paired with:
• address points from the Spatial Address Databases
• address points from the Database of Topographic Objects
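The fallback order of the address pairing can be sketched as a cascade of rules. The record layout, the dictionaries and the function names below are hypothetical, and the "nearest address number" rule is interpreted here as the numerically closest house number on the same street, which the slides do not confirm.

```python
import re


def leading_digits(number):
    """Strip suffixes such as the 'A' in '3A'; return None if no digits."""
    match = re.match(r"\d+", number)
    return match.group(0) if match else None


def geocode(record, address_points, locality_centroids):
    """Return ((x, y), rule) for one address record.

    address_points: {(locality, street, number): (x, y)}
    locality_centroids: {locality: (x, y)}
    """
    locality = record["locality"]
    street = record.get("street")
    number = record.get("number")
    centroid = locality_centroids.get(locality)
    if not street:
        return centroid, "no street ID"
    on_street = {num: xy for (loc, st, num), xy in address_points.items()
                 if loc == locality and st == street}
    if not number:
        if on_street:
            return on_street[min(on_street)], "no address number"
        return centroid, "no address number"
    if number in on_street:
        return on_street[number], "as is"
    simple = leading_digits(number)
    if simple and simple in on_street:
        return on_street[simple], "simplified number"
    if simple:
        candidates = [(abs(int(leading_digits(n)) - int(simple)), n)
                      for n in on_street if leading_digits(n)]
        if candidates:
            return on_street[min(candidates)[1]], "nearest address number"
    return centroid, "other"


points = {("Locality A", "Main", "3"): (1.0, 1.0),
          ("Locality A", "Main", "10"): (2.0, 2.0)}
centroids = {"Locality A": (0.0, 0.0)}
exact = geocode({"locality": "Locality A", "street": "Main", "number": "3"},
                points, centroids)
simplified = geocode({"locality": "Locality A", "street": "Main", "number": "3A"},
                     points, centroids)
nearest = geocode({"locality": "Locality A", "street": "Main", "number": "8"},
                  points, centroids)
no_street = geocode({"locality": "Locality A", "number": "8"}, points, centroids)
```

Returning the rule label together with the coordinates makes it possible to report the share of records resolved by each rule, as the percentages on the slide do.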
29. Commuter
a person whose employer’s registered office is outside the administrative borders
of the gmina (municipality, LAU2) of residence
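The definition translates directly into a comparison of two municipality codes. A minimal sketch, assuming each person record carries the gmina code of residence and the gmina code of the employer's registered office (the codes and function names below are made up for illustration):

```python
from collections import Counter


def is_commuter(residence_gmina, employer_gmina):
    """A commuter works for an employer registered outside the home gmina."""
    return residence_gmina != employer_gmina


def commuting_flows(persons):
    """Count commuters per (residence gmina, employer gmina) pair."""
    flows = Counter()
    for residence, employer in persons:
        if is_commuter(residence, employer):
            flows[(residence, employer)] += 1
    return flows


# Hypothetical gmina codes, not real identifiers.
persons = [("G001", "G002"), ("G001", "G001"), ("G003", "G002"), ("G001", "G002")]
flows = commuting_flows(persons)
```

The resulting origin-destination counts are the kind of input needed to map directions of employment-related movement.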
30. Commuting statistics
Source data
• attribute data
• spatial data
Actions
• Directions of population movements related to employment
• Commuting to/from Poznań
• Commuting within voivodships
Cartographic presentation of the results
31. Source data
Attribute
• Tables with demographic data acquired from the Population and Housing Census 2011:
– Person ID
– Age
– Gender
– Dwelling address and X, Y coordinates
– Workplace address and X, Y coordinates
– Income
– Economic activity classification
– Fact of commuting
• 3.1 million records
Spatial
• Boundaries of the territorial division of the country
• Spatial Address Databases (source of dwelling coordinates and boundaries of statistical regions and census enumeration areas)
• Spatialized enterprise addresses (source of workplace coordinates)
• Kilometer grid – Grid_ETRS89_LAEA_PL_1K (European Forum for Geography and Statistics)
39. Statistical indicators for spatial planning
Source data
• spatial data
Scope
• selected administrative units
Aims
• Source data usability analysis for purposes of creating statistical
indicators for spatial planning
• methodology for statistical indicators describing building density
• methodology for statistical indicators describing road density
Cartographic presentation of the indicators
40. Source data
Spatial
• Database of Topographic Objects (buildings and road network)
• cadastral data
• orthophotomap
• Boundaries of statistical regions and census enumeration areas (spatial address databases)
• Boundaries of the territorial division of the country
42. Source data evaluation
• comparing the content of randomly selected grid cells within the Database of Topographic Objects with the orthophotomap
• roads - 59 grid cells sampled out of a total of 1185
– 79.7% of cells with total compliance
– the rest with compliance > 75%
• buildings - 135 grid cells sampled out of a total of 2697
– 44.5% of cells with total compliance
– 37.8% of cells with compliance > 75%
– most of the rest with compliance > 50%
• gaps found mainly in urban areas: omissions in the building layer
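The sample sizes quoted above (59 of 1185 and 135 of 2697) both correspond to roughly 5% of the cells. The sketch below assumes a 5% simple random sample. That rate is inferred from the numbers, not stated in the slides, and the function names and the 0-100 compliance scores are hypothetical.

```python
import random


def sample_cells(cell_ids, rate=0.05, seed=0):
    """Draw a simple random sample of grid cells for manual checking."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    k = round(len(cell_ids) * rate)
    return rng.sample(cell_ids, k)


def share_fully_compliant(scores):
    """Percentage of sampled cells whose compliance score is exactly 100."""
    return 100.0 * sum(1 for s in scores if s == 100) / len(scores)


road_sample = sample_cells(list(range(1185)))
building_sample = sample_cells(list(range(2697)))
example_share = share_fully_compliant([100, 100, 80, 60])
```

Fixing the seed keeps the list of cells to inspect stable between runs, which helps when the comparison with the orthophotomap is done by hand over several sessions.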
43. Building density indicator (%)
$$W_Z = \frac{P_Z}{P} \cdot 100\%$$
where:
• $W_Z$ – building density ratio
• $P_Z$ – total building area
• $P$ – survey area
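The indicator relates the total building area to the survey area and expresses the ratio as a percentage. A minimal sketch follows, assuming the survey area is a single 1 km x 1 km grid cell and that building footprint areas are given in square metres. The function name and the sample footprints are hypothetical.

```python
GRID_CELL_AREA_M2 = 1000 * 1000  # area of one 1 km x 1 km cell


def building_density(building_areas_m2, survey_area_m2=GRID_CELL_AREA_M2):
    """Total building area divided by survey area, as a percentage."""
    if survey_area_m2 <= 0:
        raise ValueError("survey area must be positive")
    return sum(building_areas_m2) / survey_area_m2 * 100.0


# Three illustrative building footprints inside one cell.
density = building_density([120000, 80000, 50000])
```

A cell with many small buildings can therefore score lower than a cell with a few large ones, which is why the cell with the most buildings and the cell with the highest density ratio need not coincide.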
44. Building density indicator
grid cell with the biggest number of buildings
grid cell with the highest building density ratio
town of Ząbki
city of Wołomin
48. Conclusions
• census results referenced to a point (X,Y)
• huge opportunity for spatial analyses
• geostatistical products that reflect user needs
• high demand for demographic data at a level lower than LAU2
• positive reception of project results
SUCCESS
49. Conclusions
• The project outcome will have a strong impact on future
developments of the Geostatistics Portal (incl. INSPIRE
services)
• Wednesday, June 18th, 16:00 @ Room 4
„Geostatistics Portal – the multitool for statistics on maps”
(session: „Maps, Stats and Observation Data”)
GEO.STAT.GOV.PL
50. Merging statistics
and geospatial information
Demography / Commuting / Spatial planning / Registers
Mirosław Migacz
Chief GIS Specialist
Central Statistical Office of Poland
@mireslav
www.linkedin.com/in/migacz
m.migacz@stat.gov.pl
www.slideshare.net/MirosawMigacz