Detection and reconstruction of 3D buildings in urban areas has been a hot topic of research due to its many applications, including 3D population density studies, emergency planning, and building value estimation. Standard approaches to extracting building footprints and measuring building heights rely on either aerial or spaceborne point cloud data, which in many areas is unavailable. In contrast, high-resolution satellite imagery has become more readily available in recent years, and could provide enough information to estimate a building’s height. Recent successes of deep learning on semantic segmentation have shown that convolutional neural networks can be effective tools for extracting 2D building footprints. Using a digital surface model derived from FOSS and LiDAR data as ground truth, this study goes a step further by employing state-of-the-art deep learning architectures such as U-net to infer both building footprints and estimated building heights in one pass from a single satellite image. This application of open deep learning frameworks can bring the benefits of 3D cities to a larger portion of the world.
Using Deep Learning to Derive 3D Cities from Satellite Imagery
1. See the Earth as it could be.
Eric Culbertson
Data Scientist, Astraea
2. • 3D building models have many different use cases
• Solar potential estimation
• Utility management
• Disaster planning and simulation
3. How are 3D Models Made?
• Standard approaches:
• Use LiDAR point clouds to create 3D mesh
• Stitch imagery taken at multiple angles to the mesh
4. • Constructing these 3D models can be labor intensive and expensive (in both compute power and cost)
• Is there an alternative using
• Free data?
• Open source tools?
5. What do I Mean by 3D?
• Quality of a 3D model is quantified by the level of detail (LOD) metric
• Higher LOD = more versatile
• But also more difficult to obtain
7. Machine Learning
• Machine learning can be applied to solve difficult problems without the need for a subject-matter expert
• Deep learning on images has made great strides in recent years
• Machine learning tools are open source
8. Application to Overhead Imagery
• Start by feeding the network many examples with the correct predictions: input imagery → building footprints + building height
• Eventually predict labels for new data
9. Imagery Sources
• Overhead imagery is available from many different sources in
my region of interest (Las Vegas)
Name | Source | Bands | Resolution (m) | Coverage | Revisit | Cost
WorldView-3 | Satellite | 8 | 0.3 | ~Globe | ~daily | $$$$$$
NAIP | Aerial | 4 | 1.0 | U.S. | 3 years | Free
Sanborn | Aerial | 4 | 0.3 | Partial U.S. | 3 years | Free - $
10. Imagery Sources
• Overhead imagery is available from many different sources in my
region of interest (Las Vegas)
• Most time was spent with 2015 NAIP and 2016 Sanborn imagery
Name | Source | Bands | Resolution (m) | Coverage | Revisit | Cost
WorldView-3 | Satellite | 8 | 0.3 | ~Globe | ~daily | $$$$$$
NAIP | Aerial | 4 | 1.0 | U.S. | 3 years | Free
Sanborn | Aerial | 4 | 0.3 | Partial U.S. | 3 years | Free - $
12. Ground Truth
• Building footprint polygons were provided by the SpaceNet Challenge
• Rasterized with the rasterio, shapely, and numpy Python modules
• Pixel-height truth was derived from 2012 LiDAR data found on USGS
• Quality level is not ideal (pulse density ~0.3 pulses/m²)
• Processing pipeline: raw LiDAR point cloud tiles → filter outliers → merge tiles → height above ground → reproject → rasterize → pixel heights (tools: las2las, PDAL, CloudCompare)
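The rasterization step above uses rasterio's compiled routines; as a dependency-free illustration of what "burning" a footprint polygon into a pixel grid means, here is a minimal numpy-only sketch using the even-odd (ray-casting) rule, with a hypothetical square footprint rather than real SpaceNet data:

```python
import numpy as np

def rasterize_footprint(polygon, shape):
    """Burn a single building footprint into a binary mask.

    polygon: list of (col, row) vertices; shape: (rows, cols).
    Tests each pixel center with the even-odd (ray-casting) rule,
    the same basic idea rasterio.features.rasterize implements in C.
    """
    rows, cols = shape
    mask = np.zeros(shape, dtype=np.uint8)
    n = len(polygon)
    for r in range(rows):
        for c in range(cols):
            x, y = c + 0.5, r + 0.5  # pixel center
            inside = False
            for i in range(n):
                x1, y1 = polygon[i]
                x2, y2 = polygon[(i + 1) % n]
                # does this edge cross a horizontal ray cast right of (x, y)?
                if (y1 > y) != (y2 > y):
                    x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                    if x_cross > x:
                        inside = not inside
            mask[r, c] = inside
    return mask

# hypothetical 4x4 square footprint inside an 8x8 tile
square = [(1, 1), (5, 1), (5, 5), (1, 5)]
mask = rasterize_footprint(square, (8, 8))
```

In practice rasterio.features.rasterize does this for whole collections of shapely geometries at once, with an affine transform mapping map coordinates to pixels.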
14. Neural Network
• Nesting layers of neurons allows the network to learn more complicated features
[Figure: toy network with inputs body length, tail length, weight, and number of ears, feeding sum + activation neurons into a cat/dog prediction]
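The toy cat/dog network on this slide can be sketched in a few lines of numpy; the weights below are random stand-ins for trained values, and the four input features are the ones named on the slide:

```python
import numpy as np

def sigmoid(z):
    # the "activation" half of each neuron's "sum + activation"
    return 1.0 / (1.0 + np.exp(-z))

# toy features from the slide: body length, tail length, weight, ears
x = np.array([10.0, 11.0, 8.0, 2.0])

# one hidden layer of 3 neurons, then one output neuron (cat vs. dog);
# random weights for illustration only, not trained values
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)

hidden = sigmoid(W1 @ x + b1)         # first "sum + activation" layer
prob_cat = sigmoid(W2 @ hidden + b2)  # nested second layer: the prediction
```

Stacking more such layers is what lets the network learn the more complicated features the slide mentions.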
15. Convolutional Neural Network
• CNNs are a way to extract important features from an image to make a prediction
[Figure: image → learned features → sum + activation → cat/dog prediction]
16. CNN Image Segmentation
• CNNs can also be used to make per-pixel predictions by determining important features in regions around each pixel
[Figure: U-net architecture]
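The U-net's characteristic encode/decode-with-skip-connections shape can be illustrated without any deep learning framework; a numpy-only sketch with max pooling, nearest-neighbour upsampling, and a concatenated skip connection (no convolutions, for illustration only):

```python
import numpy as np

def down(x):
    # 2x2 max pooling: halves spatial size, keeps the strongest response
    r, c = x.shape
    return x.reshape(r // 2, 2, c // 2, 2).max(axis=(1, 3))

def up(x):
    # nearest-neighbour upsampling: doubles spatial size
    return x.repeat(2, axis=0).repeat(2, axis=1)

# toy 8x8 feature map
x = np.arange(64, dtype=float).reshape(8, 8)

# the "U" shape: encode (downsample), decode (upsample), and stack the
# skip connection so fine spatial detail survives the bottleneck
encoded = down(down(x))        # 8x8 -> 2x2
decoded = up(up(encoded))      # 2x2 -> 8x8
skip = np.stack([decoded, x])  # 2-channel map, as after a concat layer
```

In the real architecture each arrow is a block of convolutions, but the pooling/upsampling/concatenation skeleton is the same.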
17. Application to Overhead Imagery
• Similar features are used to determine the building footprint and height
• Combining the learning process shares knowledge gained from learning each task
• This saves time and manual effort
[Figure: input imagery → shared weights → predict footprints / predict pixel height → combine to make a 2.5D model]
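A minimal numpy sketch of the shared-weights idea, assuming a toy 8×8 single-band tile and a dense layer standing in for the convolutional encoder (all weights random, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# a flattened 8x8 single-band "image" stands in for the input tile
x = rng.random(64)

# shared encoder: the same learned features feed both tasks
W_shared = rng.normal(scale=0.1, size=(16, 64))
features = relu(W_shared @ x)

# head 1: per-pixel footprint probability (binary classification)
W_foot = rng.normal(scale=0.1, size=(64, 16))
footprint = sigmoid(W_foot @ features).reshape(8, 8)

# head 2: per-pixel height (regression, so no squashing activation)
W_height = rng.normal(scale=0.1, size=(64, 16))
height = (W_height @ features).reshape(8, 8)
```

Training both heads against one shared encoder is what lets knowledge gained on one task transfer to the other.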
18. FOSS for Deep Learning
• Keras was used to implement the U-net architecture
• High-level wrapper around TensorFlow, Theano, or CNTK
• Allows for fast experimentation
• Simple to use, but flexible
19. • NAIP shows some promise in getting building height
• Roof shape seems beyond its capability
[Figure: NAIP imagery, LiDAR height, predicted height]
21. Challenges
• NAIP and Sanborn imagery do not line up well with ground-truth polygons
• Offset is not consistent
22. NAIP Results
• Accuracy of only 35%
• Performs well on short structures
• Biased to predict 2-story buildings
[Figure: confusion matrix of true vs. predicted number of stories]
23. Sanborn Results
• Accuracy improves to 56%
• Still struggles on residential-sized buildings
[Figure: confusion matrix of true vs. predicted number of stories]
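Reporting accuracy in stories implies binning the predicted per-building heights; the deck does not state the conversion it used, so this sketch assumes a rule-of-thumb ~3 m per story (a hypothetical helper, not taken from the slides):

```python
import numpy as np

# hypothetical conversion: ~3 m per story is a common rule of thumb,
# not a figure from the slides
METERS_PER_STORY = 3.0

def height_to_stories(height_m):
    """Round a per-building height (metres) to a story count (min 1)."""
    stories = np.rint(np.asarray(height_m) / METERS_PER_STORY)
    return np.maximum(stories, 1).astype(int)

height_to_stories([2.5, 6.1, 30.0])  # -> array([ 1,  2, 10])
```

A confusion matrix over these binned counts is then what the two results slides compare.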
28. Acknowledgements
• SpaceNet Challenge
• Accurate ground truth footprints were quite valuable
• Kohei Ozaki
• His solution to the SpaceNet challenge introduced me to image segmentation
Editor's Notes
Focus on use cases for 3D models
Look more at that paper
Expand on solar
Population estimation, noise propagation, energy consumption
Learn more about photogrammetry
Potentially split into two slides
Another slide to expand on why they are expensive and difficult and time consuming
Not scalable
Machine learning
Open source software
Open data
Cheap, fast, easy, scalable