2. Spatial Database
• Stores a large amount of space-related data
  • Maps
  • Remote sensing
  • Medical imaging
  • VLSI chip layout
• Has topological and distance information
• Requires spatial indexing, data access, reasoning, geometric computation, and knowledge representation techniques
3. Spatial Data Mining
• Extraction of knowledge and spatial relationships from spatial databases
• Can be used for understanding spatial data and spatial relationships
• Applications:
  • GIS, geomarketing, remote sensing, image database exploration, medical imaging, navigation
• Challenges:
  • Complexity of spatial data types and access methods
  • Large amounts of data
4. Cont.
• Non-spatial information
  • Same as data in traditional data mining
  • Numerical, categorical, ordinal, boolean, etc.
  • e.g., city name, city population
• Spatial information
  • Spatial attribute: geographically referenced
    • Neighborhood and extent
    • Location, e.g., longitude, latitude, elevation
  • Spatial data representations
    • Raster: gridded space
    • Vector: point, line, polygon
    • Graph: node, edge, path
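The three representations above can be illustrated with plain Python structures. This is only a minimal sketch: the grid values, coordinates, and node names are made up for illustration, and the helper functions `polygon_area` and `is_path` are hypothetical names, not part of any library.

```python
# Raster: gridded space, one value per cell (here 1 = water, 0 = land; made-up data).
raster = [
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
]
water_cells = sum(cell for row in raster for cell in row)

# Vector: point, line, polygon as coordinate tuples (made-up coordinates).
point = (2.0, 3.0)
line = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]

def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# Graph: nodes and edges as an adjacency mapping; a path is a node sequence.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

def is_path(adjacency, nodes):
    """True if every consecutive pair of nodes is joined by an edge."""
    return all(b in adjacency[a] for a, b in zip(nodes, nodes[1:]))

area = polygon_area(polygon)
valid_path = is_path(graph, ["A", "B", "C"])
```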
7. Statistical techniques
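• Popular approach to analyzing spatial data
• Assumes statistical independence among spatial data, although spatial objects are often interrelated in practice
• Can be performed only by experts with domain knowledge and statistical expertise
• Does not work well with symbolic values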
8. Spatial Data Warehousing
• Spatial data warehouse: an integrated, subject-oriented, time-variant, and nonvolatile spatial data repository.
• It consists of both spatial and non-spatial data in support of spatial data mining and spatial-data-related decision-making processes.
• Spatial data cube: a multidimensional spatial database
  • Both dimensions and measures may contain spatial components.
• Challenging issues:
  • Spatial data integration: a big issue
    • Structure-specific formats (raster- vs. vector-based, OO vs. relational models, different storage and indexing, etc.)
    • Vendor-specific formats (ESRI, MapInfo, Intergraph, IDRISI, etc.)
  • Realization of fast and flexible OLAP in spatial data warehouses
9. Dimensions and Measures in Spatial Data Warehouse
• Dimensions
  • Non-spatial
    e.g., “25-30 degrees” generalizes to “hot” (both are strings)
  • Spatial-to-non-spatial
    e.g., Seattle generalizes to the description “Pacific Northwest” (as a string)
  • Spatial-to-spatial
    e.g., Seattle generalizes to Pacific Northwest (as a spatial region)
• Measures
  • Numerical (e.g., monthly revenue of a region)
    • Distributive (e.g., count, sum)
    • Algebraic (e.g., average)
    • Holistic (e.g., median, rank)
  • Spatial
    • Collection of spatial pointers (e.g., pointers to all regions with a temperature of 25-30 degrees in July)
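The difference between the three kinds of numerical measure can be seen in a small sketch. The revenue figures and the helper names `summarize` and `merge` are hypothetical, chosen only for illustration: distributive measures combine directly from sub-aggregates, an algebraic measure is derived from a fixed number of distributive ones, and a holistic measure needs the raw values.

```python
from statistics import median

# Hypothetical monthly revenue values stored in two child cells of a cube.
north = [120.0, 80.0, 100.0]
south = [60.0, 40.0]

def summarize(values):
    """Distributive sub-aggregates for one cell."""
    return {"count": len(values), "sum": sum(values)}

def merge(a, b):
    """Roll up two cells using only their sub-aggregates."""
    return {"count": a["count"] + b["count"], "sum": a["sum"] + b["sum"]}

total = merge(summarize(north), summarize(south))

# Algebraic: the average is computed from two distributive measures.
average = total["sum"] / total["count"]

# Holistic: the median of the parent cell needs all raw values;
# combining the children's medians gives a different (wrong) answer.
overall_median = median(north + south)
median_of_medians = median([median(north), median(south)])
```

In this made-up data the true median is 80.0 while the median of the two cell medians is 75.0, which is why holistic measures are harder to precompute in a cube.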
10. Example: British Columbia Weather Pattern Analysis
• Input
  • A map with about 3,000 weather probes scattered in B.C.
  • Each probe records daily data for temperature, precipitation, wind velocity, etc. for a designated small area and transmits a signal to a provincial weather station.
  • Data warehouse using a star schema
• Output
  • A map that reveals patterns: merged (similar) regions
• Goals
  • Interactive analysis (drill-down, slice, dice, pivot, roll-up)
  • Fast response time
  • Minimal storage space
• Challenge
  • A merged region may contain hundreds of “primitive” regions (polygons)
11. Star Schema of the BC Weather Warehouse
• Spatial data warehouse
  • Dimensions
    • region_name
    • time
    • temperature
    • precipitation
  • Measures
    • region_map
    • area
    • count
[Figure: star schema showing the fact table and dimension tables]
12. Can we precompute all of the possible spatial merges and store them in the corresponding cuboid cells of a spatial data cube?
• Probably not.
  • Precomputing and storing them all requires multi-megabytes of storage.
  • Computing them on-line instead is slow and expensive.
14. Methods for Computing Spatial Data Cubes
• On-line aggregation: collect and store pointers to spatial objects in a spatial data cube
  • Expensive and slow; needs efficient aggregation techniques
• Precompute and store all the possible combinations
  • Huge space overhead
• Precompute and store rough approximations in a spatial data cube
  • Accuracy trade-off, e.g., using the minimum bounding rectangle (MBR)
• Selective computation: only materialize those which will be accessed frequently
  • A reasonable choice
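The "rough approximation" option can be sketched with minimum bounding rectangles. This is a minimal illustration under stated assumptions: regions are given as made-up vertex lists, and `mbr_of`, `merge_mbrs`, and `mbr_area` are hypothetical helper names. Storing one rectangle per merged region is cheap, but it overestimates the region's true extent.

```python
def mbr_of(points):
    """Minimum bounding rectangle as (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def merge_mbrs(boxes):
    """MBR of a merged region, computed from the MBRs of its parts."""
    boxes = list(boxes)
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )

def mbr_area(box):
    return (box[2] - box[0]) * (box[3] - box[1])

# Two made-up rectangular "primitive" regions that touch at a corner.
region_a = [(0, 0), (2, 0), (2, 1), (0, 1)]   # true area 2
region_b = [(2, 1), (3, 1), (3, 3), (2, 3)]   # true area 2

merged_box = merge_mbrs([mbr_of(region_a), mbr_of(region_b)])
approx_area = mbr_area(merged_box)
true_area = 2 + 2
```

Here the stored approximation covers an area of 9 while the merged region itself covers 4, which is the accuracy trade-off the slide refers to.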
15. Mining Spatial Association and Co-location Patterns
Spatial association rule: A ⇒ B [s%, c%]
• A and B are sets of spatial or non-spatial predicates
  • Topological relations: intersects, overlaps, disjoint, etc.
  • Spatial orientations: left_of, west_of, under, etc.
  • Distance information: close_to, within_distance, etc.
• s% is the support and c% is the confidence of the rule
• Example
  is_a(x, “School”) ∧ close_to(x, “Sports_Center”) → close_to(x, “Park”) [7%, 85%]
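Support and confidence for a rule A ⇒ B can be computed as in the sketch below. It assumes the usual definitions (support is the fraction of all objects satisfying both A and B; confidence is the fraction of objects satisfying A that also satisfy B) and uses a tiny made-up data set in which each object is represented by the set of predicates it satisfies. The function name `support_confidence` is hypothetical.

```python
def support_confidence(objects, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent."""
    matches_a = [o for o in objects if antecedent <= o]
    matches_ab = [o for o in matches_a if consequent <= o]
    support = len(matches_ab) / len(objects)
    confidence = len(matches_ab) / len(matches_a) if matches_a else 0.0
    return support, confidence

# Made-up objects, each described by the predicates that hold for it.
objects = [
    {"is_a_school", "close_to_sports_center", "close_to_park"},
    {"is_a_school", "close_to_sports_center", "close_to_park"},
    {"is_a_school", "close_to_sports_center"},
    {"is_a_school", "close_to_park"},
    {"close_to_park"},
]

support, confidence = support_confidence(
    objects,
    antecedent={"is_a_school", "close_to_sports_center"},
    consequent={"close_to_park"},
)
```

With this data, 2 of the 5 objects satisfy both sides and 3 satisfy the antecedent, so the rule would be written with support 40% and confidence of about 67%.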
16. Progressive Refinement
• Progressive refinement:
  • Spatial association mining needs to evaluate multiple spatial relationships among a large number of spatial objects, which is expensive.
• Hierarchy of spatial relationships:
  • First search for a rough relationship and then refine it
  • Superset coverage property: all potential answers should be preserved (i.e., the rough test may admit false positives but must not dismiss any true answer).
• Two-step mining of spatial association:
  • Step 1: Rough spatial computation (as a filter)
    • Using MBR for rough estimation
  • Step 2: Detailed spatial algorithm (as refinement)
    • Apply only to those objects which have passed the rough spatial association test (no less than min_support)
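The two-step filter-and-refine idea can be sketched for a `close_to` test. This is an illustration under simplifying assumptions, not the method of any particular system: each object is modeled as a small set of points with made-up coordinates, and the function names are hypothetical. The MBR distance never exceeds the exact distance, so the filter can only produce false positives, which is the superset coverage property.

```python
from itertools import product
from math import hypot

def mbr(points):
    """Minimum bounding rectangle as (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def mbr_distance(a, b):
    """Smallest distance between two rectangles (0 if they overlap)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return hypot(dx, dy)

def exact_distance(p, q):
    """Exact distance between two point sets (the expensive test)."""
    return min(hypot(x1 - x2, y1 - y2) for (x1, y1), (x2, y2) in product(p, q))

def close_to_pairs(objects, max_dist):
    names = sorted(objects)
    boxes = {name: mbr(objects[name]) for name in names}
    # Step 1: rough filter on MBRs keeps a superset of the true answers.
    candidates = [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if mbr_distance(boxes[a], boxes[b]) <= max_dist
    ]
    # Step 2: detailed test, applied only to the surviving candidates.
    confirmed = [
        (a, b)
        for a, b in candidates
        if exact_distance(objects[a], objects[b]) <= max_dist
    ]
    return candidates, confirmed

objects = {
    "school": [(0, 0), (0, 1)],
    "park": [(1, 5), (5, 5), (5, 0)],
    "sports_center": [(1, 1), (2, 1)],
}
candidates, confirmed = close_to_pairs(objects, max_dist=2.0)
```

In this made-up example all three pairs pass the MBR filter, but only the school and the sports center are confirmed as close by the detailed test.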
17. Spatial co-locations
• Among spatial associations, co-location (features that are frequently located together) is often just what one really wants to explore.
• Based on the property of spatial autocorrelation, interesting features likely coexist in closely located regions.
• Efficient methods: Apriori, progressive refinement, etc.
19. Spatial Cluster Analysis
• Mining clusters: k-means, k-medoids, hierarchical, density-based, etc.
• Analysis of distinct features of the clusters
20. Spatial Classification
• Analyze spatial objects to derive classification schemes, such as decision trees, in relevance to certain spatial properties (district, highway, river, etc.)
  • Classifying medium-size families according to income, region, and infant mortality rates
  • Mining for volcanoes on Venus
• Employ methods such as:
  • Decision-tree classification, Naïve Bayesian classifier + boosting, neural network, genetic programming, etc.
21. Spatial Trend Analysis
• Function
  • Detect changes and trends along a spatial dimension
  • Study the trend of non-spatial or spatial data changing with space
• Application examples
  • Observe the trend of changes in climate or vegetation with increasing distance from an ocean
  • Crime rate or unemployment rate changes with regard to city geo-distribution
  • Traffic flows in highways and in cities
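One simple way to detect a trend along a spatial dimension is to fit a straight line to an attribute as a function of distance. The sketch below assumes ordinary least-squares regression as the technique (the slides do not prescribe one) and uses made-up precipitation readings; `linear_trend` is a hypothetical helper name.

```python
def linear_trend(distances, values):
    """Least-squares slope and intercept of values against distances."""
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up readings: distance from the ocean (km) and annual precipitation (cm).
distance_km = [0, 10, 20, 30, 40]
precipitation_cm = [100, 90, 80, 70, 60]

slope, intercept = linear_trend(distance_km, precipitation_cm)
```

A negative slope here would indicate that precipitation decreases with increasing distance from the ocean; real data would also call for a check on how well the line fits.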
24. Other Applications
• Spatial data mining is used in
  • NASA Earth Observing System (EOS): Earth science data
  • National Institute of Justice: crime mapping
  • Census Bureau, Dept. of Commerce: census data
  • Dept. of Transportation (DOT): traffic data
  • National Institutes of Health (NIH): cancer clusters
  • Commerce, e.g., retail analysis