SlideShare a Scribd company logo
1 of 26
O C T O B E R 1 3 - 1 6 , 2 0 1 6 • A U S T I N , T X
Lucene/Solr Spatial in 2015
David Smiley
Search Engineer/Consultant (Freelance)
3
About David Smiley
Freelance Search Developer/Consultant
Expert Lucene/Solr development skills,
advise (consulting), training
Java, spatial, and full-stack experience
Apache Lucene/Solr committer & PMC member
Primary author of “Apache Solr Enterprise Search Server”
4
More Spatial Contributors!
Spatial4j Lucene Solr
David Smiley ✔️ ✔️ ✔️
Ryan McKinley ✔️
Justin Deoliveira ✔️
Mike McCandless ✔️
Nick Knize ✔️
Karl Wright ✔️
Ishan Chattopadhyaya ✔️
5
Agenda
New Features / Capabilities
New Approaches
Improvements
Pending
6
Topic: New Features
Heatmaps / grid faceting — Lucene, Solr
Surface-of-sphere shapes (Geo3d) — Lucene
Accurate indexed geometries — Lucene, Solr
GeoJSON read/write — Spatial4j
7
Heatmaps: Spatial Grid Faceting
Spatial density summary grid faceting,
also useful for point-plotting search results
Usually rendered with a gradient radius
Lucene & Solr APIs
Scalable & fast usually…
v5.2
8
Heatmaps Under the Hood
Requires a PrefixTreeStrategy Lucene field — grid based
Algorithm enumerates the underlying cell/terms and accumulates
the counter in a corresponding grid
Conceptually facet.method=enum for spatial
Works on non-point indexed shapes too
Complexity: O(cells * cellDepthFactor) not O(docs)
No/low memory; mainly the grid of integers
Solr will distribute to shards and merge
Could be faster still; a BFS (vs DFS) layout would be perfect
9
Solr Heatmap Faceting
On an RPT field
(SpatialRecursivePrefixTreeFieldType)
prefixTree=“packedQuad”
Query:
/select?facet=true
&facet.heatmap=geo_rpt
&facet.heatmap.geom=
["-180 -90" TO "180 90”]
facet.heatmap.format=ints2D or png
// Normal Solr response...
"facet_counts":{
... // facet response fields
"facet_heatmaps":{
"loc_srpt":[
"gridLevel",2,
"columns",32,
"rows",32,
"minX",-180.0,
"maxX",180.0,
"minY",-90.0,
"maxY",90.0,
"counts_ints2D", [null, null, [0, 0, ... ]]
...
10
Solr Heatmap Resources
Solr Ref guide:
https://cwiki.apache.org/confluence/display/solr/Spatial+Search
Jack Reed’s Tutorial: http://www.jack-reed.com/2015/06/29/visualizing-10-
million-geonames-with-leaflet-solr-heatmap-facets.html
Live Demo: http://worldwidegeoweb.com
Open-source JavaScript Solr Heatmap Libraries
https://github.com/spacemansteve/SolrHeatmapLayer
https://github.com/mejackreed/leaflet-solr-heatmap
https://github.com/voyagersearch/leaflet-solr-heatmap
11
Geo3D: Shapes on the Surface of a Sphere
… or Ellipsoid of configurable axis
Not a general 3D space geometry lib
Internally uses geocentric X, Y, Z coordinates (hence 3D) with
3D planar geometry mathematics
Shapes: Point, Lat-Lon Rect, Circle, Polygons, Path (LineString)
with optional buffer
Distance computations: Arc (angular or surface), Linear (straight-
line), Normal
12
All 2D Maps of the Earth Distort Straight Lines
A straight bird-flies
path from
Anchorage to
Miami doesn’t
actually cross the
ocean!
13
Geo3D, continued…
Benefits
Inherently more accurate than 2D projected spatial
especially for big shapes or near poles
Many computations are fast; no expensive trigonometry
An alternative to JTS without the LGPL license (still)
Has own Lucene module (spatial3d), thus jar file
Maven groupId: org.apache.lucene, artifact: lucene-spatial3d
No Solr integration yet; pending more Spatial4j integration
14
Index & Search Geo3D Geometries
Spatial4j Geo3dShape
wrapper with RPT
In Lucene-spatial for now
Index Geo3d shapes
Limited to grid accuracy
Query by Geo3d shape
Limited distance sort
Heatmaps
Geo3DPointField &
PointInGeo3DShapeQuery
Based on a 3D BKD index
In spatial3d module
Index points-only
No multi-valued
Query by Geo3d shape
No distance sort
Leaner & faster than RPT
v5.4v5.2
15
RPT/SpatialPrefixTrees and Accuracy
RecursivePrefixTree (RPT) uses Lucene’s index as a PrefixTree
Thus represents shapes as grid cells of varying precision by
prefix
Example, a point shape:
D, DR, DRT, DRT2, DRT2Y
More accuracy scales
Example, a polygon shape:
Too many to list… 508 cells
More accuracy does NOT scale
16
Combining RPT with Serialized Geometry
RPT (RecursivePrefixTreeStrategy) is the grid index (inaccurate)
SDV (SerializedDVStrategy) stores serialized geometry (accurate)
RPT + SDV → CompositeSpatialStrategy
Accuracy & speed & smaller indexes
Optimized intersects predicate avoids some geometry checks
> 80% faster intersects queries, 75% smaller index
Solr adapter: RptWithGeometrySpatialField
Compatible with the Heatmaps feature
Includes a shape cache (per-segment); configurable
v5.2
17
Topic: New Approaches
Lucene
BKD Tree Indexes
GeoPointField
18
BKD Tree Indexes
New numeric/spatial index approach with own file format
Not based on Lucene Terms index
https://www.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf
Much faster and compact than Trie/PrefixTree based indexes
Wither term auto-prefixing? LUCENE-5879
Indexed point-data only; multi-valued mostly
Intersects predicate only
Filtering only (no distance or other scoring)
Multiple implementations… (next slide)
Neat visualization
https://youtu.be/x9WnzOvsGKs
19
Multiple BKD Implementations
Multiple implementations of the same BKD concept:
(1D) RangeTreeDocValuesFormat
(2D) BKDPointField & BKD…Query
(3D) Geo3DPointField & PointInGeo3DShapeQuery
(ND) LUCENE-6825 (to Lucene-core) in-progress
1D,2D,3D Implementations are either in lucene-sandbox or
lucene-spatial3d for now
No Lucene-spatial module SpatialStrategy wrappers yet
thus no Spatial4j Shape integration nor Solr integration yet
20
BKD 1D: RangeTree
Efficient range search on single/multi-valued numbers or terms
Could be used for numbers, dates, IPV6 bytes, …
Alternatives: Normal number fields (trie), DateRangeField (RPT)
Would love to see a benchmark!
How-To:
RangeTreeDocValuesFormat
Numbers: SortedNumericDocValuesField with
NumericRangeTreeQuery
Bytes: SortedSetDocValuesField with SortedSetRangeTreeQuery
v5.3
21
BKD 2D: BKDPointField
Efficient 2D geospatial point index
Alternative to RPT or GeoPointField
5.7x faster than RPT w/ GeoHash. Smaller indexes.
How-To:
Use BKDPointField (requires BKDTreeDocValuesFormat)
Query:
BKDPointInBBoxQuery
BKDPointInPolygonQuery
point-radius (circle) — in-progress LUCENE-6698
v5.3
22
GeoPointField
2D geospatial point field
Indexed point-only data, single/multi-valued
Spatial 2D Trie/PrefixTree terms index
But not affiliated with Lucene-spatial SpatialPrefixTree/RPT
Configurable 2x grid size (defaults to 512)
Compact bit interleaved Z-order encoding
Re-uses much of Lucene’s numeric precisionStep &
MultiTermQuery logic
2-phase grid/postings then doc-values algorithm
v5.3
23
…continued
Has no affiliation with Spatial4j, RPT, JTS, or SpatialStrategy
No Heatmaps, No custom Shape implementations
No Solr support yet
No dependencies
Easy to use compared to RPT; simpler internally too
How-To:
doc.add(new GeoPointField(name, lon, lat, Store.YES))
GeoPointDistanceQuery (sphere only) or GeoPointInBBoxQuery
or GeoPointInPolygonQuery. …DistanceRangeQuery pending
24
Topic: Improvements
Spatial4j
Minimal longitude bounding-box algorithm
Lucene (PrefixTree / RPT indexing)
Leaner & faster non-point indexes
New PackedQuadPrefixTree
Solr
Distance units: Kilometers/Miles/Degrees
Nicer ST_* spatial query parsers (almost done)
25
Topic: Some Pending Spatial TODOs
Spatial4j
Geo3D integration — a JTS
alternative
Lucene
FlexPrefixTree — LUCENE-
4922
Multi-dimensional BKD —
LUCENE-6825
SpatialStrategy adapters for
GeoPointField, etc.
Solr
Better spatial Solr
QParsers — SOLR-4242
GeoJSON parsing
More FieldType adapters
for latest Lucene spatial
DateRangeField faceting
Nearest-neighbor search
Well, 2015 isn’t over yet.
:-)
26
That’s all for now; thanks for coming!
Need Lucene/Solr guidance or custom development?
Contact me!
Email: dsmiley@apache.org
LinkedIn: http://www.linkedin.com/in/davidwsmiley
G+: +DavidSmiley
Twitter: @DavidWSmiley

More Related Content

What's hot

Tutorial on Object Detection (Faster R-CNN)
Tutorial on Object Detection (Faster R-CNN)Tutorial on Object Detection (Faster R-CNN)
Tutorial on Object Detection (Faster R-CNN)Hwa Pyung Kim
 
Java Tech & Tools | Mapping, GIS and Geolocating Data in Java | Joachim Van d...
Java Tech & Tools | Mapping, GIS and Geolocating Data in Java | Joachim Van d...Java Tech & Tools | Mapping, GIS and Geolocating Data in Java | Joachim Van d...
Java Tech & Tools | Mapping, GIS and Geolocating Data in Java | Joachim Van d...JAX London
 
VJAI Paper Reading#3-KDD2019-ClusterGCN
VJAI Paper Reading#3-KDD2019-ClusterGCNVJAI Paper Reading#3-KDD2019-ClusterGCN
VJAI Paper Reading#3-KDD2019-ClusterGCNDat Nguyen
 
Techniques for Organization and Visualization of Community Photo Collections
Techniques for Organization and Visualization of Community Photo CollectionsTechniques for Organization and Visualization of Community Photo Collections
Techniques for Organization and Visualization of Community Photo CollectionsKumar Srijan
 
Semantic-Aware Sky Replacement (SIGGRAPH 2016)
Semantic-Aware Sky Replacement (SIGGRAPH 2016)Semantic-Aware Sky Replacement (SIGGRAPH 2016)
Semantic-Aware Sky Replacement (SIGGRAPH 2016)Yi-Hsuan Tsai
 
Introduction to spatial data analysis in r
Introduction to spatial data analysis in rIntroduction to spatial data analysis in r
Introduction to spatial data analysis in rRichard Wamalwa
 
Graphical Objects and Scene Graphs
Graphical Objects and Scene GraphsGraphical Objects and Scene Graphs
Graphical Objects and Scene GraphsSyed Zaid Irshad
 
Mapping, GIS and geolocating data in Java
Mapping, GIS and geolocating data in JavaMapping, GIS and geolocating data in Java
Mapping, GIS and geolocating data in JavaJoachim Van der Auwera
 
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...DB Tsai
 
A x86-optimized rank&select dictionary for bit sequences
A x86-optimized rank&select dictionary for bit sequencesA x86-optimized rank&select dictionary for bit sequences
A x86-optimized rank&select dictionary for bit sequencesTakeshi Yamamuro
 
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...Spark Summit
 
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...DB Tsai
 
R-FCN : object detection via region-based fully convolutional networks
R-FCN :  object detection via region-based fully convolutional networksR-FCN :  object detection via region-based fully convolutional networks
R-FCN : object detection via region-based fully convolutional networksEntrepreneur / Startup
 

What's hot (20)

Tutorial on Object Detection (Faster R-CNN)
Tutorial on Object Detection (Faster R-CNN)Tutorial on Object Detection (Faster R-CNN)
Tutorial on Object Detection (Faster R-CNN)
 
Mask R-CNN
Mask R-CNNMask R-CNN
Mask R-CNN
 
Objects as points
Objects as pointsObjects as points
Objects as points
 
Java Tech & Tools | Mapping, GIS and Geolocating Data in Java | Joachim Van d...
Java Tech & Tools | Mapping, GIS and Geolocating Data in Java | Joachim Van d...Java Tech & Tools | Mapping, GIS and Geolocating Data in Java | Joachim Van d...
Java Tech & Tools | Mapping, GIS and Geolocating Data in Java | Joachim Van d...
 
VJAI Paper Reading#3-KDD2019-ClusterGCN
VJAI Paper Reading#3-KDD2019-ClusterGCNVJAI Paper Reading#3-KDD2019-ClusterGCN
VJAI Paper Reading#3-KDD2019-ClusterGCN
 
Techniques for Organization and Visualization of Community Photo Collections
Techniques for Organization and Visualization of Community Photo CollectionsTechniques for Organization and Visualization of Community Photo Collections
Techniques for Organization and Visualization of Community Photo Collections
 
Semantic-Aware Sky Replacement (SIGGRAPH 2016)
Semantic-Aware Sky Replacement (SIGGRAPH 2016)Semantic-Aware Sky Replacement (SIGGRAPH 2016)
Semantic-Aware Sky Replacement (SIGGRAPH 2016)
 
Introduction to spatial data analysis in r
Introduction to spatial data analysis in rIntroduction to spatial data analysis in r
Introduction to spatial data analysis in r
 
Graphical Objects and Scene Graphs
Graphical Objects and Scene GraphsGraphical Objects and Scene Graphs
Graphical Objects and Scene Graphs
 
PCL (Point Cloud Library)
PCL (Point Cloud Library)PCL (Point Cloud Library)
PCL (Point Cloud Library)
 
Real-time lightmap baking
Real-time lightmap bakingReal-time lightmap baking
Real-time lightmap baking
 
Mapping, GIS and geolocating data in Java
Mapping, GIS and geolocating data in JavaMapping, GIS and geolocating data in Java
Mapping, GIS and geolocating data in Java
 
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
 
A x86-optimized rank&select dictionary for bit sequences
A x86-optimized rank&select dictionary for bit sequencesA x86-optimized rank&select dictionary for bit sequences
A x86-optimized rank&select dictionary for bit sequences
 
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...
 
design_doc
design_docdesign_doc
design_doc
 
Deep Learning for Computer Vision: Segmentation (UPC 2016)
Deep Learning for Computer Vision: Segmentation (UPC 2016)Deep Learning for Computer Vision: Segmentation (UPC 2016)
Deep Learning for Computer Vision: Segmentation (UPC 2016)
 
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
 
R-FCN : object detection via region-based fully convolutional networks
R-FCN :  object detection via region-based fully convolutional networksR-FCN :  object detection via region-based fully convolutional networks
R-FCN : object detection via region-based fully convolutional networks
 
Efficient Parallel Set-Similarity Joins Using MapReduce
 Efficient Parallel Set-Similarity Joins Using MapReduce Efficient Parallel Set-Similarity Joins Using MapReduce
Efficient Parallel Set-Similarity Joins Using MapReduce
 

Similar to Lucene/Solr spatial in 2015

Spatially enabled open source BI (GeoBI) with GeoKettle, GeoMondrian & SOLAPL...
Spatially enabled open source BI (GeoBI) with GeoKettle, GeoMondrian & SOLAPL...Spatially enabled open source BI (GeoBI) with GeoKettle, GeoMondrian & SOLAPL...
Spatially enabled open source BI (GeoBI) with GeoKettle, GeoMondrian & SOLAPL...Thierry Badard
 
The state of geo in ElasticSearch
The state of geo in ElasticSearchThe state of geo in ElasticSearch
The state of geo in ElasticSearchFan Robbin
 
Mapping, GIS and geolocating data in Java @ JAX London
Mapping, GIS and geolocating data in Java @ JAX LondonMapping, GIS and geolocating data in Java @ JAX London
Mapping, GIS and geolocating data in Java @ JAX LondonJoachim Van der Auwera
 
[2019 Strata] Self Sevice BI meets Geospatial Analysis
[2019 Strata] Self Sevice BI meets Geospatial Analysis[2019 Strata] Self Sevice BI meets Geospatial Analysis
[2019 Strata] Self Sevice BI meets Geospatial AnalysisHeejae(Kyungtaak) Noh
 
Magellan FOSS4G Talk, Boston 2017
Magellan FOSS4G Talk, Boston 2017Magellan FOSS4G Talk, Boston 2017
Magellan FOSS4G Talk, Boston 2017Ram Sriharsha
 
Giving MongoDB a Way to Play with the GIS Community
Giving MongoDB a Way to Play with the GIS CommunityGiving MongoDB a Way to Play with the GIS Community
Giving MongoDB a Way to Play with the GIS CommunityMongoDB
 
Geospatial for Java
Geospatial for JavaGeospatial for Java
Geospatial for JavaJody Garnett
 
New features in WhirlyGlobe-Maply Version 2.4 and Beyond!
New features in WhirlyGlobe-Maply Version 2.4 and Beyond!New features in WhirlyGlobe-Maply Version 2.4 and Beyond!
New features in WhirlyGlobe-Maply Version 2.4 and Beyond!mousebird
 
State of GeoServer 2.14
State of GeoServer 2.14State of GeoServer 2.14
State of GeoServer 2.14Jody Garnett
 
GIS Analysis For Site Remediation
GIS Analysis For Site RemediationGIS Analysis For Site Remediation
GIS Analysis For Site RemediationJoseph Luchette
 
LocationTech Tour 2016 - Vectortiles
LocationTech Tour 2016 - Vectortiles LocationTech Tour 2016 - Vectortiles
LocationTech Tour 2016 - Vectortiles Morgan Thompson
 
Geospatial for Java
Geospatial for JavaGeospatial for Java
Geospatial for JavaJody Garnett
 
Big Data and Geospatial with HPCC Systems
Big Data and Geospatial with HPCC SystemsBig Data and Geospatial with HPCC Systems
Big Data and Geospatial with HPCC SystemsHPCC Systems
 
Web-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim HunterWeb-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim HunterDatabricks
 
Geographica: A Benchmark for Geospatial RDF Stores
Geographica: A Benchmark for Geospatial RDF StoresGeographica: A Benchmark for Geospatial RDF Stores
Geographica: A Benchmark for Geospatial RDF StoresKostis Kyzirakos
 
LocationTech Projects
LocationTech ProjectsLocationTech Projects
LocationTech ProjectsJody Garnett
 

Similar to Lucene/Solr spatial in 2015 (20)

Spatially enabled open source BI (GeoBI) with GeoKettle, GeoMondrian & SOLAPL...
Spatially enabled open source BI (GeoBI) with GeoKettle, GeoMondrian & SOLAPL...Spatially enabled open source BI (GeoBI) with GeoKettle, GeoMondrian & SOLAPL...
Spatially enabled open source BI (GeoBI) with GeoKettle, GeoMondrian & SOLAPL...
 
The state of geo in ElasticSearch
The state of geo in ElasticSearchThe state of geo in ElasticSearch
The state of geo in ElasticSearch
 
No(Geo)SQL
No(Geo)SQLNo(Geo)SQL
No(Geo)SQL
 
Mapping, GIS and geolocating data in Java @ JAX London
Mapping, GIS and geolocating data in Java @ JAX LondonMapping, GIS and geolocating data in Java @ JAX London
Mapping, GIS and geolocating data in Java @ JAX London
 
[2019 Strata] Self Sevice BI meets Geospatial Analysis
[2019 Strata] Self Sevice BI meets Geospatial Analysis[2019 Strata] Self Sevice BI meets Geospatial Analysis
[2019 Strata] Self Sevice BI meets Geospatial Analysis
 
Day 6 - PostGIS
Day 6 - PostGISDay 6 - PostGIS
Day 6 - PostGIS
 
Magellan FOSS4G Talk, Boston 2017
Magellan FOSS4G Talk, Boston 2017Magellan FOSS4G Talk, Boston 2017
Magellan FOSS4G Talk, Boston 2017
 
Giving MongoDB a Way to Play with the GIS Community
Giving MongoDB a Way to Play with the GIS CommunityGiving MongoDB a Way to Play with the GIS Community
Giving MongoDB a Way to Play with the GIS Community
 
Geospatial for Java
Geospatial for JavaGeospatial for Java
Geospatial for Java
 
New features in WhirlyGlobe-Maply Version 2.4 and Beyond!
New features in WhirlyGlobe-Maply Version 2.4 and Beyond!New features in WhirlyGlobe-Maply Version 2.4 and Beyond!
New features in WhirlyGlobe-Maply Version 2.4 and Beyond!
 
State of GeoServer 2.14
State of GeoServer 2.14State of GeoServer 2.14
State of GeoServer 2.14
 
GIS Data Types
GIS Data TypesGIS Data Types
GIS Data Types
 
GIS Analysis For Site Remediation
GIS Analysis For Site RemediationGIS Analysis For Site Remediation
GIS Analysis For Site Remediation
 
LocationTech Tour 2016 - Vectortiles
LocationTech Tour 2016 - Vectortiles LocationTech Tour 2016 - Vectortiles
LocationTech Tour 2016 - Vectortiles
 
Geospatial for Java
Geospatial for JavaGeospatial for Java
Geospatial for Java
 
Big Data and Geospatial with HPCC Systems
Big Data and Geospatial with HPCC SystemsBig Data and Geospatial with HPCC Systems
Big Data and Geospatial with HPCC Systems
 
Web-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim HunterWeb-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim Hunter
 
Geographica: A Benchmark for Geospatial RDF Stores
Geographica: A Benchmark for Geospatial RDF StoresGeographica: A Benchmark for Geospatial RDF Stores
Geographica: A Benchmark for Geospatial RDF Stores
 
Why is postgis awesome?
Why is postgis awesome?Why is postgis awesome?
Why is postgis awesome?
 
LocationTech Projects
LocationTech ProjectsLocationTech Projects
LocationTech Projects
 

Recently uploaded

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 

Recently uploaded (20)

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

Lucene/Solr spatial in 2015

  • 1. O C T O B E R 1 3 - 1 6 , 2 0 1 6 • A U S T I N , T X
  • 2. Lucene/Solr Spatial in 2015 David Smiley Search Engineer/Consultant (Freelance)
  • 3. 3 About David Smiley Freelance Search Developer/Consultant Expert Lucene/Solr development skills, advise (consulting), training Java, spatial, and full-stack experience Apache Lucene/Solr committer & PMC member Primary author of “Apache Solr Enterprise Search Server”
  • 4. 4 More Spatial Contributors! Spatial4j Lucene Solr David Smiley ✔️ ✔️ ✔️ Ryan McKinley ✔️ Justin Deoliveira ✔️ Mike McCandless ✔️ Nick Knize ✔️ Karl Wright ✔️ Ishan Chattopadhyaya ✔️
  • 5. 5 Agenda New Features / Capabilities New Approaches Improvements Pending
  • 6. 6 Topic: New Features Heatmaps / grid faceting — Lucene, Solr Surface-of-sphere shapes (Geo3d) — Lucene Accurate indexed geometries — Lucene, Solr GeoJSON read/write — Spatial4j
  • 7. 7 Heatmaps: Spatial Grid Faceting Spatial density summary grid faceting, also useful for point-plotting search results Usually rendered with a gradient radius Lucene & Solr APIs Scalable & fast usually… v5.2
  • 8. 8 Heatmaps Under the Hood Requires a PrefixTreeStrategy Lucene field — grid based Algorithm enumerates the underlying cell/terms and accumulates the counter in a corresponding grid Conceptually facet.method=enum for spatial Works on non-point indexed shapes too Complexity: O(cells * cellDepthFactor) not O(docs) No/low memory; mainly the grid of integers Solr will distribute to shards and merge Could be faster still; a BFS (vs DFS) layout would be perfect
  • 9. 9 Solr Heatmap Faceting On an RPT field (SpatialRecursivePrefixTreeFieldType) prefixTree=“packedQuad” Query: /select?facet=true &facet.heatmap=geo_rpt &facet.heatmap.geom= ["-180 -90" TO "180 90”] facet.heatmap.format=ints2D or png // Normal Solr response... "facet_counts":{ ... // facet response fields "facet_heatmaps":{ "loc_srpt":[ "gridLevel",2, "columns",32, "rows",32, "minX",-180.0, "maxX",180.0, "minY",-90.0, "maxY",90.0, "counts_ints2D", [null, null, [0, 0, ... ]] ...
  • 10. 10 Solr Heatmap Resources Solr Ref guide: https://cwiki.apache.org/confluence/display/solr/Spatial+Search Jack Reed’s Tutorial: http://www.jack-reed.com/2015/06/29/visualizing-10- million-geonames-with-leaflet-solr-heatmap-facets.html Live Demo: http://worldwidegeoweb.com Open-source JavaScript Solr Heatmap Libraries https://github.com/spacemansteve/SolrHeatmapLayer https://github.com/mejackreed/leaflet-solr-heatmap https://github.com/voyagersearch/leaflet-solr-heatmap
  • 11. 11 Geo3D: Shapes on the Surface of a Sphere … or Ellipsoid of configurable axis Not a general 3D space geometry lib Internally uses geocentric X, Y, Z coordinates (hence 3D) with 3D planar geometry mathematics Shapes: Point, Lat-Lon Rect, Circle, Polygons, Path (LineString) with optional buffer Distance computations: Arc (angular or surface), Linear (straight- line), Normal
  • 12. 12 All 2D Maps of the Earth Distort Straight Lines A straight bird-flies path from Anchorage to Miami doesn’t actually cross the ocean!
  • 13. 13 Geo3D, continued… Benefits Inherently more accurate than 2D projected spatial especially for big shapes or near poles Many computations are fast; no expensive trigonometry An alternative to JTS without the LGPL license (still) Has own Lucene module (spatial3d), thus jar file Maven groupId: org.apache.lucene, artifact: lucene-spatial3d No Solr integration yet; pending more Spatial4j integration
  • 14. 14 Index & Search Geo3D Geometries Spatial4j Geo3dShape wrapper with RPT In Lucene-spatial for now Index Geo3d shapes Limited to grid accuracy Query by Geo3d shape Limited distance sort Heatmaps Geo3DPointField & PointInGeo3DShapeQuery Based on a 3D BKD index In spatial3d module Index points-only No multi-valued Query by Geo3d shape No distance sort Leaner & faster than RPT v5.4v5.2
  • 15. 15 RPT/SpatialPrefixTrees and Accuracy RecursivePrefixTree (RPT) uses Lucene’s index as a PrefixTree Thus represents shapes as grid cells of varying precision by prefix Example, a point shape: D, DR, DRT, DRT2, DRT2Y More accuracy scales Example, a polygon shape: Too many to list… 508 cells More accuracy does NOT scale
  • 16. 16 Combining RPT with Serialized Geometry RPT (RecursivePrefixTreeStrategy) is the grid index (inaccurate) SDV (SerializedDVStrategy) stores serialized geometry (accurate) RPT + SDV → CompositeSpatialStrategy Accuracy & speed & smaller indexes Optimized intersects predicate avoids some geometry checks > 80% faster intersects queries, 75% smaller index Solr adapter: RptWithGeometrySpatialField Compatible with the Heatmaps feature Includes a shape cache (per-segment); configurable v5.2
  • 17. 17 Topic: New Approaches Lucene BKD Tree Indexes GeoPointField
  • 18. 18 BKD Tree Indexes New numeric/spatial index approach with own file format Not based on Lucene Terms index https://www.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf Much faster and compact than Trie/PrefixTree based indexes Wither term auto-prefixing? LUCENE-5879 Indexed point-data only; multi-valued mostly Intersects predicate only Filtering only (no distance or other scoring) Multiple implementations… (next slide) Neat visualization https://youtu.be/x9WnzOvsGKs
  • 19. 19 Multiple BKD Implementations Multiple implementations of the same BKD concept: (1D) RangeTreeDocValuesFormat (2D) BKDPointField & BKD…Query (3D) Geo3DPointField & PointInGeo3DShapeQuery (ND) LUCENE-6825 (to Lucene-core) in-progress 1D,2D,3D Implementations are either in lucene-sandbox or lucene-spatial3d for now No Lucene-spatial module SpatialStrategy wrappers yet thus no Spatial4j Shape integration nor Solr integration yet
  • 20. 20 BKD 1D: RangeTree Efficient range search on single/multi-valued numbers or terms Could be used for numbers, dates, IPV6 bytes, … Alternatives: Normal number fields (trie), DateRangeField (RPT) Would love to see a benchmark! How-To: RangeTreeDocValuesFormat Numbers: SortedNumericDocValuesField with NumericRangeTreeQuery Bytes: SortedSetDocValuesField with SortedSetRangeTreeQuery v5.3
  • 21. 21 BKD 2D: BKDPointField Efficient 2D geospatial point index Alternative to RPT or GeoPointField 5.7x faster than RPT w/ GeoHash. Smaller indexes. How-To: Use BKDPointField (requires BKDTreeDocValuesFormat) Query: BKDPointInBBoxQuery BKDPointInPolygonQuery point-radius (circle) — in-progress LUCENE-6698 v5.3
  • 22. 22 GeoPointField 2D geospatial point field Indexed point-only data, single/multi-valued Spatial 2D Trie/PrefixTree terms index But not affiliated with Lucene-spatial SpatialPrefixTree/RPT Configurable 2x grid size (defaults to 512) Compact bit interleaved Z-order encoding Re-uses much of Lucene’s numeric precisionStep & MultiTermQuery logic 2-phase grid/postings then doc-values algorithm v5.3
  • 23. 23 …continued Has no affiliation with Spatial4j, RPT, JTS, or SpatialStrategy No Heatmaps, No custom Shape implementations No Solr support yet No dependencies Easy to use compared to RPT; simpler internally too How-To: doc.add(new GeoPointField(name, lon, lat, Store.YES)) GeoPointDistanceQuery (sphere only) or GeoPointInBBoxQuery or GeoPointInPolygonQuery. …DistanceRangeQuery pending
  • 24. 24 Topic: Improvements Spatial4j Minimal longitude bounding-box algorithm Lucene (PrefixTree / RPT indexing) Leaner & faster non-point indexes New PackedQuadPrefixTree Solr Distance units: Kilometers/Miles/Degrees Nicer ST_* spatial query parsers (almost done)
  • 25. 25 Topic: Some Pending Spatial TODOs Spatial4j Geo3D integration — a JTS alternative Lucene FlexPrefixTree — LUCENE- 4922 Multi-dimensional BKD — LUCENE-6825 SpatialStrategy adapters for GeoPointField, etc. Solr Better spatial Solr QParsers — SOLR-4242 GeoJSON parsing More FieldType adapters for latest Lucene spatial DateRangeField faceting Nearest-neighbor search Well, 2015 isn’t over yet. :-)
  • 26. 26 That’s all for now; thanks for coming! Need Lucene/Solr guidance or custom development? Contact me! Email: dsmiley@apache.org LinkedIn: http://www.linkedin.com/in/davidwsmiley G+: +DavidSmiley Twitter: @DavidWSmiley

Editor's Notes

  1. There was a “hit by a bus” syndrome until now. I’m going to be presenting a lot of stuff I did not work on.
  2. And list what this talk is *not*. Not a spatial overview
  3. Also, “Spaceman Steve” is a freelancer offering to do heatmap and other Solr/Geo work.
  4. Thanks to Karl Wright (Nokia/HERE)! The only surface-of-sphere shape supported prior to Geo3D was a circle.
  5. From https://www.reddit.com/r/MapPorn/comments/1p8dba/you_can_theoretically_drive_in_a_straight_line/ But don’t harp on this too much; 2D spatial is still useful.
  6. TODO multi-valued? Geo3dShape w/ RPT more flexible Geo3DPointField is new & faster; more to come Neither have Solr support yet.
  7. Suggest QuadPrefixTree for non-point indices like this. Also: Supports most spatial predicates Theoretically could work well for point-data too; I haven’t tried.
  8. Thanks to Mike McCandless.
  9. Multi-dimensional BKD proposal to Lucene core to shore up the duplication and add an official API hook to Lucene core.
  10. Unknown if this is faster for date ranges than DateRangePrefixTree. Likely smaller indices. Both …DocValuesField’s are standard. This could replace normal number fields some day once Lucene has the APIs for BKD. ——— quoting JIRA LUCEN-6697: BKD tree is quite a bit slower to index, especially merging since it fully re-builds the binary tree, and it's using offline sorting, but then is ~46% faster to search, and uses less heap for the in-memory index structure. Index size is a bit smaller, but it is also storing doc values so e.g. you could sort by this field as well vs. LongField which would have to also index doc values to enable sorting = larger index size.
  11. Will point-radius (circle) have a flat and surface-of-sphere version?
  12. Perf? Likely faster than RPT. Indexes certainly via configuration of a high precisionStep cheaper than Quad or even GeoHash too
  13. Optimized the encoding of PrefixTreeStrategy indexes for non-point data: 33% smaller index, 68% faster indexing, and 44% faster searching. YMMV. LUCENE-4942.
  14. No promises! Some of these are new, brought on by new features. (e.g. Lucene then Solr adapter). This list is biased to my interests/awareness.