Searching Relational Data
with Elasticsearch
Dr. Renaud Delbru
CTO, Siren Solutions
● CTO, SIREn Solutions
– Search, Big Data, Knowledge Graph
● Lucene / Solr Contributor
– E.g., Cross Data Center Replication
– Lucene Revolution 2013, 2014
– Lucene In Action, 2nd Edition
● Author of the SIREn plugin
Introducing myself
● Open source search
systems
– Lucene, Solr, Elasticsearch
● Document-based model
– Flat key-value model
– Originally developed for
searching full-text documents
Background
firstname : John
lastname : Smith
title : Mr, Dr
Background
● Data is usually more
complex
– Nested objects
● XML, JSON
● E.g., US patents
– Relations
● RDBMS, RDF, Graph, Documents
with links to entities or other
documents
Article
{
"firstName": "John",
"lastName": "Smith",
"age": 25,
"address" : {
"street" : "21 2nd Street",
"city" : "New York",
"state" : "NY"
},
"phoneNumber" : [
{ "type" : "home", "number" : "212 555-1234" },
{ "type" : "fax", "number" : "646 555-4567" }
]
}
Person
Company
Crunchbase example
Company: Elastic
Funding rounds: Series A, Series B
Investors: Data Collective, Benchmark, Index Ventures
name : Elastic
funding_rounds.round_code : A
funding_rounds.founded_year : 2012
funding_rounds.round_code : B
funding_rounds.founded_year : 2013
funding_rounds.investments.name : Benchmark
funding_rounds.investments.name : Data Collective
funding_rounds.investments.name : Index Ventures
● Pros:
– Relatively easy
– Fast
● Cons:
– Loss of precision, false positive
– Index-time data materialisation
– Data duplication (child)
– Not optimal for updates
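The loss of precision can be reproduced with a small in-memory sketch. It is not Elasticsearch code: `flatten` and `matches` are hypothetical helpers, and the `company` dictionary is an illustrative copy of the Crunchbase example using the field names from the slide.

```python
from collections import defaultdict

# Illustrative copy of the Crunchbase example (assumed structure).
company = {
    "name": "Elastic",
    "funding_rounds": [
        {"round_code": "A", "founded_year": 2012,
         "investments": [{"name": "Benchmark"}, {"name": "Data Collective"}]},
        {"round_code": "B", "founded_year": 2013,
         "investments": [{"name": "Benchmark"}, {"name": "Index Ventures"}]},
    ],
}


def flatten(doc):
    """Collapse a nested object into multi-valued, dot-separated fields."""
    fields = defaultdict(set)

    def walk(node, path):
        if isinstance(node, dict):
            for key, child in node.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(node, list):
            for child in node:
                walk(child, path)  # array positions are lost here
        else:
            fields[path].add(node)

    walk(doc, "")
    return dict(fields)


def matches(flat_doc, criteria):
    """True when every (field, value) pair occurs somewhere in the flat document."""
    return all(value in flat_doc.get(field, set()) for field, value in criteria.items())


flat = flatten(company)
# Index Ventures only invested in round B, yet a query for
# "round A AND Index Ventures" still matches the flattened document.
false_positive = matches(flat, {
    "funding_rounds.round_code": "A",
    "funding_rounds.investments.name": "Index Ventures",
})
print(false_positive)  # True
```

Once the values of all rounds share the same multi-valued fields, the index can no longer tell which investor belongs to which round.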
Common solutions
name : Elastic
f_r.round_code : A
f_r.founded_year : 2012
f_r.inv.name : Benchmark

name : Elastic
f_r.round_code : A
f_r.founded_year : 2012
f_r.inv.name : Data Collective

name : Elastic
f_r.round_code : B
f_r.founded_year : 2013
f_r.inv.name : Benchmark

name : Elastic
f_r.round_code : B
f_r.founded_year : 2013
f_r.inv.name : Index Ventures
● Pros:
– Relatively easy
– No loss of precision
● Cons:
– Index-time data materialisation
– Combinatorial explosion
– Duplicate results: query-time grouping is necessary
– Data duplication (parent and child)
– Not optimal for updates
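A minimal sketch of this denormalisation, again in plain Python rather than against a real index. The helpers `denormalise`, `search` and `group_by_company` are hypothetical names; the abbreviated field names (`f_r`, `inv`) follow the slide.

```python
# Illustrative copy of the Crunchbase example (assumed structure).
company = {
    "name": "Elastic",
    "funding_rounds": [
        {"round_code": "A", "founded_year": 2012,
         "investments": [{"name": "Benchmark"}, {"name": "Data Collective"}]},
        {"round_code": "B", "founded_year": 2013,
         "investments": [{"name": "Benchmark"}, {"name": "Index Ventures"}]},
    ],
}


def denormalise(doc):
    """Emit one flat document per (funding round, investor) combination."""
    docs = []
    for funding_round in doc["funding_rounds"]:
        for investment in funding_round["investments"]:
            docs.append({
                "name": doc["name"],  # parent data repeated in every row
                "f_r.round_code": funding_round["round_code"],
                "f_r.founded_year": funding_round["founded_year"],
                "f_r.inv.name": investment["name"],
            })
    return docs


def search(docs, criteria):
    """Return the flat documents whose fields equal all the given values."""
    return [d for d in docs if all(d.get(f) == v for f, v in criteria.items())]


def group_by_company(hits):
    """Query-time grouping: collapse duplicate hits onto the company name."""
    return sorted({hit["name"] for hit in hits})


docs = denormalise(company)
no_hits = search(docs, {"f_r.round_code": "A", "f_r.inv.name": "Index Ventures"})
benchmark_hits = search(docs, {"f_r.inv.name": "Benchmark"})
companies = group_by_company(benchmark_hits)
```

Here `no_hits` is empty, so precision is preserved, but a search for Benchmark returns the same company twice until the hits are grouped. The number of documents grows with the product of the nested list sizes.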
Common solutions
● Lucene's BlockJoin
– Feature to provide relational search
– “Nested” type in Elasticsearch
● Model
– One (flat) document per record
– Joins computed at index time
– Related documents are indexed in
the same “block”
{
"company": {
"properties" : {
"funding_rounds" : {
"type" : "nested",
"properties" : {
"investments" : {
"type" : "nested"
} } } } } }
Index-time join
Index-time join
● Pros:
– Fast (join precomputed, data locality)
– No loss of precision
● Cons:
– Index-time data materialisation
– Data duplication (child)
– Not optimal for updates
– High memory usage for complex nested model
Document Block
name : Elastic
country_code : A
...
round_code : A
founded_year : 2012
...
Name : Data Collective
Type : Org
Name : Benchmark
Type : Org
round_code : B
founded_year : 2013
...
Name : Index Venture
Type : Org
Name : Benchmark
Type : Org
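The block idea can be sketched in a few lines of Python. This is a simulation, not Lucene code: it assumes the layout used by Lucene's block join, where the children of a record are stored contiguously and their parent is the last document of the block, and it uses hypothetical helpers `build_block` and `join_to_parent`.

```python
# Illustrative copy of the Crunchbase example (assumed structure).
company = {
    "name": "Elastic",
    "funding_rounds": [
        {"round_code": "A", "founded_year": 2012,
         "investments": [{"name": "Data Collective"}, {"name": "Benchmark"}]},
        {"round_code": "B", "founded_year": 2013,
         "investments": [{"name": "Index Ventures"}, {"name": "Benchmark"}]},
    ],
}


def build_block(doc):
    """Lay out one record as a contiguous block: children first, parent last."""
    block = []
    for funding_round in doc["funding_rounds"]:
        for investment in funding_round["investments"]:
            block.append({"type": "investment", "name": investment["name"]})
        block.append({"type": "funding_round",
                      "round_code": funding_round["round_code"],
                      "founded_year": funding_round["founded_year"]})
    block.append({"type": "company", "name": doc["name"]})
    return block


def join_to_parent(index, child_ids, parent_type):
    """Map each child doc id to the next doc id of the given parent type."""
    parents = set()
    for doc_id in child_ids:
        for candidate in range(doc_id + 1, len(index)):
            if index[candidate]["type"] == parent_type:
                parents.add(candidate)
                break
    return sorted(parents)


index = build_block(company)
investor_hits = [i for i, d in enumerate(index)
                 if d["type"] == "investment" and d["name"] == "Index Ventures"]
round_ids = join_to_parent(index, investor_hits, "funding_round")
round_codes = [index[i]["round_code"] for i in round_ids]
company_ids = join_to_parent(index, round_ids, "company")
```

Because the join is a forward scan inside the block, it is cheap at query time. Updating a single investor, however, means re-indexing the whole block.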
Index-time join
● SIREn Plugin
– Plugin for Lucene, Solr and Elasticsearch
– Adds a native index for nested data types
– http://siren.solutions/siren/overview/
● Model
– One document per “tree”
– Joins computed at index time
– Rich data model (JSON)
● Nested objects, nested arrays, multi-valued
attributes, datatypes
{
"company": {
"properties" : {
"_siren_source" : {
"analyzer" : "concise",
"postings_format" : "Siren10AFor",
"store" : "no",
"type" : "string"
} } } }
Index-time join
name : Elastic
country_code : A
...
round_code : A
founded_year : 2012
...
round_code : B
founded_year : 2013
...
Name : Data Collective
Type : Org
Name : Benchmark
Type : Org
Name : Index Venture
Type : Org
Name : Benchmark
Type : Org
● Pros:
– Fast (join precomputed, data locality)
– No loss of precision
– Low memory usage, even for complex nested model
● Cons:
– Index-time data materialisation
– Data duplication (child)
– Not optimal for updates
Node labels in the tree: 1; 1.1, 1.2; 1.1.1, 1.1.2, 1.2.1, 1.2.2
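The node labels on the slide (1, 1.1, 1.2, 1.1.1, ...) read like hierarchical path labels. Assuming that is the case, the sketch below shows how such labels allow a parent-child check by prefix comparison. How SIREn actually encodes and stores them is not stated here; `label_tree`, `is_ancestor` and `fmt` are hypothetical helpers.

```python
# Illustrative tree for the Crunchbase example (assumed structure).
tree = {
    "value": "Elastic",
    "children": [
        {"value": "round A", "children": [
            {"value": "Data Collective"}, {"value": "Benchmark"}]},
        {"value": "round B", "children": [
            {"value": "Index Ventures"}, {"value": "Benchmark"}]},
    ],
}


def label_tree(node, label=(1,)):
    """Yield (label, value) pairs in pre-order, numbering children 1, 2, ..."""
    yield label, node["value"]
    for position, child in enumerate(node.get("children", []), start=1):
        yield from label_tree(child, label + (position,))


def is_ancestor(a, b):
    """True when label a is a proper prefix of label b."""
    return len(a) < len(b) and b[:len(a)] == a


def fmt(label):
    """Render a label tuple in dotted form, e.g. (1, 2, 1) -> '1.2.1'."""
    return ".".join(str(part) for part in label)


nodes = list(label_tree(tree))
labels = [fmt(label) for label, _ in nodes]
by_value = {value: label for label, value in nodes if value != "Benchmark"}

# Index Ventures (1.2.1) sits under round B (1.2), not under round A (1.1).
under_b = is_ancestor(by_value["round B"], by_value["Index Ventures"])
under_a = is_ancestor(by_value["round A"], by_value["Index Ventures"])
```

With labels of this kind the whole tree can live in a single document while still keeping track of which investor belongs to which round.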
Index-time join
More information on our blog post
Query-time join
● Elasticsearch's Parent-Child
– Query-time join for nested data
● Model
– One (flat) document per record
– At index time, child documents should
specify their parent ID with the
_parent field
– Joins computed at query time
{
  "company": {},
  "investment" : {
    "_parent" : {
      "type" : "company"
    }
  },
  "investor" : {
    "_parent" : {
      "type" : "investment"
    }
  }
}
Query-time join
● Pros:
– Update friendly
– No loss of precision
– Data locality: parent and child on same shard
● Cons:
– Slower than index-time solutions
– Larger memory use than nested
– Data duplication (child)
● A child cannot have more than one parent
– Index-time data materialisation
name : Elastic
country_code : A
...
round_code : A
founded_year : 2012
...
Name : Data Collective
Type : Org
Name : Benchmark
Type : Org
round_code : B
founded_year : 2013
...
Name : Index Venture
Type : Org
Name : Benchmark
Type : Org
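The parent-child model can be simulated in memory as follows. This sketch does not call Elasticsearch; the `_id`, `_type` and `_parent` keys mimic the mapping above, the document IDs are invented, and `has_child` is a hypothetical helper that resolves the join at query time.

```python
# Flat documents; each child points at exactly one parent (IDs are invented).
docs = [
    {"_id": "c1", "_type": "company", "name": "Elastic"},
    {"_id": "r1", "_type": "investment", "_parent": "c1", "round_code": "A"},
    {"_id": "r2", "_type": "investment", "_parent": "c1", "round_code": "B"},
    {"_id": "i1", "_type": "investor", "_parent": "r1", "name": "Data Collective"},
    {"_id": "i2", "_type": "investor", "_parent": "r1", "name": "Benchmark"},
    {"_id": "i3", "_type": "investor", "_parent": "r2", "name": "Index Ventures"},
    # Benchmark is stored a second time: a child cannot have two parents.
    {"_id": "i4", "_type": "investor", "_parent": "r2", "name": "Benchmark"},
]


def has_child(index, parent_type, child_type, child_matches):
    """Return parents of parent_type with at least one matching child."""
    parent_ids = {d["_parent"] for d in index
                  if d["_type"] == child_type and child_matches(d)}
    return [d for d in index
            if d["_type"] == parent_type and d["_id"] in parent_ids]


rounds = has_child(docs, "investment", "investor",
                   lambda d: d["name"] == "Index Ventures")
round_ids = {d["_id"] for d in rounds}
companies = has_child(docs, "company", "investment",
                      lambda d: d["_id"] in round_ids)

# Update-friendly: a new child is appended without touching its parent.
docs.append({"_id": "i5", "_type": "investor", "_parent": "r1", "name": "New Fund"})
rounds_after = has_child(docs, "investment", "investor",
                         lambda d: d["name"] == "New Fund")
```

The join cost moves from indexing to every query, which matches the trade-off listed in the pros and cons.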
Query-time join
● FilterJoin plugin
– Query-time join for relational data
● Inspired by #3278
● Model
– One (flat) document per record
– At index time, documents should specify the IDs of their related documents in
a given field
– At query time, lookup ID values from a given field to filter documents from
another index
Query-time join
● Pros:
– Update friendly
– No loss of precision
– No data duplication
– No index-time data materialisation
● Cons:
– Slower than parent-child
– No data locality principle: network transfer
name : Elastic
country_code : A
...
round_code : A
founded_year : 2012
...
Name : Data Collective
Type : Org
round_code : B
founded_year : 2013
...
Name : Index Venture
Type : Org
Name : Benchmark
Type : Org
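A rough in-memory sketch of this lookup-then-filter model. It is not the plugin's API: the two lists stand in for two separate indices, the `id` and `investor_ids` fields are invented for illustration, and `filter_join` is a hypothetical helper.

```python
# Two separate "indices"; each investor is stored exactly once.
investors = [
    {"id": "data-collective", "name": "Data Collective", "type": "Org"},
    {"id": "index-ventures", "name": "Index Ventures", "type": "Org"},
    {"id": "benchmark", "name": "Benchmark", "type": "Org"},
]
funding_rounds = [
    {"company": "Elastic", "round_code": "A", "founded_year": 2012,
     "investor_ids": ["data-collective", "benchmark"]},
    {"company": "Elastic", "round_code": "B", "founded_year": 2013,
     "investor_ids": ["index-ventures", "benchmark"]},
]


def as_list(value):
    """Treat a scalar field as a one-element list."""
    return value if isinstance(value, list) else [value]


def filter_join(source, source_matches, lookup_field, target, target_field):
    """Collect IDs from matching source docs, then filter the target index."""
    ids = set()
    for doc in source:
        if source_matches(doc):
            ids.update(as_list(doc[lookup_field]))
    return [doc for doc in target
            if any(value in ids for value in as_list(doc[target_field]))]


# Investors that took part in round B.
round_b_investors = filter_join(
    funding_rounds, lambda d: d["round_code"] == "B", "investor_ids",
    investors, "id")
names = sorted(d["name"] for d in round_b_investors)

# The same helper works in the other direction: rounds Benchmark invested in.
benchmark_rounds = filter_join(
    investors, lambda d: d["name"] == "Benchmark", "id",
    funding_rounds, "investor_ids")
codes = [d["round_code"] for d in benchmark_rounds]
```

In a distributed setting the collected set of IDs has to travel between the nodes holding the two indices, which is the network transfer cost listed under the cons.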
● Each solution has its own advantages and disadvantages
– Trade-off between performance, scalability and flexibility
             BlockJoin             SIREn                  Parent-Child          FilterJoin
Performance  ++                    ++                     +                     -
Scalability  +                     ++                     +                     +
Flexibility  -                     -                      +                     ++
Best for     Simple nested model,  Complex nested model,  Simple nested model,  Relational model,
             fixed data            fixed data             dynamic data          dynamic data
Summary
Pivot Browser
Knowledge Browser
Crunchbase Demo
Contact Info
76 Tudor Lawn, Newcastle
info@siren.solutions
siren.solutions
We're hiring!

  • 5. Crunchbase example Elastic Series A Series B Data Collective Benchmark Index Ventures
  • 6. name : Elastic funding_rounds.round_code : A funding_rounds.founded_year : 2012 funding_rounds.round_code : B funding_rounds.founded_year : 2013 funding_rounds.investments.name : Benchmark funding_rounds.investments.name : Data Collective funding_rounds.investments.name : Index Ventures ● Pros: – Relatively easy – Fast ● Cons: – Loss of precision, false positives – Index-time data materialisation – Data duplication (child) – Not optimal for updates Common solutions
  • 7. name : Elastic f_r.round_code : A f_r.founded_year : 2012 f_r.inv.name : Benchmark | name : Elastic f_r.round_code : A f_r.founded_year : 2012 f_r.inv.name : Data Collective | name : Elastic f_r.round_code : B f_r.founded_year : 2013 f_r.inv.name : Benchmark | name : Elastic f_r.round_code : B f_r.founded_year : 2013 f_r.inv.name : Index Ventures ● Pros: – Relatively easy – No loss of precision ● Cons: – Index-time data materialisation – Combinatorial explosion – Duplicate results: query-time grouping is necessary – Data duplication (parent and child) – Not optimal for updates Common solutions
  • 8. ● Lucene's BlockJoin – Feature to provide relational search – “Nested” type in Elasticsearch ● Model – One (flat) document per record – Joins computed at index time – Related documents are indexed in the same “block” { "company": { "properties" : { "funding_rounds" : { "type" : "nested", "properties" : { "investments" : { "type" : "nested" } } } } } } Index-time join
  • 9. Index-time join ● Pros: – Fast (join precomputed, data locality) – No loss of precision ● Cons: – Index-time data materialisation – Data duplication (child) – Not optimal for updates – High memory usage for complex nested model Document Block name : Elastic country_code : A ... round_code : A founded_year : 2012 ... Name : Data Collective Type : Org Name : Benchmark Type : Org round_code : B founded_year : 2013 ... Name : Index Ventures Type : Org Name : Benchmark Type : Org
  • 10. Index-time join ● SIREn Plugin – Plugin to Lucene, Solr, Elasticsearch – Add native index for nested data type – http://siren.solutions/siren/overview/ ● Model – One document per “tree” – Joins computed at index time – Rich data model (JSON) ● Nested objects, nested arrays, multi-valued attributes, datatypes { "company": { "properties" : { "_siren_source" : { "analyzer" : "concise", "postings_format" : "Siren10AFor", "store" : "no", "type" : "string" } } } }
  • 11. Index-time join name : Elastic country_code : A ... round_code : A founded_year : 2012 ... round_code : B founded_year : 2013 ... Name : Data Collective Type : Org Name : Benchmark Type : Org Name : Index Ventures Type : Org Name : Benchmark Type : Org ● Pros: – Fast (join precomputed, data locality) – No loss of precision – Low memory usage, even for complex nested model ● Cons: – Index-time data materialisation – Data duplication (child) – Not optimal for updates 1 1.1 1.2 1.1.1 1.1.2 1.2.1 1.2.2
  • 13. Query-time join ● Elasticsearch's Parent-Child – Query-time join for nested data ● Model – One (flat) document per record – At index time, child documents should specify their parent ID with the _parent field – Joins computed at query time { "company": {}, "investment" : { "_parent" : { "type" : "company" } }, "investor" : { "_parent" : { "type" : "investment" } } }
  • 14. Query-time join ● Pros: – Update friendly – No loss of precision – Data locality: parent and child on same shard ● Cons: – Slower than index-time solutions – Larger memory use than nested – Data duplication (child) ● A child cannot have more than one parent – Index-time data materialisation name : Elastic country_code : A ... round_code : A founded_year : 2012 ... Name : Data Collective Type : Org Name : Benchmark Type : Org round_code : B founded_year : 2013 ... Name : Index Ventures Type : Org Name : Benchmark Type : Org
  • 15. Query-time join ● FilterJoin plugin – Query-time join for relational data ● Inspired by #3278 ● Model – One (flat) document per record – At index time, documents should specify the IDs of their related documents in a given field – At query time, look up ID values from a given field to filter documents from another index
  • 16. Query-time join ● Pros: – Update friendly – No loss of precision – No data duplication – No index-time data materialisation ● Cons: – Slower than parent-child – No data locality principle: network transfer name : Elastic country_code : A ... round_code : A founded_year : 2012 ... Name : Data Collective Type : Org round_code : B founded_year : 2013 ... Name : Index Ventures Type : Org Name : Benchmark Type : Org
  • 17. Summary ● Each solution has its own advantages and disadvantages – Trade-off between performance, scalability and flexibility
                     BlockJoin             SIREn                  Parent-Child          FilterJoin
       Performance   ++                    ++                     +                     -
       Scalability   +                     ++                     +                     +
       Flexibility   -                     -                      +                     ++
       Best for      Simple nested model   Complex nested model   Simple nested model   Relational model
                     Fixed data            Fixed data             Dynamic data          Dynamic data
  • 19. Contact Info 76 Tudor Lawn, Newcastle info@siren.solutions siren.solutions We're hiring!
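
The "loss of precision, false positives" listed against the flattened model on slide 6, and the "no loss of precision" claimed for the nested / BlockJoin model on slides 8 and 9, can be illustrated with a small sketch. This is plain Python over an in-memory dictionary, not Elasticsearch or Lucene code. The helper names `flatten`, `flat_match` and `nested_match` are hypothetical and exist only for this illustration; the data mirrors the Crunchbase example from the slides.

```python
# Hypothetical in-memory record mirroring the Crunchbase example on the slides.
company = {
    "name": "Elastic",
    "funding_rounds": [
        {"round_code": "A", "founded_year": 2012,
         "investments": [{"name": "Benchmark"}, {"name": "Data Collective"}]},
        {"round_code": "B", "founded_year": 2013,
         "investments": [{"name": "Benchmark"}, {"name": "Index Ventures"}]},
    ],
}


def flatten(doc, prefix=""):
    """Collapse a nested document into a flat mapping: field path -> set of values."""
    fields = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        children = value if isinstance(value, list) else [value]
        for child in children:
            if isinstance(child, dict):
                # Recurse, merging values of all sibling objects under one path.
                for sub_path, sub_values in flatten(child, path + ".").items():
                    fields.setdefault(sub_path, set()).update(sub_values)
            else:
                fields.setdefault(path, set()).add(child)
    return fields


def flat_match(doc, conditions):
    """Flat key-value model: each condition is checked against the whole document."""
    fields = flatten(doc)
    return all(value in fields.get(path, set()) for path, value in conditions.items())


def nested_match(doc, conditions):
    """Nested-style model: all conditions must hold inside the same funding round."""
    for funding_round in doc["funding_rounds"]:
        fields = flatten(funding_round, "funding_rounds.")
        if all(value in fields.get(path, set()) for path, value in conditions.items()):
            return True
    return False


# "Did Index Ventures invest in round A?" The data says no (they invested in round B).
query = {
    "funding_rounds.round_code": "A",
    "funding_rounds.investments.name": "Index Ventures",
}
flat_result = flat_match(company, query)      # flattened fields lose the grouping
nested_result = nested_match(company, query)  # grouping per round is preserved
```

In this sketch the flat model matches the query because round code "A" and investor "Index Ventures" both occur somewhere in the document, even though they belong to different rounds. The nested variant evaluates the conditions per funding round and rejects the query, which is the behaviour the index-time join slides describe.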
