SlideShare a Scribd company logo
1 of 61
Download to read offline
Table Retrieval and Genera/on
Krisz/an Balog
University of Stavanger

@krisz'anbalog
Informa/on Access 

& Interac/on
h@p://iai.group
SIGIR’18 workshop on Data Search (DATA:SEARCH’18) | Ann Arbor, Michigan, USA, July 2018
JOIN WORK WITH
Shuo Zhang

@imsure318
TABLES ARE EVERYWHERE
MOTIVATION
MOTIVATION
IN THIS TALK
• Three retrieval tasks, with tables as results
• Ad hoc table retrieval
• Query-by-table
• On-the-fly table genera/on
THE ANATOMY OF A RELATIONAL 

(ENTITY-FOCUSED) TABLE
Formula 1 constructors’ statistics 2016
Constructor
Ferrari
Engine Country Base
Force India
Haas
Ferrari
Mercedes
Ferrari
Italy
India
US
Italy
UK
US & UK
Manor Mercedes UK UK
…
…
Formula 1 constructors’ statistics 2016
Constructor
Ferrari
Engine Country Base
Force India
Haas
Ferrari
Mercedes
Ferrari
Italy
India
US
Italy
UK
US & UK
Manor Mercedes UK UK
…
…
Table cap/on
THE ANATOMY OF A RELATIONAL 

(ENTITY-FOCUSED) TABLE
Formula 1 constructors’ statistics 2016
Constructor
Ferrari
Engine Country Base
Force India
Haas
Ferrari
Mercedes
Ferrari
Italy
India
US
Italy
UK
US & UK
Manor Mercedes UK UK
…
…
Core column
(subject column)
THE ANATOMY OF A RELATIONAL 

(ENTITY-FOCUSED) TABLE
We assume that these en//es are recognized and
disambiguated, i.e., linked to a knowledge base
Heading
column labels
(table schema)
Formula 1 constructors’ statistics 2016
Constructor
Ferrari
Engine Country Base
Force India
Haas
Ferrari
Mercedes
Ferrari
Italy
India
US
Italy
UK
US & UK
Manor Mercedes UK UK
…
…
THE ANATOMY OF A RELATIONAL 

(ENTITY-FOCUSED) TABLE
AD HOC TABLE RETRIEVAL
S. Zhang and K. Balog. Ad Hoc Table Retrieval using Seman'c Similarity. 

In: The Web Conference 2018 (WWW '18)
TASK
• Ad hoc table retrieval:
• Given a keyword query as input,
return a ranked list of tables from
a table corpus
Singapore Search
Year
GDP
Nominal
(Billion)
GDP
Nominal
Per Capita
GDP Real
(Billion)
Singapore - Wikipedia, Economy Statistics (Recent Years)
GNI
Nominal
(Billion)
GNI
Nominal
Per Capita
2011 S$346.353 S$66,816 S$342.371 S$338.452 S$65,292
https://en.wikipedia.org/wiki/Singapore
Show more (5 rows total)
Singapore - Wikipedia, Language used most frequently at home
Language Color in Figure Percent
English Blue 36.9%
Show more (6 rows total)
https://en.wikipedia.org/wiki/Singapore
2012 S$362.332 S$68,205 S$354.061 S$351.765 S$66,216
2013 S$378.200 S$70,047 S$324.592 S$366.618 S$67,902
Mandarin Yellow 34.9%
Malay Red 10.7%
APPROACHES
• Unsupervised methods
• Build a document-based representa/on for each table, then
employ conven/onal document retrieval methods
• Supervised methods
• Describe query-table pairs using a set of features, then employ
supervised machine learning ("learning-to-rank")
• Contribu'on #1: new state-of-the-art, using a rich set of features
• Contribu'on #2: new set of seman/c matching features
UNSUPERVISED METHODS
• Single-field document representa/on
• All table content, no structure
• Mul/-field document representa/on
• Separate document fields for embedding document’s /tle,
sec/on /tle, table cap/on, table body, and table headings
SUPERVISED METHODS
• Three groups of features
• Query features
• #query terms, query IDF scores
• Table features
• Table proper/es: #rows, #cols, #empty cells, etc.
• Embedding document: link structure, number of tables, etc.
• Query-table features
• Query terms found in different table elements, LM score, etc.
• Our novel seman/c matching features
FEATURES
SEMANTIC MATCHING
• Main objec/ve: go beyond term-based matching
• Three components:
1. Content extrac/on
2. Seman/c representa/ons
3. Similarity measures
SEMANTIC MATCHING
1. CONTENT EXTRACTION
• The “raw” content of a query/table is represented
as a set of terms, which can be words or en//es
Query …
q1
qn
… Table
t1
tm
SEMANTIC MATCHING
1. CONTENT EXTRACTION
• The “raw” content of a query/table is represented
as a set of terms, which can be words or en//es
Query …
q1
qn
… Table
t1
tm
Entity-based: - Top-k ranked entities from a
knowledge base
- Entities in the core table column
- Top-k ranked entities using the
embedding document/section
title as a query
SEMANTIC MATCHING
2. SEMANTIC REPRESENTATIONS
• Each of the raw terms is mapped to a seman/c
vector representa/on
Query …… Table
q1
qn
t1
tm
~t1
~tm
…
~q1
~qn
…
SEMANTIC REPRESENTATIONS
• Bag-of-concepts (sparse discrete vectors)
• Bag-of-en''es
• Each vector element corresponds to an en/ty
• is 1 if there exists a link between en//es i and j in the KB
• Bag-of-categories
• Each vector element corresponds to a Wikipedia category
• is 1 if en/ty i is assigned to Wikipedia category j
• Embeddings (dense con/nuous vectors)
• Word embeddings
• Word2Vec (300 dimensions, trained on Google news)
• Graph embeddings
• RDF2vec (200 dimensions, trained on DBpedia)
~ti
~ti[j]
~ti[j]
j
SEMANTIC MATCHING
3. SIMILARITY MEASURES
Query …… Table
q1
qn
t1
tm
~t1
~tm
…
~q1
~qn
…
semantic
matching
SEMANTIC MATCHING
EARLY FUSION MATCHING STRATEGY
Query …… Table
q1
qn
t1
tm
~t1
~tm
…
~q1
~qn
…
semantic
matching
Early: Take the centroid of semantic vectors and compute their cosine similarity
~t1
~tm
…
~q1
~qn
…
SEMANTIC MATCHING
LATE FUSION MATCHING STRATEGY
Query …… Table
q1
qn
t1
tm
~t1
~tm
…
~q1
~qn
…
semantic
matching
Late: Compute all pairwise similarities between the query and table semantic
vectors, then aggregate those pairwise similarity scores (sum, avg, or max)
~t1
~tm
…
~q1
~qn
…
AGGR
… …
EXPERIMENTAL EVALUATION
EXPERIMENTAL SETUP
• Table corpus
• WikiTables corpus1: 1.6M tables extracted from Wikipedia
• Knowledge base
• DBpedia (2015-10): 4.6M en//es with an English abstract
• Queries
• Sampled from two sources2,3
• Rank-based evalua/on
• NDCG@5, 10, 15, 20
1 Bhagavatula et al. TabEL: En'ty Linking in Web Tables. In: ISWC ’15.
2 Cafarella et al. Data Integra'on for the Rela'onal Web. Proc. of VLDB Endow. (2009)
3 Vene/s et al. Recovering Seman'cs of Tables on the Web. Proc. of VLDB Endow. (2011)
QS-1 QS-2
video games asian countries currency
us ci/es laptops cpu
kings of africa food calories
economy gdp guitars manufacturer
RELEVANCE ASSESSMENTS
• Collected via crowdsourcing
• Pooling to depth 20, 3120 query-table pairs in total
• Assessors are presented with the following scenario
• "Imagine that your task is to create a new table on the query topic"
• A table is …
• Non-relevant (0): if it is unclear what it is about or it about a different topic
• Relevant (1): if some cells or values could be used from it
• Highly relevant (2): if large blocks or several values could be used from it
RESEARCH QUESTIONS
• RQ1: Can seman/c matching improve retrieval
performance?
• RQ2: Which of the seman/c representa/ons is the
most effec/ve?
• RQ3: Which of the similarity measures performs best?
RESULTS: RQ1
NDCG@10 NDCG@20
Single-field document ranking 0.4344 0.5254
Mul'-field document ranking 0.4860 0.5473
WebTable1 0.2992 0.3726
WikiTable2 0.4766 0.5206
LTR baseline 0.5456 0.6031
STR (LTR + seman'c matching) 0.6293 0.6825
1 Cafarella et al. WebTables: Exploring the Power of Tables on the Web. Proc. of VLDB Endow. (2008)
2 Bhagavatula et al. Methods for Exploring and Mining Tables on Wikipedia. In: IDEA ’13.
• Can seman/c matching improve retrieval performance?
• Yes. STR achieves substan/al and significants improvements over LTR.
RESULTS: RQ2
• Which of the seman/c representa/ons is the most effec/ve?
• Bag-of-en//es.
RESULTS: RQ3
• Which of the similarity measures performs best?
• Late-sum and Late-avg (but it also depends on the representa/on)
FEATURE ANALYSIS
QUERY-BY-TABLE
Currently under peer review
ON-THE-FLY TABLE
GENERATION
S. Zhang and K. Balog. On-the-fly Table Genera'on. 

In: 41st InternaMonal ACM SIGIR Conference on Research and Development in InformaMon Retrieval (SIGIR '18)
TASK
• On-the-fly table generaMon:
• Answer a free text query with a rela/onal table, where
• the core column lists all relevant en//es;
• columns correspond to a@ributes of those en//es;
• cells contain the values of the corresponding en/ty a@ributes.
Video albums of Taylor Swift Search
Title Released data Label
CMT Crossroads: Taylor Swift and …
Formats
Journey to Fearless
Speak Now World Tour-Live
The 1989 World Tour Live
Jun 16, 2009
Oct 11, 2011
Nov 21, 2011
Dec 20, 2015
Big Machine
Shout! Factory
Big Machine
Big Machine
DVD
Blu-ray, DVD
CD/Blu-ray, …
Streaming
E
V
S
APPROACH Core column en/ty ranking and
schema determina/on could
poten/ally mutually reinforce
each other.
Query
(q)
E
Core column
en+ty ranking
Schema
determina+on
S
Value lookup
V
E
S
ALGORITHM
Query
(q)
E
Core column
en+ty ranking
Schema
determina+on
Value lookup
E
S
S
V
KNOWLEDGE BASE ENTRY
ed
ep
Property: value
Entity name
Entity type
Description
Property: value
…
ea
CORE COLUMN ENTITY RANKING
scoret(e, q) =
X
i
wi i(e, q, St 1
)
CORE COLUMN ENTITY RANKING
FEATURES
En/ty’s relevance to the query
computed using language modeling
CORE COLUMN ENTITY RANKING
FEATURES Query
Entity
Matching
Degree
Matching
matrix
Top-k
entries
Dense
layer
Hidden
layers
Output
layer
…
(n ⇥ m)
s is the concatena/on of all schema labels in S
is the string concatena/on operator
CORE COLUMN ENTITY RANKING
FEATURES
Compa/bility matrix:
ei
sj
Cij =
(
1, if matchKB(ei, sj) _ matchT C(ei, sj)
0, otherwise .
ESC(S, ei) =
1
|S|
X
j
Cij
En/ty-schema compa/bility score:
SCHEMA DETERMINATION
scoret(s, q) =
X
i
wi i(s, q, Et 1
)
SCHEMA DETERMINATION FEATURES
P(s|q) =
X
T 2T
P(s|T)P(T|q)
Schema label likelihood
Table’s relevance
to the query
P(s|T) =
(
1, maxs02TS
dist(s, s0
)
0, otherwise .
SCHEMA DETERMINATION FEATURES
P(s|q, E) =
X
T
P(s|T)P(T|q, E)
Schema label likelihood
Table’s relevance to the query
P(T|q, E) / P(T|E)P(T|q)
SCHEMA DETERMINATION FEATURES
AR(s, E) =
1
|E|
X
e2E
match(s, e, T) + drel(d, e) + sh(s, e) + kb(s, e)
Similarity between en/ty e and
schema label s with respect to T
Relevance of the document
containing the table
#hits returned by a web search
engine to the query "[s] of [e]"
above threshold
Whether s is a property of e
in the KB
Kopliku et al. Towards a Framework for Abribute Retrieval. In: CIKM ’11.
VALUE LOOKUP
• A catalog of possible en/ty a@ribute-value pairs
• En/ty, schema label, value, provenance quadruples
he, s, v, pi
e s v T #123
values from KB
values from TC
s
e v
T #123
VALUE LOOKUP
• Finding a cell’s value is a lookup in that catalog
score(v, e, s, q) = max
hs0
,v,pi2eV
match(s,s0
)
conf (p, q)
eV
values from KB
values from TC
soy string matching
- "birthday" vs. "date of birth"
- "country" vs. "na/onality"
matching confidence
- KB takes priority over TC
- based on the corresponding table’s
relevance to the query
EXPERIMENTAL EVALUATION
EXPERIMENTAL SETUP
• Table corpus
• WikiTables corpus: 1.6M tables extracted from Wikipedia
• Knowledge base
• DBpedia (2015-10): 4.6M en//es with an English abstract
• Two query sets
• Rank-based metrics
• NDCG for core column en/ty ranking and schema determina/on
• MAP/MRR for value lookup
QUERY SET 1 (QS-1)
• List queries from the DBpedia-En/ty v2 collec/on1 (119)
• "all cars that are produced in Germany"
• "permanent members of the UN Security Council"
• "Airlines that currently use Boeing 747 planes"
• Core column en/ty ranking
• Highly relevant en//es from the collec/on
• Schema determina/on
• Crowdsourcing, 3-point relevance scale, 7k query-label pairs
• Value lookup
• Crowdsourcing, 25 queries sample, 14k cell values
1 Hasibi et al. DBpedia-En'ty v2: A Test Collec'on for En'ty Search. In: SIGIR ’17.
QUERY SET 2 (QS-2)
• En/ty-rela/onship queries from the RELink Query
Collec/on1 (600)
• Queries are answered by en/ty tuples (pairs or triplets)
• That is, each query is answered by a table with 2 or 3 columns (including the
core en/ty column)
• Queries and relevance judgments are obtained automa/cally from Wikipedia
lists that contain rela/onal tables
• Human annotators were asked to formulate the corresponding informa/on
need as a natural language query
• "find peaks above 6000m in the mountains of Peru"
• "Which countries and ciMes have accredited armenian embassadors?"
• "Which anM-aircra guns were used in ships during war periods and what country produced them?"
1 Saleiro et al. RELink: A Research Framework and Test Collec'on for En'ty-Rela'onship Retrieval. In: SIGIR ’17.
CORE COLUMN ENTITY RANKING
(QUERY-BASED)
QS-1 QS-2
NDCG@5 NDCG@10 NDCG@5 NDCG@10
LM 0.2419 0.2591 0.0708 0.0823
DRRM_TKS (ed) 0.2015 0.2028 0.0501 0.0540
DRRM_TKS (ep) 0.1780 0.1808 0.1089 0.1083
Combined 0.2821 0.2834 0.0852 0.0920
CORE COLUMN ENTITY RANKING
(SCHEMA-ASSISTED)
QS-1 QS-2
•R #0: without schema informa/on (query only)
•R #1-#3: with automa/c schema determina/on (top 10)
•Oracle: with ground truth schema
SCHEMA DETERMINATION
(QUERY-BASED)
QS-1 QS-2
NDCG@5 NDCG@10 NDCG@5 NDCG@10
CP 0.0561 0.0675 0.1770 0.2092
DRRM_TKS 0.0380 0.0427 0.0920 0.1415
Combined 0.0786 0.0878 0.2310 0.2695
SCHEMA DETERMINATION
(ENTITY-ASSISTED)
QS-1 QS-2
•R #0: without en/ty informa/on (query only)
•R #1-#3: with automa/c core column en/ty ranking (top 10)
•Oracle: with ground truth en//es
RESULTS
VALUE LOOKUP
QS-1 QS-2
MAP MRR MAP MRR
Knowledge base 0.7759 0.7990 0.0745 0.0745
Table corpus 0.1614 0.1746 0.9564 0.9564
Combined 0.9270 0.9427 0.9564 0.9564
EXAMPLE
"Towns in the Republic of Ireland in 2006 Census Records"
ANALYSIS
QS-1 QS-2
- -
QS-1 Round #0 vs. #1 43 38 38 52 7 60
Round #0 vs. #2 50 30 39 61 5 53
Round #0 vs. #3 49 26 44 59 2 58
QS-2 Round #0 vs. #1 166 82 346 386 56 158
Round #0 vs. #2 173 74 347 388 86 126
Round #0 vs. #3 173 72 349 403 103 94
" "# #
SUMMARY
• Answering queries with rela/onal tables,
summarizing en//es and their a@ributes
• Retrieving exis/ng tables from a table corpus
• Genera/ng a table on-the-fly
• Future work
• Moving from homogeneous Wikipedia tables to other types of
tables (scien/fic tables, Web tables)
• Value lookup with conflic/ng values; verifying cell values
• Result snippets for table search results
• ...
QUESTIONS?
@krisz'anbalog 

krisz/anbalog.com

More Related Content

What's hot

Debunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsDebunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsNeo4j
 
Question Answering over Linked Data (Reasoning Web Summer School)
Question Answering over Linked Data (Reasoning Web Summer School)Question Answering over Linked Data (Reasoning Web Summer School)
Question Answering over Linked Data (Reasoning Web Summer School)Andre Freitas
 
Two graph data models : RDF and Property Graphs
Two graph data models : RDF and Property GraphsTwo graph data models : RDF and Property Graphs
Two graph data models : RDF and Property Graphsandyseaborne
 
Federated data stores using semantic web technology
Federated data stores using semantic web technologyFederated data stores using semantic web technology
Federated data stores using semantic web technologySteve Ray
 
Introduction to RDF
Introduction to RDFIntroduction to RDF
Introduction to RDFNarni Rajesh
 
Verifying Integrity Constraints of a RDF-based WordNet
Verifying Integrity Constraints of a RDF-based WordNetVerifying Integrity Constraints of a RDF-based WordNet
Verifying Integrity Constraints of a RDF-based WordNetAlexandre Rademaker
 
Graph databases & data integration v2
Graph databases & data integration v2Graph databases & data integration v2
Graph databases & data integration v2Dimitris Kontokostas
 
PhD thesis defense: Large-scale multilingual knowledge extraction, publishin...
PhD thesis defense:  Large-scale multilingual knowledge extraction, publishin...PhD thesis defense:  Large-scale multilingual knowledge extraction, publishin...
PhD thesis defense: Large-scale multilingual knowledge extraction, publishin...Dimitris Kontokostas
 
Semantic Web: introduction & overview
Semantic Web: introduction & overviewSemantic Web: introduction & overview
Semantic Web: introduction & overviewAmit Sheth
 
Representing financial reports on the semantic web a faithful translation f...
Representing financial reports on the semantic web   a faithful translation f...Representing financial reports on the semantic web   a faithful translation f...
Representing financial reports on the semantic web a faithful translation f...Jie Bao
 
Expressive Query Answering For Semantic Wikis (20min)
Expressive Query Answering For  Semantic Wikis (20min)Expressive Query Answering For  Semantic Wikis (20min)
Expressive Query Answering For Semantic Wikis (20min)Jie Bao
 
Semantic Web in Action: Ontology-driven information search, integration and a...
Semantic Web in Action: Ontology-driven information search, integration and a...Semantic Web in Action: Ontology-driven information search, integration and a...
Semantic Web in Action: Ontology-driven information search, integration and a...Amit Sheth
 
A Linked Data Prototype for the Union Catalog of Digital Archives Taiwan
A Linked Data Prototype for the Union Catalog of Digital Archives TaiwanA Linked Data Prototype for the Union Catalog of Digital Archives Taiwan
A Linked Data Prototype for the Union Catalog of Digital Archives Taiwanandrea huang
 

What's hot (19)

Debunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsDebunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative Facts
 
Question Answering over Linked Data (Reasoning Web Summer School)
Question Answering over Linked Data (Reasoning Web Summer School)Question Answering over Linked Data (Reasoning Web Summer School)
Question Answering over Linked Data (Reasoning Web Summer School)
 
Two graph data models : RDF and Property Graphs
Two graph data models : RDF and Property GraphsTwo graph data models : RDF and Property Graphs
Two graph data models : RDF and Property Graphs
 
Federated data stores using semantic web technology
Federated data stores using semantic web technologyFederated data stores using semantic web technology
Federated data stores using semantic web technology
 
Introduction to RDF
Introduction to RDFIntroduction to RDF
Introduction to RDF
 
Verifying Integrity Constraints of a RDF-based WordNet
Verifying Integrity Constraints of a RDF-based WordNetVerifying Integrity Constraints of a RDF-based WordNet
Verifying Integrity Constraints of a RDF-based WordNet
 
Graph databases & data integration v2
Graph databases & data integration v2Graph databases & data integration v2
Graph databases & data integration v2
 
Introduction to RDF Data Model
Introduction to RDF Data ModelIntroduction to RDF Data Model
Introduction to RDF Data Model
 
PhD thesis defense: Large-scale multilingual knowledge extraction, publishin...
PhD thesis defense:  Large-scale multilingual knowledge extraction, publishin...PhD thesis defense:  Large-scale multilingual knowledge extraction, publishin...
PhD thesis defense: Large-scale multilingual knowledge extraction, publishin...
 
RDF data model
RDF data modelRDF data model
RDF data model
 
Efficient RDF Interchange (ERI) Format for RDF Data Streams
Efficient RDF Interchange (ERI) Format for RDF Data StreamsEfficient RDF Interchange (ERI) Format for RDF Data Streams
Efficient RDF Interchange (ERI) Format for RDF Data Streams
 
Semantic Web: introduction & overview
Semantic Web: introduction & overviewSemantic Web: introduction & overview
Semantic Web: introduction & overview
 
RDF and OWL
RDF and OWLRDF and OWL
RDF and OWL
 
Representing financial reports on the semantic web a faithful translation f...
Representing financial reports on the semantic web   a faithful translation f...Representing financial reports on the semantic web   a faithful translation f...
Representing financial reports on the semantic web a faithful translation f...
 
Expressive Query Answering For Semantic Wikis (20min)
Expressive Query Answering For  Semantic Wikis (20min)Expressive Query Answering For  Semantic Wikis (20min)
Expressive Query Answering For Semantic Wikis (20min)
 
Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-
 
Introduction to SPARQL
Introduction to SPARQLIntroduction to SPARQL
Introduction to SPARQL
 
Semantic Web in Action: Ontology-driven information search, integration and a...
Semantic Web in Action: Ontology-driven information search, integration and a...Semantic Web in Action: Ontology-driven information search, integration and a...
Semantic Web in Action: Ontology-driven information search, integration and a...
 
A Linked Data Prototype for the Union Catalog of Digital Archives Taiwan
A Linked Data Prototype for the Union Catalog of Digital Archives TaiwanA Linked Data Prototype for the Union Catalog of Digital Archives Taiwan
A Linked Data Prototype for the Union Catalog of Digital Archives Taiwan
 

Similar to Table Retrieval and Generation

Data_base.pptx
Data_base.pptxData_base.pptx
Data_base.pptxMohit89650
 
Entities for Augmented Intelligence
Entities for Augmented IntelligenceEntities for Augmented Intelligence
Entities for Augmented Intelligencekrisztianbalog
 
Semi-automated Exploration and Extraction of Data in Scientific Tables
Semi-automated Exploration and Extraction of Data in Scientific TablesSemi-automated Exploration and Extraction of Data in Scientific Tables
Semi-automated Exploration and Extraction of Data in Scientific TablesElsevier
 
Discovering Related Data Sources in Data Portals
Discovering Related Data Sources in Data PortalsDiscovering Related Data Sources in Data Portals
Discovering Related Data Sources in Data PortalsPeter Haase
 
CiteSeerX: Mining Scholarly Big Data
CiteSeerX: Mining Scholarly Big DataCiteSeerX: Mining Scholarly Big Data
CiteSeerX: Mining Scholarly Big DataJian Wu
 
Toward Description Generation for Tables in Scientific Articles
Toward Description Generation for Tables in Scientific ArticlesToward Description Generation for Tables in Scientific Articles
Toward Description Generation for Tables in Scientific ArticlesJUNJIEXu9
 
Basic data analysis using R.
Basic data analysis using R.Basic data analysis using R.
Basic data analysis using R.C. Tobin Magle
 
Kampmeier ecn 2012
Kampmeier ecn 2012Kampmeier ecn 2012
Kampmeier ecn 2012ECNOfficer
 
1.1 introduction to Data Structures.ppt
1.1 introduction to Data Structures.ppt1.1 introduction to Data Structures.ppt
1.1 introduction to Data Structures.pptAshok280385
 
Towards Efficient and Effective Semantic Table Interpretation
Towards Efficient and Effective Semantic Table InterpretationTowards Efficient and Effective Semantic Table Interpretation
Towards Efficient and Effective Semantic Table InterpretationZiqi Zhang
 
Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-S...
Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-S...Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-S...
Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-S...Thanh Tran
 
Improving Semantic Search Using Query Log Analysis
Improving Semantic Search Using Query Log AnalysisImproving Semantic Search Using Query Log Analysis
Improving Semantic Search Using Query Log AnalysisStuart Wrigley
 
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Spark Summit
 

Similar to Table Retrieval and Generation (20)

Data_base.pptx
Data_base.pptxData_base.pptx
Data_base.pptx
 
Entities for Augmented Intelligence
Entities for Augmented IntelligenceEntities for Augmented Intelligence
Entities for Augmented Intelligence
 
Semi-automated Exploration and Extraction of Data in Scientific Tables
Semi-automated Exploration and Extraction of Data in Scientific TablesSemi-automated Exploration and Extraction of Data in Scientific Tables
Semi-automated Exploration and Extraction of Data in Scientific Tables
 
Discovering Related Data Sources in Data Portals
Discovering Related Data Sources in Data PortalsDiscovering Related Data Sources in Data Portals
Discovering Related Data Sources in Data Portals
 
CiteSeerX: Mining Scholarly Big Data
CiteSeerX: Mining Scholarly Big DataCiteSeerX: Mining Scholarly Big Data
CiteSeerX: Mining Scholarly Big Data
 
Toward Description Generation for Tables in Scientific Articles
Toward Description Generation for Tables in Scientific ArticlesToward Description Generation for Tables in Scientific Articles
Toward Description Generation for Tables in Scientific Articles
 
Basic data analysis using R.
Basic data analysis using R.Basic data analysis using R.
Basic data analysis using R.
 
Kampmeier ecn 2012
Kampmeier ecn 2012Kampmeier ecn 2012
Kampmeier ecn 2012
 
demo2.ppt
demo2.pptdemo2.ppt
demo2.ppt
 
Recommender Systems and Linked Open Data
Recommender Systems and Linked Open DataRecommender Systems and Linked Open Data
Recommender Systems and Linked Open Data
 
1.1 introduction to Data Structures.ppt
1.1 introduction to Data Structures.ppt1.1 introduction to Data Structures.ppt
1.1 introduction to Data Structures.ppt
 
RDBMS Model
RDBMS ModelRDBMS Model
RDBMS Model
 
Towards Efficient and Effective Semantic Table Interpretation
Towards Efficient and Effective Semantic Table InterpretationTowards Efficient and Effective Semantic Table Interpretation
Towards Efficient and Effective Semantic Table Interpretation
 
Database intro
Database introDatabase intro
Database intro
 
NLP & DBpedia
 NLP & DBpedia NLP & DBpedia
NLP & DBpedia
 
Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-S...
Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-S...Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-S...
Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-S...
 
Improving Semantic Search Using Query Log Analysis
Improving Semantic Search Using Query Log AnalysisImproving Semantic Search Using Query Log Analysis
Improving Semantic Search Using Query Log Analysis
 
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
 
Symbol Table
Symbol TableSymbol Table
Symbol Table
 
Database_Introduction.pdf
Database_Introduction.pdfDatabase_Introduction.pdf
Database_Introduction.pdf
 

More from krisztianbalog

Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...
Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...
Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...krisztianbalog
 
Conversational AI from an Information Retrieval Perspective: Remaining Challe...
Conversational AI from an Information Retrieval Perspective: Remaining Challe...Conversational AI from an Information Retrieval Perspective: Remaining Challe...
Conversational AI from an Information Retrieval Perspective: Remaining Challe...krisztianbalog
 
What Does Conversational Information Access Exactly Mean and How to Evaluate It?
What Does Conversational Information Access Exactly Mean and How to Evaluate It?What Does Conversational Information Access Exactly Mean and How to Evaluate It?
What Does Conversational Information Access Exactly Mean and How to Evaluate It?krisztianbalog
 
Personal Knowledge Graphs
Personal Knowledge GraphsPersonal Knowledge Graphs
Personal Knowledge Graphskrisztianbalog
 
Overview of the TREC 2016 Open Search track: Academic Search Edition
Overview of the TREC 2016 Open Search track: Academic Search EditionOverview of the TREC 2016 Open Search track: Academic Search Edition
Overview of the TREC 2016 Open Search track: Academic Search Editionkrisztianbalog
 
Overview of the Living Labs for IR Evaluation (LL4IR) CLEF Lab
Overview of the Living Labs for IR Evaluation (LL4IR) CLEF LabOverview of the Living Labs for IR Evaluation (LL4IR) CLEF Lab
Overview of the Living Labs for IR Evaluation (LL4IR) CLEF Labkrisztianbalog
 
Time-aware Evaluation of Cumulative Citation Recommendation Systems
Time-aware Evaluation of Cumulative Citation Recommendation SystemsTime-aware Evaluation of Cumulative Citation Recommendation Systems
Time-aware Evaluation of Cumulative Citation Recommendation Systemskrisztianbalog
 
Multi-step Classification Approaches to Cumulative Citation Recommendation
Multi-step Classification Approaches to Cumulative Citation RecommendationMulti-step Classification Approaches to Cumulative Citation Recommendation
Multi-step Classification Approaches to Cumulative Citation Recommendationkrisztianbalog
 
Semistructured Data Seach
Semistructured Data SeachSemistructured Data Seach
Semistructured Data Seachkrisztianbalog
 
Collection Ranking and Selection for Federated Entity Search
Collection Ranking and Selection for Federated Entity SearchCollection Ranking and Selection for Federated Entity Search
Collection Ranking and Selection for Federated Entity Searchkrisztianbalog
 

More from krisztianbalog (10)

Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...
Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...
Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...
 
Conversational AI from an Information Retrieval Perspective: Remaining Challe...
Conversational AI from an Information Retrieval Perspective: Remaining Challe...Conversational AI from an Information Retrieval Perspective: Remaining Challe...
Conversational AI from an Information Retrieval Perspective: Remaining Challe...
 
What Does Conversational Information Access Exactly Mean and How to Evaluate It?
What Does Conversational Information Access Exactly Mean and How to Evaluate It?What Does Conversational Information Access Exactly Mean and How to Evaluate It?
What Does Conversational Information Access Exactly Mean and How to Evaluate It?
 
Personal Knowledge Graphs
Personal Knowledge GraphsPersonal Knowledge Graphs
Personal Knowledge Graphs
 
Overview of the TREC 2016 Open Search track: Academic Search Edition
Overview of the TREC 2016 Open Search track: Academic Search EditionOverview of the TREC 2016 Open Search track: Academic Search Edition
Overview of the TREC 2016 Open Search track: Academic Search Edition
 
Overview of the Living Labs for IR Evaluation (LL4IR) CLEF Lab
Overview of the Living Labs for IR Evaluation (LL4IR) CLEF LabOverview of the Living Labs for IR Evaluation (LL4IR) CLEF Lab
Overview of the Living Labs for IR Evaluation (LL4IR) CLEF Lab
 
Time-aware Evaluation of Cumulative Citation Recommendation Systems
Time-aware Evaluation of Cumulative Citation Recommendation SystemsTime-aware Evaluation of Cumulative Citation Recommendation Systems
Time-aware Evaluation of Cumulative Citation Recommendation Systems
 
Multi-step Classification Approaches to Cumulative Citation Recommendation
Multi-step Classification Approaches to Cumulative Citation RecommendationMulti-step Classification Approaches to Cumulative Citation Recommendation
Multi-step Classification Approaches to Cumulative Citation Recommendation
 
Semistructured Data Seach
Semistructured Data SeachSemistructured Data Seach
Semistructured Data Seach
 
Collection Ranking and Selection for Federated Entity Search
Collection Ranking and Selection for Federated Entity SearchCollection Ranking and Selection for Federated Entity Search
Collection Ranking and Selection for Federated Entity Search
 

Recently uploaded

Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubaikojalkojal131
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 
Forensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxForensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxkumarsanjai28051
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologycaarthichand2003
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxmalonesandreagweneth
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuinethapagita
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationColumbia Weather Systems
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayupadhyaymani499
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxmaryFF1
 
Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)Tamer Koksalan, PhD
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationColumbia Weather Systems
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 

Recently uploaded (20)

Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
Forensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxForensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptx
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technology
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather Station
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
 
Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 

Table Retrieval and Generation

  • 1. Table Retrieval and Genera/on Krisz/an Balog University of Stavanger
 @krisz'anbalog Informa/on Access 
 & Interac/on h@p://iai.group SIGIR’18 workshop on Data Search (DATA:SEARCH’18) | Ann Arbor, Michigan, USA, July 2018
  • 2. JOIN WORK WITH Shuo Zhang
 @imsure318
  • 6. IN THIS TALK • Three retrieval tasks, with tables as results • Ad hoc table retrieval • Query-by-table • On-the-fly table genera/on
  • 7. THE ANATOMY OF A RELATIONAL 
 (ENTITY-FOCUSED) TABLE Formula 1 constructors’ statistics 2016 Constructor Ferrari Engine Country Base Force India Haas Ferrari Mercedes Ferrari Italy India US Italy UK US & UK Manor Mercedes UK UK … …
  • 8. Formula 1 constructors’ statistics 2016 Constructor Ferrari Engine Country Base Force India Haas Ferrari Mercedes Ferrari Italy India US Italy UK US & UK Manor Mercedes UK UK … … Table cap/on THE ANATOMY OF A RELATIONAL 
 (ENTITY-FOCUSED) TABLE
  • 9. Formula 1 constructors’ statistics 2016 Constructor Ferrari Engine Country Base Force India Haas Ferrari Mercedes Ferrari Italy India US Italy UK US & UK Manor Mercedes UK UK … … Core column (subject column) THE ANATOMY OF A RELATIONAL 
 (ENTITY-FOCUSED) TABLE We assume that these en//es are recognized and disambiguated, i.e., linked to a knowledge base
  • 10. Heading column labels (table schema) Formula 1 constructors’ statistics 2016 Constructor Ferrari Engine Country Base Force India Haas Ferrari Mercedes Ferrari Italy India US Italy UK US & UK Manor Mercedes UK UK … … THE ANATOMY OF A RELATIONAL 
 (ENTITY-FOCUSED) TABLE
  • 11. AD HOC TABLE RETRIEVAL S. Zhang and K. Balog. Ad Hoc Table Retrieval using Seman'c Similarity. 
 In: The Web Conference 2018 (WWW '18)
  • 12. TASK • Ad hoc table retrieval: • Given a keyword query as input, return a ranked list of tables from a table corpus Singapore Search Year GDP Nominal (Billion) GDP Nominal Per Capita GDP Real (Billion) Singapore - Wikipedia, Economy Statistics (Recent Years) GNI Nominal (Billion) GNI Nominal Per Capita 2011 S$346.353 S$66,816 S$342.371 S$338.452 S$65,292 https://en.wikipedia.org/wiki/Singapore Show more (5 rows total) Singapore - Wikipedia, Language used most frequently at home Language Color in Figure Percent English Blue 36.9% Show more (6 rows total) https://en.wikipedia.org/wiki/Singapore 2012 S$362.332 S$68,205 S$354.061 S$351.765 S$66,216 2013 S$378.200 S$70,047 S$324.592 S$366.618 S$67,902 Mandarin Yellow 34.9% Malay Red 10.7%
  • 13. APPROACHES • Unsupervised methods • Build a document-based representa/on for each table, then employ conven/onal document retrieval methods • Supervised methods • Describe query-table pairs using a set of features, then employ supervised machine learning ("learning-to-rank") • Contribu'on #1: new state-of-the-art, using a rich set of features • Contribu'on #2: new set of seman/c matching features
  • 14. UNSUPERVISED METHODS • Single-field document representa/on • All table content, no structure • Mul/-field document representa/on • Separate document fields for embedding document’s /tle, sec/on /tle, table cap/on, table body, and table headings
  • 15. SUPERVISED METHODS • Three groups of features • Query features • #query terms, query IDF scores • Table features • Table proper/es: #rows, #cols, #empty cells, etc. • Embedding document: link structure, number of tables, etc. • Query-table features • Query terms found in different table elements, LM score, etc. • Our novel seman/c matching features
  • 17. SEMANTIC MATCHING • Main objec/ve: go beyond term-based matching • Three components: 1. Content extrac/on 2. Seman/c representa/ons 3. Similarity measures
  • 18. SEMANTIC MATCHING 1. CONTENT EXTRACTION • The “raw” content of a query/table is represented as a set of terms, which can be words or en//es Query … q1 qn … Table t1 tm
  • 19. SEMANTIC MATCHING 1. CONTENT EXTRACTION • The “raw” content of a query/table is represented as a set of terms, which can be words or en//es Query … q1 qn … Table t1 tm Entity-based: - Top-k ranked entities from a knowledge base - Entities in the core table column - Top-k ranked entities using the embedding document/section title as a query
  • 20. SEMANTIC MATCHING 2. SEMANTIC REPRESENTATIONS • Each of the raw terms is mapped to a seman/c vector representa/on Query …… Table q1 qn t1 tm ~t1 ~tm … ~q1 ~qn …
  • 21. SEMANTIC REPRESENTATIONS • Bag-of-concepts (sparse discrete vectors) • Bag-of-en''es • Each vector element corresponds to an en/ty • is 1 if there exists a link between en//es i and j in the KB • Bag-of-categories • Each vector element corresponds to a Wikipedia category • is 1 if en/ty i is assigned to Wikipedia category j • Embeddings (dense con/nuous vectors) • Word embeddings • Word2Vec (300 dimensions, trained on Google news) • Graph embeddings • RDF2vec (200 dimensions, trained on DBpedia) ~ti ~ti[j] ~ti[j] j
  • 22. SEMANTIC MATCHING 3. SIMILARITY MEASURES Query …… Table q1 qn t1 tm ~t1 ~tm … ~q1 ~qn … semantic matching
  • 23. SEMANTIC MATCHING EARLY FUSION MATCHING STRATEGY Query …… Table q1 qn t1 tm ~t1 ~tm … ~q1 ~qn … semantic matching Early: Take the centroid of semantic vectors and compute their cosine similarity ~t1 ~tm … ~q1 ~qn …
  • 24. SEMANTIC MATCHING LATE FUSION MATCHING STRATEGY Query …… Table q1 qn t1 tm ~t1 ~tm … ~q1 ~qn … semantic matching Late: Compute all pairwise similarities between the query and table semantic vectors, then aggregate those pairwise similarity scores (sum, avg, or max) ~t1 ~tm … ~q1 ~qn … AGGR … …
  • 26. EXPERIMENTAL SETUP • Table corpus • WikiTables corpus1: 1.6M tables extracted from Wikipedia • Knowledge base • DBpedia (2015-10): 4.6M en//es with an English abstract • Queries • Sampled from two sources2,3 • Rank-based evalua/on • NDCG@5, 10, 15, 20 1 Bhagavatula et al. TabEL: En'ty Linking in Web Tables. In: ISWC ’15. 2 Cafarella et al. Data Integra'on for the Rela'onal Web. Proc. of VLDB Endow. (2009) 3 Vene/s et al. Recovering Seman'cs of Tables on the Web. Proc. of VLDB Endow. (2011) QS-1 QS-2 video games asian countries currency us ci/es laptops cpu kings of africa food calories economy gdp guitars manufacturer
  • 27. RELEVANCE ASSESSMENTS • Collected via crowdsourcing • Pooling to depth 20, 3120 query-table pairs in total • Assessors are presented with the following scenario • "Imagine that your task is to create a new table on the query topic" • A table is … • Non-relevant (0): if it is unclear what it is about or it about a different topic • Relevant (1): if some cells or values could be used from it • Highly relevant (2): if large blocks or several values could be used from it
  • 28. RESEARCH QUESTIONS • RQ1: Can seman/c matching improve retrieval performance? • RQ2: Which of the seman/c representa/ons is the most effec/ve? • RQ3: Which of the similarity measures performs best?
  • 29. RESULTS: RQ1 NDCG@10 NDCG@20 Single-field document ranking 0.4344 0.5254 Mul'-field document ranking 0.4860 0.5473 WebTable1 0.2992 0.3726 WikiTable2 0.4766 0.5206 LTR baseline 0.5456 0.6031 STR (LTR + seman'c matching) 0.6293 0.6825 1 Cafarella et al. WebTables: Exploring the Power of Tables on the Web. Proc. of VLDB Endow. (2008) 2 Bhagavatula et al. Methods for Exploring and Mining Tables on Wikipedia. In: IDEA ’13. • Can seman/c matching improve retrieval performance? • Yes. STR achieves substan/al and significants improvements over LTR.
  • 30. RESULTS: RQ2 • Which of the seman/c representa/ons is the most effec/ve? • Bag-of-en//es.
  • 31. RESULTS: RQ3 • Which of the similarity measures performs best? • Late-sum and Late-avg (but it also depends on the representa/on)
  • 34. ON-THE-FLY TABLE GENERATION S. Zhang and K. Balog. On-the-fly Table Genera'on. 
 In: 41st InternaMonal ACM SIGIR Conference on Research and Development in InformaMon Retrieval (SIGIR '18)
  • 35. TASK • On-the-fly table generaMon: • Answer a free text query with a rela/onal table, where • the core column lists all relevant en//es; • columns correspond to a@ributes of those en//es; • cells contain the values of the corresponding en/ty a@ributes. Video albums of Taylor Swift Search Title Released data Label CMT Crossroads: Taylor Swift and … Formats Journey to Fearless Speak Now World Tour-Live The 1989 World Tour Live Jun 16, 2009 Oct 11, 2011 Nov 21, 2011 Dec 20, 2015 Big Machine Shout! Factory Big Machine Big Machine DVD Blu-ray, DVD CD/Blu-ray, … Streaming E V S
  • 36. APPROACH Core column en/ty ranking and schema determina/on could poten/ally mutually reinforce each other. Query (q) E Core column en+ty ranking Schema determina+on S Value lookup V E S
  • 38. KNOWLEDGE BASE ENTRY ed ep Property: value Entity name Entity type Description Property: value … ea
  • 39. CORE COLUMN ENTITY RANKING scoret(e, q) = X i wi i(e, q, St 1 )
  • 40. CORE COLUMN ENTITY RANKING FEATURES En/ty’s relevance to the query computed using language modeling
  • 41. CORE COLUMN ENTITY RANKING FEATURES Query Entity Matching Degree Matching matrix Top-k entries Dense layer Hidden layers Output layer … (n ⇥ m) s is the concatena/on of all schema labels in S is the string concatena/on operator
  • 42. CORE COLUMN ENTITY RANKING FEATURES Compa/bility matrix: ei sj Cij = ( 1, if matchKB(ei, sj) _ matchT C(ei, sj) 0, otherwise . ESC(S, ei) = 1 |S| X j Cij En/ty-schema compa/bility score:
  • 43. SCHEMA DETERMINATION scoret(s, q) = X i wi i(s, q, Et 1 )
  • 44. SCHEMA DETERMINATION FEATURES P(s|q) = X T 2T P(s|T)P(T|q) Schema label likelihood Table’s relevance to the query P(s|T) = ( 1, maxs02TS dist(s, s0 ) 0, otherwise .
  • 45. SCHEMA DETERMINATION FEATURES P(s|q, E) = X T P(s|T)P(T|q, E) Schema label likelihood Table’s relevance to the query P(T|q, E) / P(T|E)P(T|q)
  • 46. SCHEMA DETERMINATION FEATURES AR(s, E) = 1 |E| X e2E match(s, e, T) + drel(d, e) + sh(s, e) + kb(s, e) Similarity between en/ty e and schema label s with respect to T Relevance of the document containing the table #hits returned by a web search engine to the query "[s] of [e]" above threshold Whether s is a property of e in the KB Kopliku et al. Towards a Framework for Abribute Retrieval. In: CIKM ’11.
  • 47. VALUE LOOKUP • A catalog of possible en/ty a@ribute-value pairs • En/ty, schema label, value, provenance quadruples he, s, v, pi e s v T #123 values from KB values from TC s e v T #123
  • 48. VALUE LOOKUP • Finding a cell’s value is a lookup in that catalog score(v, e, s, q) = max hs0 ,v,pi2eV match(s,s0 ) conf (p, q) eV values from KB values from TC soy string matching - "birthday" vs. "date of birth" - "country" vs. "na/onality" matching confidence - KB takes priority over TC - based on the corresponding table’s relevance to the query
  • 50. EXPERIMENTAL SETUP • Table corpus • WikiTables corpus: 1.6M tables extracted from Wikipedia • Knowledge base • DBpedia (2015-10): 4.6M en//es with an English abstract • Two query sets • Rank-based metrics • NDCG for core column en/ty ranking and schema determina/on • MAP/MRR for value lookup
  • 51. QUERY SET 1 (QS-1) • List queries from the DBpedia-En/ty v2 collec/on1 (119) • "all cars that are produced in Germany" • "permanent members of the UN Security Council" • "Airlines that currently use Boeing 747 planes" • Core column en/ty ranking • Highly relevant en//es from the collec/on • Schema determina/on • Crowdsourcing, 3-point relevance scale, 7k query-label pairs • Value lookup • Crowdsourcing, 25 queries sample, 14k cell values 1 Hasibi et al. DBpedia-En'ty v2: A Test Collec'on for En'ty Search. In: SIGIR ’17.
  • 52. QUERY SET 2 (QS-2) • En/ty-rela/onship queries from the RELink Query Collec/on1 (600) • Queries are answered by en/ty tuples (pairs or triplets) • That is, each query is answered by a table with 2 or 3 columns (including the core en/ty column) • Queries and relevance judgments are obtained automa/cally from Wikipedia lists that contain rela/onal tables • Human annotators were asked to formulate the corresponding informa/on need as a natural language query • "find peaks above 6000m in the mountains of Peru" • "Which countries and ciMes have accredited armenian embassadors?" • "Which anM-aircra guns were used in ships during war periods and what country produced them?" 1 Saleiro et al. RELink: A Research Framework and Test Collec'on for En'ty-Rela'onship Retrieval. In: SIGIR ’17.
  • 53. CORE COLUMN ENTITY RANKING (QUERY-BASED) QS-1 QS-2 NDCG@5 NDCG@10 NDCG@5 NDCG@10 LM 0.2419 0.2591 0.0708 0.0823 DRRM_TKS (ed) 0.2015 0.2028 0.0501 0.0540 DRRM_TKS (ep) 0.1780 0.1808 0.1089 0.1083 Combined 0.2821 0.2834 0.0852 0.0920
  • 54. CORE COLUMN ENTITY RANKING (SCHEMA-ASSISTED) QS-1 QS-2 •R #0: without schema informa/on (query only) •R #1-#3: with automa/c schema determina/on (top 10) •Oracle: with ground truth schema
  • 55. SCHEMA DETERMINATION (QUERY-BASED) QS-1 QS-2 NDCG@5 NDCG@10 NDCG@5 NDCG@10 CP 0.0561 0.0675 0.1770 0.2092 DRRM_TKS 0.0380 0.0427 0.0920 0.1415 Combined 0.0786 0.0878 0.2310 0.2695
  • 56. SCHEMA DETERMINATION (ENTITY-ASSISTED) QS-1 QS-2 •R #0: without en/ty informa/on (query only) •R #1-#3: with automa/c core column en/ty ranking (top 10) •Oracle: with ground truth en//es
  • 57. RESULTS VALUE LOOKUP QS-1 QS-2 MAP MRR MAP MRR Knowledge base 0.7759 0.7990 0.0745 0.0745 Table corpus 0.1614 0.1746 0.9564 0.9564 Combined 0.9270 0.9427 0.9564 0.9564
  • 58. EXAMPLE "Towns in the Republic of Ireland in 2006 Census Records"
  • 59. ANALYSIS QS-1 QS-2 - - QS-1 Round #0 vs. #1 43 38 38 52 7 60 Round #0 vs. #2 50 30 39 61 5 53 Round #0 vs. #3 49 26 44 59 2 58 QS-2 Round #0 vs. #1 166 82 346 386 56 158 Round #0 vs. #2 173 74 347 388 86 126 Round #0 vs. #3 173 72 349 403 103 94 " "# #
  • 60. SUMMARY • Answering queries with rela/onal tables, summarizing en//es and their a@ributes • Retrieving exis/ng tables from a table corpus • Genera/ng a table on-the-fly • Future work • Moving from homogeneous Wikipedia tables to other types of tables (scien/fic tables, Web tables) • Value lookup with conflic/ng values; verifying cell values • Result snippets for table search results • ...