SlideShare a Scribd company logo
1 of 68
The Relevance of the Apache Solr
Semantic Knowledge Graph
Trey Grainger
SVP of Engineering, Lucidworks
2017.04.11
Trey Grainger
SVP of Engineering
• Previously Director of Engineering @ CareerBuilder
• MBA, Management of Technology – Georgia Tech
• BA, Computer Science, Business, & Philosophy – Furman University
• Information Retrieval & Web Search - Stanford University
Other fun projects:
• Co-author of Solr in Action, plus numerous research papers
• Frequent conference speaker
• Founder of Celiaccess.com, the gluten-free search engine
• Lucene/Solr contributor
• Startup Investor / Advisor
About Me
• “A compact, auto-generated model for real-time traversal and
ranking of any relationship within a domain”
• A multi-dimensional term-to-term (vs. term-to-document) search
engine
• A tool which enables knowledge modeling and reasoning, natural
language processing, anomaly detection, data cleansing, semantic
search, analytics, data classification, root cause analysis, and
recommendations systems
• It’s kind of like Word2Vec, but better suited for interpreting the
nuanced intent of typical search queries
What is the Semantic Knowledge Graph?
• Introduction
• Overview of Semantic Knowledge Graph
- Index Structure
- Graph Traversal Mechanics
- Edge Weights/Relevancy Scoring
• Use Cases
- Document Summarization / Content-based Recommendations
- Predictive Analytics
- Data Cleansing
- Document Classification & Enrichment
- Query Disambiguation
- Semantic Search / Concept Expansion
• Installation
• Live Demos
• Q&A
Agenda
Based in San Francisco, offices
and employees worldwide
Fusion, the platform for building
data-driven, smart apps
Consulting and support for
organizations using Solr
Produces the world’s largest open
source user conference dedicated
to Lucene/Solr
Lucidworks is the primary commercial
contributor to the Apache Solr project
Employs over 40% of the active
committers on the Solr project
Contributes over 70% of Solr's
open source codebase
40%
70%
We are a global company
Our 8 offices serve over 400 customers.
We solve big problems
for the world’s brightest businesses.
Fusion powers search for the brightest companies in the world.
Knowledge
Graph
Knowledge
Graph
Why do we care?
User’s Query:
machine learning research and development Portland, OR software
engineer AND hadoop, java
Traditional Query Parsing:
(machine AND learning AND research AND development AND portland)
OR (software AND engineer AND hadoop AND java)
Semantic Query Parsing:
"machine learning" AND "research and development" AND "Portland, OR"
AND "software engineer" AND hadoop AND java
Semantically Expanded Query:
("machine learning"^10 OR "data scientist" OR "data mining" OR "artificial intelligence")
AND ("research and development"^10 OR "r&d") AND
AND ("Portland, OR"^10 OR "Portland, Oregon" OR {!geofilt pt=45.512,-122.676 d=50 sfield=geo})
AND ("software engineer"^10 OR "software developer")
AND (hadoop^10 OR "big data" OR hbase OR hive) AND (java^10 OR j2ee)
Terminology / Background
A Graph
DSAA 2016
Montreal
Quebec Canada
Semantic
Knowledge
Graph Paper
Trey
Grainger
Mohammed
Korayem
Andries
Smith
Khalifeh
AlJadda
in_country
Node / Vertex
Edge
Term Documents
a doc1 [2x]
brown doc3 [1x] , doc5 [1x]
cat doc4 [1x]
cow doc2 [1x] , doc5 [1x]
… ...
once doc1 [1x], doc5 [1x]
over doc2 [1x], doc3 [1x]
the doc2 [2x], doc3 [2x],
doc4[2x], doc5 [1x]
… …
Document Content Field
doc1 once upon a time, in a land far,
far away
doc2 the cow jumped over the moon.
doc3 the quick brown fox jumped over
the lazy dog.
doc4 the cat in the hat
doc5 The brown cow said “moo”
once.
… …
What you SEND to Lucene/Solr:
How the content is INDEXED into
Lucene/Solr (conceptually):
The inverted index
/solr/collection/select/?q=apache solr
Term Documents
… …
apache
doc1, doc3, doc4,
doc5
…
hadoop doc2, doc4, doc6
… …
solr
doc1, doc3, doc4,
doc7, doc8
… …
doc5
doc7 doc8
doc1 doc3
doc4
solr
apache
apache solr
Matching queries to documents
BM25 (Okapi “Best Match” 25th Iteration)
Score(q, d) =
∑ idf(t) · ( tf(t in d) · (k + 1) ) / ( tf(t in d) + k · (1 – b + b · |d| / avgdl )
t in q
Where:
t = term; d = document; q = query; i = index
tf(t in d) = numTermOccurrencesInDocument ½
idf(t) = 1 + log (numDocs / (docFreq + 1))
|d| = ∑ 1
t in d
avgdl = = ( ∑ |d| ) / ( ∑ 1 ) )
d in i d in i
k = Free parameter. Usually ~1.2 to 2.0. Increases term frequency saturation point.
b = Free parameter. Usually ~0.75. Increases impact of document normalization.
The Idea
Knowledge
Graph
Challenges of building a traditional knowledge graph
Because current knowledge bases / ontology learning systems typically
requires explicitly modeling nodes and edges into a graph ahead of time, this
unfortunately presents several limitations to the use of such a knowledge graph:
• Entities not modeled explicitly as nodes have no known relationships to any other
entities.
• Edges exist between nodes, but not between arbitrary combinations of nodes, and therefore
such a graph is not ideal for representing nuanced meanings of an entity when appearing
within different contexts, as is common within natural language.
• Substantial meaning is encoded in the linguistic representation of the domain that is
lost when the underlying textual representation is not preserved: phrases, interaction of
concepts through actions (i.e. verbs), positional ordering of entities and the phrases containing
those entities, variations in spelling and other representations of entities, the use of adjectives
to modify entities to represent more complex concepts, and aggregate frequencies of
occurrence for different representations of entities relative to other representations.
• It can be an arduous process to create robust ontologies, map a domain into a graph
representing those ontologies, and ensure the generated graph is compact, accurate,
comprehensive, and kept up to date.
01
Knowledge
Graph
Semantic Data Encoded into Free Text Content
e en eng engi engineer engineers
engineer engineersNode Type: Term
software
engineer
software
engineers
electrical
engineering
engineer
engineering software
…
…
…
Node Type:
Character Sequence
Node Type:
Term Sequence
Node Type:
Document
id: 1
text: looking for a software
engineerwith degree in
computer science or
electrical engineering
id: 2
text: apply to be a software
engineer and work with
other great software
engineers
id: 3
text: start a great careerin
electrical engineering
…
…
Serves as a “data science toolkit” API that allows dynamically navigating and pivoting through multiple
levels of relationships between items in a domain.
Semantic Knowledge Graph API
Core similarity engine, exposed via API
Any product can leverage the core relationship scoring
engine to score any list of entities against any other list
Full domain support
Keywords, categories, tags, based upon any field on your
documents. Graph is build automatically from the
content representing your domain.
Intersections, overlaps, & relationship
scoring, many levels deep
Users can either provide a list of items to score, or else
have the system dynamically discover the most related
items (or both).
Knowledge
Graph
Graph Structure
id: 1
job_title: Software Engineer
desc: software engineer at a
great company
skills: .Net, C#, java
id: 2
job_title: Registered Nurse
desc: a registered nurse at
hospital doing hard work
skills: oncology, phlebotemy
id: 3
job_title: Java Developer
desc: a software engineer or a
java engineer doing work
skills: java, scala, hibernate
field doc term
desc
1
a
at
company
engineer
great
software
2
a
at
doing
hard
hospital
nurse
registered
work
3
a
doing
engineer
java
or
software
work
job_title 1
Software
Engineer
… … …
Terms-Docs Inverted IndexDocs-Terms Forward IndexDocuments
Source: Trey Grainger,
Khalifeh AlJadda, Mohammed
Korayem, Andries Smith.“The
Semantic Knowledge Graph: A
compact, auto-generated
model for real-time traversal
and ranking of any relationship
within a domain”. DSAA 2016.
Knowledge
Graph
field term postings list
doc pos
desc
a
1 4
2 1
3 1, 5
at
1 3
2 4
company 1 6
doing
2 6
3 8
engineer
1 2
3 3, 7
great 1 5
hard 2 7
hospital 2 5
java 3 6
nurse 2 3
or 3 4
registered 2 2
software
1 1
3 2
work
2 10
3 9
job_title java developer 3 1
… … … …
Source: Trey Grainger,
Khalifeh AlJadda, Mohammed
Korayem, Andries Smith.“The
Semantic Knowledge Graph: A
compact, auto-generated
model for real-time traversal
and ranking of any relationship
within a domain”. DSAA 2016.
Knowledge
Graph
Set-theory View
Graph View
How the Graph Traversal Works
skill: Java
skill: Scala
skill:
Hibernate
skill:
Oncology
doc 1
doc 2
doc 3
doc 4
doc 5
doc 6
skill:
Java
skill: Java
skill: Scala
skill:
Hibernate
skill:
Oncology
Data Structure View
Java
Scala Hibernate
docs
1, 2, 6
docs
3, 4
Oncology
doc 5
Source: Trey Grainger,
Khalifeh AlJadda, Mohammed
Korayem, Andries Smith.“The
Semantic Knowledge Graph: A
compact, auto-generated
model for real-time traversal
and ranking of any relationship
within a domain”. DSAA 2016.
Knowledge
Graph
Multi-level Traversal
Data Structure View
Graph View
doc 1
doc 2
doc 3
doc 4
doc 5
doc 6
skill:
Java
skill: Java
skill: Scala
skill:
Hibernate
skill:
Oncology
doc 1
doc 2
doc 3
doc 4
doc 5
doc 6
job_title:
Software
Engineer
job_title:
Data
Scientist
job_title:
Java
Developer
……
Inverted Index
Lookup
Forward Index
Lookup
Forward Index
Lookup
Inverted Index
Lookup
Java
Java
Developer
Hibernate
Scala
Software
Engineer
Data
Scientist
has_related_job_title
has_related_job_title
Knowledge
Graph
Materialization of new nodes through shared documents
engineer
engineers
software engineer*
(materialized node)
engineer*
(materialized node)
Software
doc 1
doc 2
doc 3
doc 4
doc 5
doc 6
links_to
links_to
Traversal & Scoring
Knowledge
Graph
From the academic paper:
Structure:
Single-level Traversal / Scoring:
Multi-level Traversal / Scoring:
Scoring of Node Relationships (Edge Weights)
Foreground vs. Background Analysis
Every term scored against it’s context. The more
commonly the term appears within it’s foreground
context versus its background context, the more
relevant it is to the specified foreground context.
countFG(x) - totalDocsFG * probBG(x)
z = --------------------------------------------------------
sqrt(totalDocsFG * probBG(x) * (1 - probBG(x)))
{ "type":"keywords”, "values":[
{ "value":"hive", "relatedness":0.9773, "popularity":369 },
{ "value":"java", "relatedness":0.9236, "popularity":15653 },
{ "value":".net", "relatedness":0.5294, "popularity":17683 },
{ "value":"bee", "relatedness":0.0, "popularity":0 },
{ "value":"teacher", "relatedness":-0.2380, "popularity":9923 },
{ "value":"registered nurse", "relatedness": -0.3802 "popularity":27089 } ] }
We are essentially boosting terms which are more related to some known feature
(and ignoring terms which are equally likely to appear in the background corpus)
+
-
Foreground Query:
"Hadoop"
Knowledge
Graph
Source: Trey Grainger,
Khalifeh AlJadda, Mohammed
Korayem, Andries Smith.“The
Semantic Knowledge Graph: A
compact, auto-generated
model for real-time traversal
and ranking of any relationship
within a domain”. DSAA 2016.
Knowledge
Graph
Multi-level Graph Traversal with Scores
software engineer*
(materialized node)
Java
C#
.NET
.NET
Developer
Java
Developer
Hibernate
ScalaVB.NET
Software
Engineer
Data
Scientist
Skill
Nodes
has_related_skillStarting
Node
Skill
Nodes
has_related_skill Job Title
Nodes
has_related_job_title
0.90
0.88 0.93
0.93
0.34
0.74
0.91
0.89
0.74
0.89
0.780.72
0.48
0.93
0.76
0.83
0.80
0.64
0.61
0.780.55
Use Cases
Knowledge
Graph
Use Case: Document Summarization /
Content-based Recommendations
Experiment: Pass in raw text
(extracting phrases as needed), and
rank their similarity to the documents
using the SKG.
Additionally, can traverse the graph
to “related” entities/keyword phrases
NOT found in the original document
Applications: Content-based and
multi-modal recommendations
(no cold-start problem), data cleansing
prior to clustering or other ML methods,
semantic search / similarity scoring
Knowledge
Graph
Use Case: Predictive Analytics
Knowledge
Graph
Use Case: Data Cleansing
{ "type":"keywords”, "values":[
{ "value":"hive", "relatedness": 0.9765, "popularity":369 },
{ "value":”spark", "relatedness": 0.9634, "popularity":15653 },
{ "value":".net", "relatedness": 0.5417, "popularity":17683 },
{ "value":"bogus_word", "relatedness": 0.0, "popularity":0 },
{ "value":"teaching", "relatedness": -0.1510, "popularity":9923 },
{ "value":"CPR", "relatedness": -0.4012, "popularity":27089 } ] }
Foreground Query: "Hadoop"
Experiment: Data analyst
manually annotated 500
pairs of terms found together
in real query logs as
“relevant” or “not relevant”
Results: SKG removed 78%
of the terms while maintaining
a 95% accuracy at removing
the correct noisy pairs from
the input data.
Use Case: Document Classification & Enrichment Knowledge
Graph
Use Case: Query Disambiguation
Example Related Keywords (representing multiple meanings)
driver truck driver, linux, windows, courier, embedded, cdl,
delivery
architect autocad drafter, designer, enterprise architect, java
architect, designer, architectural designer, data architect,
oracle, java, architectural drafter, autocad, drafter, cad,
engineer
… …
Source: M. Korayem, C. Ortiz, K. AlJadda, T. Grainger. "Query Sense Disambiguation Leveraging Large Scale User Behavioral Data". IEEE Big Data 2015.
A few methodologies:
1) Query Log Mining
2) Semantic Knowledge Graph
Knowledge Graph
Query Log Mining: Discovering ambiguous phrases
1) Classify users who ran each
search in the search logs
(i.e. by the job title
classifications of the jobs to
which they applied)
3) Segment the search term => related search terms list by classification,
to return a separate related terms list per classification
2) Create a probabilistic graphical model of those classifications mapped
to each keyword phrase.
Source: M. Korayem, C. Ortiz, K. AlJadda, T. Grainger. "Query Sense Disambiguation Leveraging Large Scale User Behavioral Data". IEEE Big Data 2015.
Semantic Knowledge Graph: Discovering ambiguous phrases
1) Exact same concept, but use
a document classification
field (i.e. category) as the first
level of your graph, and the
related terms as the second
level to which you traverse.
2) Has the benefit that you don’t need query logs to mine, but it will be representative
of your data, as opposed to your user’s intent, so the quality depends on how clean
and representative your documents are.
Additional Benefit: Multi-dimensional disambiguation and dynamic materialization of
categories. Effectively an dynamically-materialized probabilistic graphical model
Disambiguated meanings (represented as term vectors)
Example Related Keywords (Disambiguated Meanings)
architect 1: enterprise architect, java architect, data architect, oracle, java, .net
2: architectural designer, architectural drafter, autocad, autocad drafter, designer,
drafter, cad, engineer
driver 1: linux, windows, embedded
2: truck driver, cdl driver, delivery driver, class b driver, cdl, courier
designer 1: design, print, animation, artist, illustrator, creative, graphic artist, graphic,
photoshop, video
2: graphic, web designer, design, web design, graphic design, graphic designer
3: design, drafter, cad designer, draftsman, autocad, mechanical designer, proe,
structural designer, revit
… …
Source: M. Korayem, C. Ortiz, K. AlJadda, T. Grainger. "Query Sense Disambiguation Leveraging Large Scale User Behavioral Data". IEEE Big Data 2015.
Using the disambiguated meanings
In a situation where a user searches for an ambiguous phrase, what information can we
use to pick the correct underlying meaning?
1. Any pre-existing knowledge about the user:
• User is a software engineer
• User has previously run searches for “c++” and “linux”
2. Context within the query:
User searched for windows AND driver vs. courier OR driver
3. If all else fails (and there is no context), use the most commonly occurring meaning.
driver 1: linux, windows, embedded
2: truck driver, cdl driver, delivery driver, class b driver, cdl, courier
Source: M. Korayem, C. Ortiz, K. AlJadda, T. Grainger. "Query Sense Disambiguation Leveraging Large Scale User Behavioral Data". IEEE Big Data 2015.
Knowledge
Graph
Use Case: Query Expansion
Experiment: Take an initial query, and expand keyword
phrases to include the most related entities to that query
Example:
The Semantic Search Problem
User’s Query:
machine learning research and development Portland, OR software
engineer AND hadoop, java
Traditional Query Parsing:
(machine AND learning AND research AND development AND portland)
OR (software AND engineer AND hadoop AND java)
Semantic Query Parsing:
"machine learning" AND "research and development" AND "Portland, OR"
AND "software engineer" AND hadoop AND java
Semantically Expanded Query:
("machine learning"^10 OR "data scientist" OR "data mining" OR "artificial intelligence")
AND ("research and development"^10 OR "r&d") AND
AND ("Portland, OR"^10 OR "Portland, Oregon" OR {!geofilt pt=45.512,-122.676 d=50 sfield=geo})
AND ("software engineer"^10 OR "software developer")
AND (hadoop^10 OR "big data" OR hbase OR hive) AND (java^10 OR j2ee)
Getting Started
Open Sourced!
#1: Pull, Build, Start Solr
cd ~/demo
git clone https://github.com/apache/lucene-solr.git
git checkout releases/lucene-solr/7.2.1
cd lucene-solr/solr
ant server
bin/solr -c –Denable.runtime.lib=true
#2: Pull and Build Semantic Knowledge Graph
cd ~/demo
git clone https://github.com/treygrainger/semantic-knowledge-graph.git
git checkout solr_7.2.1
mvn package
#3: Install Plugin
curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @semantic-
knowledge-graph-1.0-SNAPSHOT.jar http://localhost:8983/solr/.system/blob/skg.jar
#4: Create Collection
curl http://localhost:8983/solr/admin/collections?action=CREATE&name=demo-collection
#5: Register Plugin with Collection
curl http://localhost:8983/solr/demo-collection/config
-H 'Content-type:application/json' -d
'{
"add-runtimelib": { "name":"skg.jar", "version":1 } }'
#6: Register Response Writer
curl http://localhost:8983/solr/demo-collection/config
-H 'Content-type:application/json' -d
'{ "add-queryresponsewriter": {
"name": "skg", "runtimeLib": true, "class":
”org.apache.solr.search.skg.responsewriter.KnowledgeGraphResponseWriter"} }’
#7: Register Request Handler
curl http://localhost:8983/solr/demo-collection/config
-H 'Content-type:application/json' -d
'{ "add-requesthandler" : { "name": "/skg",
"class":”org.apache.solr.search.skg.KnowledgeGraphHandler",
"defaults":{ "defType":"edismax", "wt":"json"},
"invariants":{"wt":"skg"}, "runtimeLib": true } }'
Per-field options:
{field_name}.fallback string
An alternative (usually free-text) field to query in case foreground returns less than min_popularity. Example: {field_name}.fallback=content
{field_name}.facet-field string
A suffix to find a copy-field version of the value in {field_name} to use for the node during traversal. Example: {field_name}.facet-field=id-name
{field_name}.key string
A parallel key / id field to use to represent a particular value. Typically used to disambiguate multiple entities (each with a unique id) sharing the same textual name
Example:{field.v1.key}=id
Demos
Knowledge
Graph
Knowledge
Graph
Who’s in Love with Jean Grey?
Build a Co-occurrence Matrix
http://localhost:8983/solr/job-postings/skg
Related term vector (for query concept expansion)
http://localhost:8983/solr/stack-exchange-health/skg
Score keywords from within a document
http://localhost:8983/solr/job-postings/skg
• Lucidworks is working on integrating the semantic knowledge
graph into Solr’s JSON faceting API (requires some new
capabilities and structural changes, but greatly enhances it)
• WIP, but this new approach looks promising. We’ll post a
patch soon. Hoping to get committed for Solr 7.4
• Being integrated into Lucidworks Fusion as one of several
key elements of our semantic search pipeline, content-based
recommendations, and to power discovery in our analytics
suite.
Next Steps
Contact Info
Trey Grainger
trey.grainger@lucidworks.com
@treygrainger
http://solrinaction.com
Other presentations:
http://www.treygrainger.com
Discount code: ctwhaystack18
Knowledge
Graph
Populating the Graph
Other Solr Graph Capabilities
https://lucidworks.com/webinar/solr-6-deep-dive-sql-graph/
top(n="5", sort="count(*) desc",
gatherNodes(movielens,
top(n="30", sort="count(*) desc",
gatherNodes(movielens,
search(movielens, q="user_id_i:305", fl="movie_id_i", sort="movie_id_i asc", qt=“/export"),
walk="movie_id_i->movie_id_i", gather="user_id_i", maxDocFreq="10000", count(*)
)
),
walk="node->user_id_i",
gather="movie_id_i", count(*)
)
)
Distributed Graph Traversal Streaming Expressions
Covered in more detail
in deep-dive SQL &
Graph webinar:
Graph Query Parser
• Query-time, cyclic aware graph traversal is able to rank documents based on relationships
• Provides controls for depth, filtering of results and inclusion
of root and/or leaves
• Limitations: single node/shard only
Examples:
• http://localhost:8983/solr/graph/query?fl=id,score&
q={!graph from=in_edge to=out_edge}id:A
• http://localhost:8983/solr/my_graph/query?fl=id&
q={!graph from=in_edge to=out_edge
traversalFilter='foo:[* TO 15]'}id:A
• http://localhost:8983/solr/my_graph/query?fl=id&
q={!graph from=in_edge to=out_edge maxDepth=1}foo:[* TO 10]
• Term Frequency: “How well a term describes a document?”
– Measure: how often a term occurs per document
• Inverse Document Frequency: “How important is a term overall?”
– Measure: how rare the term is across all documents
TF * IDF
*Source: Solr in Action, chapter 3
Lucidworks Fusion Architecture
Lucidworks Fusion
Knowledge Graph – Potential Use Cases
Cross-walk between Types
• Have an ID field, but want to enable free text search
on the most associated entity with that ID?
• Have a “state” (geo) search box, but want to accept
any free-text location and map it to the right state?
• Have an old classification taxonomy and want to
know how the values from the old system now map
into the new values?
Build User Profiles from Search Logs
• If someone searches for “Java”, and then “JQuery”,
and then “CSS”, and then “JSP”, what do those have
in common?
• What if they search for “Java”, and then “C++”, and
then “Assembly”?
Discover Relationships Between Anything
• If I want to become a data scientist and know
Python, what libraries should I learn?
• If my last job was mid-level software engineer and
my current job is Engineering Lead, what are my
most likely next roles?
Traverse arbitrarily deep, Sort on anything
• Build an instant co-occurrence matrix, sort the top
values by their relatedness, and then add in any
number of additional dimensions (RAM permitting).
Data Cleansing
• Have dirty taxonomies and need to figure out which
items don’t belong?
• Need to understand the conceptual cohesion of a
document (vs spammy or off-topic content)?
Knowledge
Graph
Knowledge
Graph
Future Work
• Semantic Search (more experiments)
• Search Engine Relevancy Algorithms
• Trending Topics
• Recommendation Systems
• Root Cause Analysis
• Abuse Detection
Knowledge
Graph
Conclusion
Applications:
The Semantic Knowledge Graph has numerous applications, including
automatically building ontologies, identification of trending topics over time,
predictive analytics on timeseries data, root-cause analysis surfacing concepts
related to failure scenarios from free text, data cleansing, document
summarization, semantic search interpretation and expansion of queries,
recommendation systems, and numerous other forms of anomaly detection.
Main contribution of this paper:
The introduction (and open sourcing) of the the Semantic Knowledge Graph, a
novel and compact new graph model
that can dynamically materialize and score the relationships between any arbitrary
combination of entities represented within a corpus of documents.
Knowledge
Graph
References
Serves as a “data science toolkit” API that allows dynamically navigating and pivoting through multiple
levels of relationships between items in a domain.
Knowledge Graph API
Core similarity engine, exposed via API
Any product can leverage the core relationship scoring
engine to score any list of entities against any other list
Full domain support
Keywords, categories, tags, based upon any field on your
documents. Graph is build automatically from the
content representing your domain.
Intersections, overlaps, & relationship
scoring, many levels deep
Users can either provide a list of items to score, or else
have the system dynamically discover the most related
items (or both).
Knowledge
Graph

More Related Content

What's hot

Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...Trey Grainger
 
Natural Language Search with Knowledge Graphs (Haystack 2019)
Natural Language Search with Knowledge Graphs (Haystack 2019)Natural Language Search with Knowledge Graphs (Haystack 2019)
Natural Language Search with Knowledge Graphs (Haystack 2019)Trey Grainger
 
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...Trey Grainger
 
Reflected Intelligence: Real world AI in Digital Transformation
Reflected Intelligence: Real world AI in Digital TransformationReflected Intelligence: Real world AI in Digital Transformation
Reflected Intelligence: Real world AI in Digital TransformationTrey Grainger
 
Self-learned Relevancy with Apache Solr
Self-learned Relevancy with Apache SolrSelf-learned Relevancy with Apache Solr
Self-learned Relevancy with Apache SolrTrey Grainger
 
Thought Vectors and Knowledge Graphs in AI-powered Search
Thought Vectors and Knowledge Graphs in AI-powered SearchThought Vectors and Knowledge Graphs in AI-powered Search
Thought Vectors and Knowledge Graphs in AI-powered SearchTrey Grainger
 
Reflected intelligence evolving self-learning data systems
Reflected intelligence  evolving self-learning data systemsReflected intelligence  evolving self-learning data systems
Reflected intelligence evolving self-learning data systemsTrey Grainger
 
South Big Data Hub: Text Data Analysis Panel
South Big Data Hub: Text Data Analysis PanelSouth Big Data Hub: Text Data Analysis Panel
South Big Data Hub: Text Data Analysis PanelTrey Grainger
 
The Apache Solr Semantic Knowledge Graph
The Apache Solr Semantic Knowledge GraphThe Apache Solr Semantic Knowledge Graph
The Apache Solr Semantic Knowledge GraphTrey Grainger
 
The Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemThe Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemTrey Grainger
 
Balancing the Dimensions of User Intent
Balancing the Dimensions of User IntentBalancing the Dimensions of User Intent
Balancing the Dimensions of User IntentTrey Grainger
 
The Semantic Knowledge Graph
The Semantic Knowledge GraphThe Semantic Knowledge Graph
The Semantic Knowledge GraphTrey Grainger
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine
Leveraging Lucene/Solr as a Knowledge Graph and Intent EngineLeveraging Lucene/Solr as a Knowledge Graph and Intent Engine
Leveraging Lucene/Solr as a Knowledge Graph and Intent EngineTrey Grainger
 
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesHaystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesMax Irwin
 
Haystack- Learning to rank in an hourly job market
Haystack- Learning to rank in an hourly job market Haystack- Learning to rank in an hourly job market
Haystack- Learning to rank in an hourly job market Xun Wang
 
Tutorial@BDA 2017 -- Knowledge Graph Expansion and Enrichment
Tutorial@BDA 2017 -- Knowledge Graph Expansion and Enrichment Tutorial@BDA 2017 -- Knowledge Graph Expansion and Enrichment
Tutorial@BDA 2017 -- Knowledge Graph Expansion and Enrichment Paris Sud University
 
Interleaving, Evaluation to Self-learning Search @904Labs
Interleaving, Evaluation to Self-learning Search @904LabsInterleaving, Evaluation to Self-learning Search @904Labs
Interleaving, Evaluation to Self-learning Search @904LabsJohn T. Kane
 
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...Lucidworks
 
Using a keyword extraction pipeline to understand concepts in future work sec...
Using a keyword extraction pipeline to understand concepts in future work sec...Using a keyword extraction pipeline to understand concepts in future work sec...
Using a keyword extraction pipeline to understand concepts in future work sec...Kai Li
 

What's hot (20)

Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...
 
Natural Language Search with Knowledge Graphs (Haystack 2019)
Natural Language Search with Knowledge Graphs (Haystack 2019)Natural Language Search with Knowledge Graphs (Haystack 2019)
Natural Language Search with Knowledge Graphs (Haystack 2019)
 
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
 
Reflected Intelligence: Real world AI in Digital Transformation
Reflected Intelligence: Real world AI in Digital TransformationReflected Intelligence: Real world AI in Digital Transformation
Reflected Intelligence: Real world AI in Digital Transformation
 
Self-learned Relevancy with Apache Solr
Self-learned Relevancy with Apache SolrSelf-learned Relevancy with Apache Solr
Self-learned Relevancy with Apache Solr
 
Thought Vectors and Knowledge Graphs in AI-powered Search
Thought Vectors and Knowledge Graphs in AI-powered SearchThought Vectors and Knowledge Graphs in AI-powered Search
Thought Vectors and Knowledge Graphs in AI-powered Search
 
Reflected intelligence evolving self-learning data systems
Reflected intelligence  evolving self-learning data systemsReflected intelligence  evolving self-learning data systems
Reflected intelligence evolving self-learning data systems
 
South Big Data Hub: Text Data Analysis Panel
South Big Data Hub: Text Data Analysis PanelSouth Big Data Hub: Text Data Analysis Panel
South Big Data Hub: Text Data Analysis Panel
 
The Apache Solr Semantic Knowledge Graph
The Apache Solr Semantic Knowledge GraphThe Apache Solr Semantic Knowledge Graph
The Apache Solr Semantic Knowledge Graph
 
The Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemThe Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data Ecosystem
 
Balancing the Dimensions of User Intent
Balancing the Dimensions of User IntentBalancing the Dimensions of User Intent
Balancing the Dimensions of User Intent
 
The Semantic Knowledge Graph
The Semantic Knowledge GraphThe Semantic Knowledge Graph
The Semantic Knowledge Graph
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine
Leveraging Lucene/Solr as a Knowledge Graph and Intent EngineLeveraging Lucene/Solr as a Knowledge Graph and Intent Engine
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine
 
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesHaystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
 
Semantic search
Semantic searchSemantic search
Semantic search
 
Haystack- Learning to rank in an hourly job market
Haystack- Learning to rank in an hourly job market Haystack- Learning to rank in an hourly job market
Haystack- Learning to rank in an hourly job market
 
Tutorial@BDA 2017 -- Knowledge Graph Expansion and Enrichment
Tutorial@BDA 2017 -- Knowledge Graph Expansion and Enrichment Tutorial@BDA 2017 -- Knowledge Graph Expansion and Enrichment
Tutorial@BDA 2017 -- Knowledge Graph Expansion and Enrichment
 
Interleaving, Evaluation to Self-learning Search @904Labs
Interleaving, Evaluation to Self-learning Search @904LabsInterleaving, Evaluation to Self-learning Search @904Labs
Interleaving, Evaluation to Self-learning Search @904Labs
 
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
 
Using a keyword extraction pipeline to understand concepts in future work sec...
Using a keyword extraction pipeline to understand concepts in future work sec...Using a keyword extraction pipeline to understand concepts in future work sec...
Using a keyword extraction pipeline to understand concepts in future work sec...
 

Similar to The Relevance of the Apache Solr Semantic Knowledge Graph

TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTrivadis
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...Lucidworks
 
From keyword-based search to language-agnostic semantic search
From keyword-based search to language-agnostic semantic searchFrom keyword-based search to language-agnostic semantic search
From keyword-based search to language-agnostic semantic searchCareerBuilder.com
 
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...Lucidworks
 
SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018 SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018 CareerBuilder.com
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningPaco Nathan
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLPaco Nathan
 
Data Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area MLData Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area MLPaco Nathan
 
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the CloudLeveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the CloudDatabricks
 
AI, Search, and the Disruption of Knowledge Management
AI, Search, and the Disruption of Knowledge ManagementAI, Search, and the Disruption of Knowledge Management
AI, Search, and the Disruption of Knowledge ManagementTrey Grainger
 
Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...
Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...
Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...Simplilearn
 
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and SparkVital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and SparkVital.AI
 
Test Trend Analysis : Towards robust, reliable and timely tests
Test Trend Analysis : Towards robust, reliable and timely testsTest Trend Analysis : Towards robust, reliable and timely tests
Test Trend Analysis : Towards robust, reliable and timely testsHugh McCamphill
 
Brochure data science learning path board-infinity (1)
Brochure   data science learning path board-infinity (1)Brochure   data science learning path board-infinity (1)
Brochure data science learning path board-infinity (1)NirupamNishant2
 
Mark Tortoricci - Talent42 2015
Mark Tortoricci - Talent42 2015Mark Tortoricci - Talent42 2015
Mark Tortoricci - Talent42 2015Talent42
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezBig Data Spain
 
Data science presentation
Data science presentationData science presentation
Data science presentationMSDEVMTL
 
Tejas bichave m tech python
Tejas bichave  m tech pythonTejas bichave  m tech python
Tejas bichave m tech pythontejas bichave
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...Big Data Spain
 
Boulder/Denver BigData: Cluster Computing with Apache Mesos and Cascading
Boulder/Denver BigData: Cluster Computing with Apache Mesos and CascadingBoulder/Denver BigData: Cluster Computing with Apache Mesos and Cascading
Boulder/Denver BigData: Cluster Computing with Apache Mesos and CascadingPaco Nathan
 

Similar to The Relevance of the Apache Solr Semantic Knowledge Graph (20)

TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
 
From keyword-based search to language-agnostic semantic search
From keyword-based search to language-agnostic semantic searchFrom keyword-based search to language-agnostic semantic search
From keyword-based search to language-agnostic semantic search
 
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...
 
SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018 SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine Learning
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAML
 
Data Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area MLData Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area ML
 
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the CloudLeveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
 
AI, Search, and the Disruption of Knowledge Management
AI, Search, and the Disruption of Knowledge ManagementAI, Search, and the Disruption of Knowledge Management
AI, Search, and the Disruption of Knowledge Management
 
Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...
Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...
Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...
 
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and SparkVital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
 
Test Trend Analysis : Towards robust, reliable and timely tests
Test Trend Analysis : Towards robust, reliable and timely testsTest Trend Analysis : Towards robust, reliable and timely tests
Test Trend Analysis : Towards robust, reliable and timely tests
 
Brochure data science learning path board-infinity (1)
Brochure   data science learning path board-infinity (1)Brochure   data science learning path board-infinity (1)
Brochure data science learning path board-infinity (1)
 
Mark Tortoricci - Talent42 2015
Mark Tortoricci - Talent42 2015Mark Tortoricci - Talent42 2015
Mark Tortoricci - Talent42 2015
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
 
Data science presentation
Data science presentationData science presentation
Data science presentation
 
Tejas bichave m tech python
Tejas bichave  m tech pythonTejas bichave  m tech python
Tejas bichave m tech python
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
 
Boulder/Denver BigData: Cluster Computing with Apache Mesos and Cascading
Boulder/Denver BigData: Cluster Computing with Apache Mesos and CascadingBoulder/Denver BigData: Cluster Computing with Apache Mesos and Cascading
Boulder/Denver BigData: Cluster Computing with Apache Mesos and Cascading
 

More from Trey Grainger

Measuring Relevance in the Negative Space
Measuring Relevance in the Negative SpaceMeasuring Relevance in the Negative Space
Measuring Relevance in the Negative SpaceTrey Grainger
 
Building Search & Recommendation Engines
Building Search & Recommendation EnginesBuilding Search & Recommendation Engines
Building Search & Recommendation EnginesTrey Grainger
 
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval SystemsIntent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval SystemsTrey Grainger
 
Semantic & Multilingual Strategies in Lucene/Solr
Semantic & Multilingual Strategies in Lucene/SolrSemantic & Multilingual Strategies in Lucene/Solr
Semantic & Multilingual Strategies in Lucene/SolrTrey Grainger
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrTrey Grainger
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchTrey Grainger
 
Building a real time big data analytics platform with solr
Building a real time big data analytics platform with solrBuilding a real time big data analytics platform with solr
Building a real time big data analytics platform with solrTrey Grainger
 
Building a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineBuilding a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineTrey Grainger
 

More from Trey Grainger (8)

Measuring Relevance in the Negative Space
Measuring Relevance in the Negative SpaceMeasuring Relevance in the Negative Space
Measuring Relevance in the Negative Space
 
Building Search & Recommendation Engines
Building Search & Recommendation EnginesBuilding Search & Recommendation Engines
Building Search & Recommendation Engines
 
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval SystemsIntent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
 
Semantic & Multilingual Strategies in Lucene/Solr
Semantic & Multilingual Strategies in Lucene/SolrSemantic & Multilingual Strategies in Lucene/Solr
Semantic & Multilingual Strategies in Lucene/Solr
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solr
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Building a real time big data analytics platform with solr
Building a real time big data analytics platform with solrBuilding a real time big data analytics platform with solr
Building a real time big data analytics platform with solr
 
Building a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineBuilding a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engine
 

Recently uploaded

Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfYashikaSharma391629
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 

Recently uploaded (20)

Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 

The Relevance of the Apache Solr Semantic Knowledge Graph

  • 1. The Relevance of the Apache Solr Semantic Knowledge Graph Trey Grainger SVP of Engineering, Lucidworks 2017.04.11
  • 2. Trey Grainger SVP of Engineering • Previously Director of Engineering @ CareerBuilder • MBA, Management of Technology – Georgia Tech • BA, Computer Science, Business, & Philosophy – Furman University • Information Retrieval & Web Search - Stanford University Other fun projects: • Co-author of Solr in Action, plus numerous research papers • Frequent conference speaker • Founder of Celiaccess.com, the gluten-free search engine • Lucene/Solr contributor • Startup Investor / Advisor About Me
  • 3. • “A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain” • A multi-dimensional term-to-term (vs. term-to-document) search engine • A tool which enables knowledge modeling and reasoning, natural language processing, anomaly detection, data cleansing, semantic search, analytics, data classification, root cause analysis, and recommendations systems • It’s kind of like Word2Vec, but better suited for interpreting the nuanced intent of typical search queries What is the Semantic Knowledge Graph?
  • 4. • Introduction • Overview of Semantic Knowledge Graph - Index Structure - Graph Traversal Mechanics - Edge Weights/Relevancy Scoring • Use Cases - Document Summarization / Content-based Recommendations - Predictive Analytics - Data Cleansing - Document Classification & Enrichment - Query Disambiguation - Semantic Search / Concept Expansion • Installation • Live Demos • Q&A Agenda
  • 5. Based in San Francisco, offices and employees worldwide Fusion, the platform for building data-driven, smart apps Consulting and support for organizations using Solr Produces the world’s largest open source user conference dedicated to Lucene/Solr Lucidworks is the primary commercial contributor to the Apache Solr project Employs over 40% of the active committers on the Solr project Contributes over 70% of Solr's open source codebase 40% 70%
  • 6. We are a global company Our 8 offices serve over 400 customers.
  • 7. We solve big problems for the world’s brightest businesses.
  • 8. Fusion powers search for the brightest companies in the world.
  • 11. Why do we care? User’s Query: machine learning research and development Portland, OR software engineer AND hadoop, java Traditional Query Parsing: (machine AND learning AND research AND development AND portland) OR (software AND engineer AND hadoop AND java) Semantic Query Parsing: "machine learning" AND "research and development" AND "Portland, OR" AND "software engineer" AND hadoop AND java Semantically Expanded Query: ("machine learning"^10 OR "data scientist" OR "data mining" OR "artificial intelligence") AND ("research and development"^10 OR "r&d") AND AND ("Portland, OR"^10 OR "Portland, Oregon" OR {!geofilt pt=45.512,-122.676 d=50 sfield=geo}) AND ("software engineer"^10 OR "software developer") AND (hadoop^10 OR "big data" OR hbase OR hive) AND (java^10 OR j2ee)
  • 13. A Graph DSAA 2016 Montreal Quebec Canada Semantic Knowledge Graph Paper Trey Grainger Mohammed Korayem Andries Smith Khalifeh AlJadda in_country Node / Vertex Edge
  • 14. Term Documents a doc1 [2x] brown doc3 [1x] , doc5 [1x] cat doc4 [1x] cow doc2 [1x] , doc5 [1x] … ... once doc1 [1x], doc5 [1x] over doc2 [1x], doc3 [1x] the doc2 [2x], doc3 [2x], doc4[2x], doc5 [1x] … … Document Content Field doc1 once upon a time, in a land far, far away doc2 the cow jumped over the moon. doc3 the quick brown fox jumped over the lazy dog. doc4 the cat in the hat doc5 The brown cow said “moo” once. … … What you SEND to Lucene/Solr: How the content is INDEXED into Lucene/Solr (conceptually): The inverted index
  • 15. /solr/collection/select/?q=apache solr Term Documents … … apache doc1, doc3, doc4, doc5 … hadoop doc2, doc4, doc6 … … solr doc1, doc3, doc4, doc7, doc8 … … doc5 doc7 doc8 doc1 doc3 doc4 solr apache apache solr Matching queries to documents
  • 16. BM25 (Okapi “Best Match” 25th Iteration) Score(q, d) = ∑ idf(t) · ( tf(t in d) · (k + 1) ) / ( tf(t in d) + k · (1 – b + b · |d| / avgdl ) t in q Where: t = term; d = document; q = query; i = index tf(t in d) = numTermOccurrencesInDocument ½ idf(t) = 1 + log (numDocs / (docFreq + 1)) |d| = ∑ 1 t in d avgdl = = ( ∑ |d| ) / ( ∑ 1 ) ) d in i d in i k = Free parameter. Usually ~1.2 to 2.0. Increases term frequency saturation point. b = Free parameter. Usually ~0.75. Increases impact of document normalization.
  • 18. Knowledge Graph Challenges of building a traditional knowledge graph Because current knowledge bases / ontology learning systems typically requires explicitly modeling nodes and edges into a graph ahead of time, this unfortunately presents several limitations to the use of such a knowledge graph: • Entities not modeled explicitly as nodes have no known relationships to any other entities. • Edges exist between nodes, but not between arbitrary combinations of nodes, and therefore such a graph is not ideal for representing nuanced meanings of an entity when appearing within different contexts, as is common within natural language. • Substantial meaning is encoded in the linguistic representation of the domain that is lost when the underlying textual representation is not preserved: phrases, interaction of concepts through actions (i.e. verbs), positional ordering of entities and the phrases containing those entities, variations in spelling and other representations of entities, the use of adjectives to modify entities to represent more complex concepts, and aggregate frequencies of occurrence for different representations of entities relative to other representations. • It can be an arduous process to create robust ontologies, map a domain into a graph representing those ontologies, and ensure the generated graph is compact, accurate, comprehensive, and kept up to date.
  • 19. 01 Knowledge Graph Semantic Data Encoded into Free Text Content e en eng engi engineer engineers engineer engineersNode Type: Term software engineer software engineers electrical engineering engineer engineering software … … … Node Type: Character Sequence Node Type: Term Sequence Node Type: Document id: 1 text: looking for a software engineerwith degree in computer science or electrical engineering id: 2 text: apply to be a software engineer and work with other great software engineers id: 3 text: start a great careerin electrical engineering … …
  • 20. Serves as a “data science toolkit” API that allows dynamically navigating and pivoting through multiple levels of relationships between items in a domain. Semantic Knowledge Graph API Core similarity engine, exposed via API Any product can leverage the core relationship scoring engine to score any list of entities against any other list Full domain support Keywords, categories, tags, based upon any field on your documents. Graph is build automatically from the content representing your domain. Intersections, overlaps, & relationship scoring, many levels deep Users can either provide a list of items to score, or else have the system dynamically discover the most related items (or both). Knowledge Graph
  • 22. id: 1 job_title: Software Engineer desc: software engineer at a great company skills: .Net, C#, java id: 2 job_title: Registered Nurse desc: a registered nurse at hospital doing hard work skills: oncology, phlebotemy id: 3 job_title: Java Developer desc: a software engineer or a java engineer doing work skills: java, scala, hibernate field doc term desc 1 a at company engineer great software 2 a at doing hard hospital nurse registered work 3 a doing engineer java or software work job_title 1 Software Engineer … … … Terms-Docs Inverted IndexDocs-Terms Forward IndexDocuments Source: Trey Grainger, Khalifeh AlJadda, Mohammed Korayem, Andries Smith.“The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain”. DSAA 2016. Knowledge Graph field term postings list doc pos desc a 1 4 2 1 3 1, 5 at 1 3 2 4 company 1 6 doing 2 6 3 8 engineer 1 2 3 3, 7 great 1 5 hard 2 7 hospital 2 5 java 3 6 nurse 2 3 or 3 4 registered 2 2 software 1 1 3 2 work 2 10 3 9 job_title java developer 3 1 … … … …
  • 23. Source: Trey Grainger, Khalifeh AlJadda, Mohammed Korayem, Andries Smith.“The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain”. DSAA 2016. Knowledge Graph Set-theory View Graph View How the Graph Traversal Works skill: Java skill: Scala skill: Hibernate skill: Oncology doc 1 doc 2 doc 3 doc 4 doc 5 doc 6 skill: Java skill: Java skill: Scala skill: Hibernate skill: Oncology Data Structure View Java Scala Hibernate docs 1, 2, 6 docs 3, 4 Oncology doc 5
  • 24. Source: Trey Grainger, Khalifeh AlJadda, Mohammed Korayem, Andries Smith.“The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain”. DSAA 2016. Knowledge Graph Multi-level Traversal Data Structure View Graph View doc 1 doc 2 doc 3 doc 4 doc 5 doc 6 skill: Java skill: Java skill: Scala skill: Hibernate skill: Oncology doc 1 doc 2 doc 3 doc 4 doc 5 doc 6 job_title: Software Engineer job_title: Data Scientist job_title: Java Developer …… Inverted Index Lookup Forward Index Lookup Forward Index Lookup Inverted Index Lookup Java Java Developer Hibernate Scala Software Engineer Data Scientist has_related_job_title has_related_job_title
  • 25. Knowledge Graph Materialization of new nodes through shared documents engineer engineers software engineer* (materialized node) engineer* (materialized node) Software doc 1 doc 2 doc 3 doc 4 doc 5 doc 6 links_to links_to
  • 27. Knowledge Graph From the academic paper: Structure: Single-level Traversal / Scoring: Multi-level Traversal / Scoring:
  • 28. Scoring of Node Relationships (Edge Weights) Foreground vs. Background Analysis Every term scored against it’s context. The more commonly the term appears within it’s foreground context versus its background context, the more relevant it is to the specified foreground context. countFG(x) - totalDocsFG * probBG(x) z = -------------------------------------------------------- sqrt(totalDocsFG * probBG(x) * (1 - probBG(x))) { "type":"keywords”, "values":[ { "value":"hive", "relatedness":0.9773, "popularity":369 }, { "value":"java", "relatedness":0.9236, "popularity":15653 }, { "value":".net", "relatedness":0.5294, "popularity":17683 }, { "value":"bee", "relatedness":0.0, "popularity":0 }, { "value":"teacher", "relatedness":-0.2380, "popularity":9923 }, { "value":"registered nurse", "relatedness": -0.3802 "popularity":27089 } ] } We are essentially boosting terms which are more related to some known feature (and ignoring terms which are equally likely to appear in the background corpus) + - Foreground Query: "Hadoop" Knowledge Graph
  • 29. Source: Trey Grainger, Khalifeh AlJadda, Mohammed Korayem, Andries Smith.“The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain”. DSAA 2016. Knowledge Graph Multi-level Graph Traversal with Scores software engineer* (materialized node) Java C# .NET .NET Developer Java Developer Hibernate ScalaVB.NET Software Engineer Data Scientist Skill Nodes has_related_skillStarting Node Skill Nodes has_related_skill Job Title Nodes has_related_job_title 0.90 0.88 0.93 0.93 0.34 0.74 0.91 0.89 0.74 0.89 0.780.72 0.48 0.93 0.76 0.83 0.80 0.64 0.61 0.780.55
  • 31. Knowledge Graph Use Case: Document Summarization / Content-based Recommendations Experiment: Pass in raw text (extracting phrases as needed), and rank their similarity to the documents using the SKG. Additionally, can traverse the graph to “related” entities/keyword phrases NOT found in the original document Applications: Content-based and multi-modal recommendations (no cold-start problem), data cleansing prior to clustering or other ML methods, semantic search / similarity scoring
  • 33. Knowledge Graph Use Case: Data Cleansing { "type":"keywords”, "values":[ { "value":"hive", "relatedness": 0.9765, "popularity":369 }, { "value":”spark", "relatedness": 0.9634, "popularity":15653 }, { "value":".net", "relatedness": 0.5417, "popularity":17683 }, { "value":"bogus_word", "relatedness": 0.0, "popularity":0 }, { "value":"teaching", "relatedness": -0.1510, "popularity":9923 }, { "value":"CPR", "relatedness": -0.4012, "popularity":27089 } ] } Foreground Query: "Hadoop" Experiment: Data analyst manually annotated 500 pairs of terms found together in real query logs as “relevant” or “not relevant” Results: SKG removed 78% of the terms while maintaining a 95% accuracy at removing the correct noisy pairs from the input data.
  • 34. Use Case: Document Classification & Enrichment Knowledge Graph
  • 35. Use Case: Query Disambiguation Example Related Keywords (representing multiple meanings) driver truck driver, linux, windows, courier, embedded, cdl, delivery architect autocad drafter, designer, enterprise architect, java architect, designer, architectural designer, data architect, oracle, java, architectural drafter, autocad, drafter, cad, engineer … … Source: M. Korayem, C. Ortiz, K. AlJadda, T. Grainger. "Query Sense Disambiguation Leveraging Large Scale User Behavioral Data". IEEE Big Data 2015.
  • 36. A few methodologies: 1) Query Log Mining 2) Semantic Knowledge Graph Knowledge Graph
  • 37. Query Log Mining: Discovering ambiguous phrases 1) Classify users who ran each search in the search logs (i.e. by the job title classifications of the jobs to which they applied) 3) Segment the search term => related search terms list by classification, to return a separate related terms list per classification 2) Create a probabilistic graphical model of those classifications mapped to each keyword phrase. Source: M. Korayem, C. Ortiz, K. AlJadda, T. Grainger. "Query Sense Disambiguation Leveraging Large Scale User Behavioral Data". IEEE Big Data 2015.
  • 38. Semantic Knowledge Graph: Discovering ambiguous phrases 1) Exact same concept, but use a document classification field (i.e. category) as the first level of your graph, and the related terms as the second level to which you traverse. 2) Has the benefit that you don’t need query logs to mine, but it will be representative of your data, as opposed to your user’s intent, so the quality depends on how clean and representative your documents are. Additional Benefit: Multi-dimensional disambiguation and dynamic materialization of categories. Effectively an dynamically-materialized probabilistic graphical model
  • 39. Disambiguated meanings (represented as term vectors) Example Related Keywords (Disambiguated Meanings) architect 1: enterprise architect, java architect, data architect, oracle, java, .net 2: architectural designer, architectural drafter, autocad, autocad drafter, designer, drafter, cad, engineer driver 1: linux, windows, embedded 2: truck driver, cdl driver, delivery driver, class b driver, cdl, courier designer 1: design, print, animation, artist, illustrator, creative, graphic artist, graphic, photoshop, video 2: graphic, web designer, design, web design, graphic design, graphic designer 3: design, drafter, cad designer, draftsman, autocad, mechanical designer, proe, structural designer, revit … … Source: M. Korayem, C. Ortiz, K. AlJadda, T. Grainger. "Query Sense Disambiguation Leveraging Large Scale User Behavioral Data". IEEE Big Data 2015.
  • 40. Using the disambiguated meanings In a situation where a user searches for an ambiguous phrase, what information can we use to pick the correct underlying meaning? 1. Any pre-existing knowledge about the user: • User is a software engineer • User has previously run searches for “c++” and “linux” 2. Context within the query: User searched for windows AND driver vs. courier OR driver 3. If all else fails (and there is no context), use the most commonly occurring meaning. driver 1: linux, windows, embedded 2: truck driver, cdl driver, delivery driver, class b driver, cdl, courier Source: M. Korayem, C. Ortiz, K. AlJadda, T. Grainger. "Query Sense Disambiguation Leveraging Large Scale User Behavioral Data". IEEE Big Data 2015.
  • 41. Knowledge Graph Use Case: Query Expansion Experiment: Take an initial query, and expand keyword phrases to include the most related entities to that query Example:
  • 42. The Semantic Search Problem User’s Query: machine learning research and development Portland, OR software engineer AND hadoop, java Traditional Query Parsing: (machine AND learning AND research AND development AND portland) OR (software AND engineer AND hadoop AND java) Semantic Query Parsing: "machine learning" AND "research and development" AND "Portland, OR" AND "software engineer" AND hadoop AND java Semantically Expanded Query: ("machine learning"^10 OR "data scientist" OR "data mining" OR "artificial intelligence") AND ("research and development"^10 OR "r&d") AND AND ("Portland, OR"^10 OR "Portland, Oregon" OR {!geofilt pt=45.512,-122.676 d=50 sfield=geo}) AND ("software engineer"^10 OR "software developer") AND (hadoop^10 OR "big data" OR hbase OR hive) AND (java^10 OR j2ee)
  • 45. #1: Pull, Build, Start Solr cd ~/demo git clone https://github.com/apache/lucene-solr.git git checkout releases/lucene-solr/7.2.1 cd lucene-solr/solr ant server bin/solr -c –Denable.runtime.lib=true #2: Pull and Build Semantic Knowledge Graph cd ~/demo git clone https://github.com/treygrainger/semantic-knowledge-graph.git git checkout solr_7.2.1 mvn package #3: Install Plugin curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @semantic- knowledge-graph-1.0-SNAPSHOT.jar http://localhost:8983/solr/.system/blob/skg.jar #4: Create Collection curl http://localhost:8983/solr/admin/collections?action=CREATE&name=demo-collection
  • 46. #5: Register Plugin with Collection curl http://localhost:8983/solr/demo-collection/config -H 'Content-type:application/json' -d '{ "add-runtimelib": { "name":"skg.jar", "version":1 } }' #6: Register Response Writer curl http://localhost:8983/solr/demo-collection/config -H 'Content-type:application/json' -d '{ "add-queryresponsewriter": { "name": "skg", "runtimeLib": true, "class": ”org.apache.solr.search.skg.responsewriter.KnowledgeGraphResponseWriter"} }’ #7: Register Request Handler curl http://localhost:8983/solr/demo-collection/config -H 'Content-type:application/json' -d '{ "add-requesthandler" : { "name": "/skg", "class":”org.apache.solr.search.skg.KnowledgeGraphHandler", "defaults":{ "defType":"edismax", "wt":"json"}, "invariants":{"wt":"skg"}, "runtimeLib": true } }'
  • 47. Per-field options: {field_name}.fallback string An alternative (usually free-text) field to query in case foreground returns less than min_popularity. Example: {field_name}.fallback=content {field_name}.facet-field string A suffix to find a copy-field version of the value in {field_name} to use for the node during traversal. Example: {field_name}.facet-field=id-name {field_name}.key string A parallel key / id field to use to represent a particular value. Typically used to disambiguate multiple entities (each with a unique id) sharing the same textual name Example:{field.v1.key}=id
  • 48. Demos
  • 51. Who’s in Love with Jean Grey?
  • 52. Build a Co-occurrence Matrix http://localhost:8983/solr/job-postings/skg
  • 53. Related term vector (for query concept expansion) http://localhost:8983/solr/stack-exchange-health/skg
  • 54. Score keywords from within a document http://localhost:8983/solr/job-postings/skg
  • 55. • Lucidworks is working on integrating the semantic knowledge graph into Solr’s JSON faceting API (requires some new capabilities and structural changes, but greatly enhances it) • WIP, but this new approach looks promising. We’ll post a patch soon. Hoping to get committed for Solr 7.4 • Being integrated into Lucidworks Fusion as one of several key elements of our semantic search pipeline, content-based recommendations, and to power discovery in our analytics suite. Next Steps
  • 56. Contact Info Trey Grainger trey.grainger@lucidworks.com @treygrainger http://solrinaction.com Other presentations: http://www.treygrainger.com Discount code: ctwhaystack18
  • 58. Other Solr Graph Capabilities
  • 59. https://lucidworks.com/webinar/solr-6-deep-dive-sql-graph/ top(n="5", sort="count(*) desc", gatherNodes(movielens, top(n="30", sort="count(*) desc", gatherNodes(movielens, search(movielens, q="user_id_i:305", fl="movie_id_i", sort="movie_id_i asc", qt=“/export"), walk="movie_id_i->movie_id_i", gather="user_id_i", maxDocFreq="10000", count(*) ) ), walk="node->user_id_i", gather="movie_id_i", count(*) ) ) Distributed Graph Traversal Streaming Expressions Covered in more detail in deep-dive SQL & Graph webinar:
  • 60. Graph Query Parser • Query-time, cyclic aware graph traversal is able to rank documents based on relationships • Provides controls for depth, filtering of results and inclusion of root and/or leaves • Limitations: single node/shard only Examples: • http://localhost:8983/solr/graph/query?fl=id,score& q={!graph from=in_edge to=out_edge}id:A • http://localhost:8983/solr/my_graph/query?fl=id& q={!graph from=in_edge to=out_edge traversalFilter='foo:[* TO 15]'}id:A • http://localhost:8983/solr/my_graph/query?fl=id& q={!graph from=in_edge to=out_edge maxDepth=1}foo:[* TO 10]
  • 61. • Term Frequency: “How well a term describes a document?” – Measure: how often a term occurs per document • Inverse Document Frequency: “How important is a term overall?” – Measure: how rare the term is across all documents TF * IDF *Source: Solr in Action, chapter 3
  • 64. Knowledge Graph – Potential Use Cases Cross-walk between Types • Have an ID field, but want to enable free text search on the most associated entity with that ID? • Have a “state” (geo) search box, but want to accept any free-text location and map it to the right state? • Have an old classification taxonomy and want to know how the values from the old system now map into the new values? Build User Profiles from Search Logs • If someone searches for “Java”, and then “JQuery”, and then “CSS”, and then “JSP”, what do those have in common? • What if they search for “Java”, and then “C++”, and then “Assembly”? Discover Relationships Between Anything • If I want to become a data scientist and know Python, what libraries should I learn? • If my last job was mid-level software engineer and my current job is Engineering Lead, what are my most likely next roles? Traverse arbitrarily deep, Sort on anything • Build an instant co-occurrence matrix, sort the top values by their relatedness, and then add in any number of additional dimensions (RAM permitting). Data Cleansing • Have dirty taxonomies and need to figure out which items don’t belong? • Need to understand the conceptual cohesion of a document (vs spammy or off-topic content)? Knowledge Graph
  • 65. Knowledge Graph Future Work • Semantic Search (more experiments) • Search Engine Relevancy Algorithms • Trending Topics • Recommendation Systems • Root Cause Analysis • Abuse Detection
  • 66. Knowledge Graph Conclusion Applications: The Semantic Knowledge Graph has numerous applications, including automatically building ontologies, identification of trending topics over time, predictive analytics on timeseries data, root-cause analysis surfacing concepts related to failure scenarios from free text, data cleansing, document summarization, semantic search interpretation and expansion of queries, recommendation systems, and numerous other forms of anomaly detection. Main contribution of this paper: The introduction (and open sourcing) of the the Semantic Knowledge Graph, a novel and compact new graph model that can dynamically materialize and score the relationships between any arbitrary combination of entities represented within a corpus of documents.
  • 68. Serves as a “data science toolkit” API that allows dynamically navigating and pivoting through multiple levels of relationships between items in a domain. Knowledge Graph API Core similarity engine, exposed via API Any product can leverage the core relationship scoring engine to score any list of entities against any other list Full domain support Keywords, categories, tags, based upon any field on your documents. Graph is build automatically from the content representing your domain. Intersections, overlaps, & relationship scoring, many levels deep Users can either provide a list of items to score, or else have the system dynamically discover the most related items (or both). Knowledge Graph