Scaling Recommendations, 
Semantic Search, & Data Analytics with Solr 
Trey Grainger 
Director of Engineering, Search & Analytics 
Atlanta Solr Meetup 
2014.10.21, Atlanta Tech Village 
Sponsored by:
About Me 
Trey Grainger 
Director of Engineering, Search & Analytics 
• Joined CareerBuilder in 2007 as Software Engineer 
• MBA, Management of Technology – GA Tech 
• BA, Computer Science, Business, & Philosophy – Furman University 
• Mining Massive Datasets (in progress) - Stanford University 
• Fun outside of CB: 
• Author (Solr in Action), plus several research papers 
• Frequent conference speaker 
• Founder of Celiaccess.com, the gluten-free search engine 
• Lucene/Solr contributor
Overview 
• Intro 
• CareerBuilder’s Search Infrastructure 
• Solr as a Recommendation Engine 
• Semantic Search with Solr 
• Solr-powered Data Analytics 
• Q & A
Search Powers…
My Search Team 
Joe Streeky 
Search Framework Development Manager 
Search Infrastructure Team, Core Search Team 
Applied Search Teams: 
• Job Search Team 
• Candidate Search Team 
• Relevancy & Recommendations Team
About Me 
Joseph Streeky 
Manager, Search Framework Development 
• Joined CareerBuilder in 2005 as Software Engineer 
• BS, Computer Science – GA Tech 
• Natural Language Processing – Columbia University 
• Software Engineering for SaaS – University of California, Berkeley
About Search @CareerBuilder 
• 2 million active jobs each month 
• 60 million actively searchable resumes 
• 450 globally distributed search servers (in the 
U.S., Europe, & the cloud) 
• Thousands of unique, dynamically generated 
search indexes 
• 1.5 billion search documents 
• 2-3 million searches an hour
Our Search Infrastructure 
(Architecture diagram showing: Feeding Stack, Processing Tier, Hadoop, SQL, Cassandra, RabbitMQ, Solr)
Our Search Infrastructure 
(Architecture diagram showing: Query Load Balancer, multiple Solr servers, Feeding Platform)
Our Search Platform 
• Generic Search API wrapping Solr + our domain stack 
• Goal: Abstract away search into a simple API so that 
any engineer can build search-based products with 
no prior search background 
• 3 Supported Methods (with rich syntax): 
– AddDocument 
– DeleteDocument 
– Search 
*users pass along their own dynamically-defined schemas on each call
Business Case for Recommendations 
• For companies like CareerBuilder, recommendations 
can provide as much or even greater business value 
(i.e. views, sales, job applications) than user-driven 
search capabilities. 
• Recommendations create stickiness to pull users 
back to your company’s website, app, etc.
Consider the information you know about your users 
• John lives in Boston but wants to move to New York or possibly 
another big city. He is currently a sales manager but wants to move 
towards business development. 
• Irene is a bartender in Dublin and is only interested in jobs within 
10KM of her location in the food service industry. 
• Irfan is a software engineer in Atlanta and is interested in software 
engineering jobs at a Big Data company. He is happy to move across 
the U.S. for the right job. 
• Jane is a nurse educator in Boston seeking between $40K and $60K 
working in the state of Massachusetts
Query for Jane 
Jane is a nurse educator in Boston seeking between $40K and $60K 
working in the state of Massachusetts 
http://localhost:8983/solr/jobs/select/? 
fl=jobtitle,city,state,salary& 
q=( 
jobtitle:"nurse educator"^25 OR jobtitle:(nurse educator)^10 
) 
AND ( 
(city:"Boston" AND state:"MA")^15 
OR state:"MA") 
AND _val_:"map(salary, 40000, 60000, 10, 0)" 
*Example from chapter 16 of Solr in Action
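The request for Jane can also be assembled programmatically. The following is a minimal sketch using only the Python standard library; the function name `build_jane_url` is hypothetical, while the host, core, and field names are taken from the slide. In Solr's `map(x, min, max, target, default)` function, salaries between 40000 and 60000 contribute 10 to the score and all others contribute 0.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/jobs/select/"  # endpoint from the slide


def build_jane_url() -> str:
    """Assemble the attribute-based recommendation request for Jane."""
    q = (
        '(jobtitle:"nurse educator"^25 OR jobtitle:(nurse educator)^10) '
        'AND ((city:"Boston" AND state:"MA")^15 OR state:"MA") '
        'AND _val_:"map(salary, 40000, 60000, 10, 0)"'
    )
    params = {"fl": "jobtitle,city,state,salary", "q": q}
    # urlencode percent-escapes quotes, carets, commas, and spaces
    return SOLR_SELECT + "?" + urlencode(params)


url = build_jane_url()
print(url)
```

Encoding the parameters this way avoids hand-escaping the quotes and carets that the boosted query relies on.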
Search Results for Jane 
{ ... 
"response":{"numFound":22,"start":0,"docs":[ 
{"jobtitle":"Clinical Educator (New England/ Boston)", 
"city":"Boston", 
"state":"MA", 
"salary":41503}, 
{"jobtitle":"Nurse Educator", 
"city":"Braintree", 
"state":"MA", 
"salary":56183}, 
{"jobtitle":"Nurse Educator", 
"city":"Brighton", 
"state":"MA", 
"salary":71359}, 
…]}} 
*Example documents available @ https://github.com/treygrainger/solr-in-action/blob/first-edition/example-docs/ch16/
What did we just do? 
• We built a recommendation engine! 
• What is a recommendation engine? 
– A system that uses known information (or derived 
information from that known information) to 
automatically suggest relevant content 
• Our example was just an attribute-based 
recommendation… we’ll see that behavior-based 
(i.e. collaborative filtering) is also possible.
Redefining “Search Engine” 
• “Lucene is a high-performance, full-featured 
text search engine library…” 
Yes, but really… 
• Lucene is a high-performance, fully-featured 
token matching and scoring library… which 
can perform full-text searching.
Redefining “Search Engine” 
or, in machine learning speak: 
• A Lucene index is a multi-dimensional 
sparse matrix… with very fast and powerful 
lookup and vector multiplication capabilities. 
• Think of each field as a matrix containing each 
term mapped to each document
The Lucene Inverted Index 
(traditional text example) 
What you SEND to Lucene/Solr: 
Document   Content Field 
doc1       once upon a time, in a land far, far away 
doc2       the cow jumped over the moon. 
doc3       the quick brown fox jumped over the lazy dog. 
doc4       the cat in the hat 
doc5       The brown cow said “moo” once. 
…          … 
How the content is INDEXED into Lucene/Solr (conceptually): 
Term       Documents 
a          doc1 [2x] 
brown      doc3 [1x], doc5 [1x] 
cat        doc4 [1x] 
cow        doc2 [1x], doc5 [1x] 
…          … 
once       doc1 [1x], doc5 [1x] 
over       doc2 [1x], doc3 [1x] 
the        doc2 [2x], doc3 [2x], doc4 [2x], doc5 [1x] 
…          …
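The conceptual index above can be reproduced in a few lines. This is a toy sketch, not how Lucene stores its index: the `tokenize` function is a hypothetical stand-in for a real Lucene analyzer, and the documents are the five from the slide.

```python
import re
from collections import Counter, defaultdict

docs = {
    "doc1": "once upon a time, in a land far, far away",
    "doc2": "the cow jumped over the moon.",
    "doc3": "the quick brown fox jumped over the lazy dog.",
    "doc4": "the cat in the hat",
    "doc5": 'The brown cow said "moo" once.',
}


def tokenize(text):
    # Lowercase and keep runs of letters; a stand-in for a real analyzer.
    return re.findall(r"[a-z]+", text.lower())


def build_inverted_index(documents):
    index = defaultdict(dict)  # term -> {doc id -> term frequency}
    for doc_id, text in documents.items():
        for term, freq in Counter(tokenize(text)).items():
            index[term][doc_id] = freq
    return index


index = build_inverted_index(docs)
print(index["the"])
```

Each term maps to the documents containing it along with a term frequency, which is the information the scoring step needs.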
Match Text Queries to Text Fields 
/solr/select/?q=jobcontent:(software engineer) 
Job Content Field   Documents 
…                   … 
engineer            doc1, doc3, doc4, doc5 
…                   … 
mechanical          doc2, doc4, doc6 
…                   … 
software            doc1, doc3, doc4, doc7, doc8 
…                   … 
Matching documents: 
engineer only: doc5 
software and engineer: doc1, doc3, doc4 
software only: doc7, doc8
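The matching for `jobcontent:(software engineer)` can be sketched as a lookup in the posting lists followed by a per-document tally. This is a deliberate simplification: the score here is just the number of query terms matched, whereas Lucene's real scoring also weighs term and document frequencies. The function name `match_any` is hypothetical.

```python
# Posting lists copied from the slide
postings = {
    "engineer": {"doc1", "doc3", "doc4", "doc5"},
    "mechanical": {"doc2", "doc4", "doc6"},
    "software": {"doc1", "doc3", "doc4", "doc7", "doc8"},
}


def match_any(terms):
    """Score each doc by the number of query terms it contains (OR semantics)."""
    scores = {}
    for term in terms:
        for doc in postings.get(term, ()):
            scores[doc] = scores.get(doc, 0) + 1
    # Highest score first, ties broken by document id
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


results = match_any(["software", "engineer"])
print(results)
```

Documents containing both terms (doc1, doc3, doc4) rank ahead of those containing only one.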
Beyond Text Searching 
• Lucene/Solr is a search matching engine 
• When Lucene/Solr search text, they are 
matching tokens in the query with tokens in the 
index 
• Anything that can be searched upon can form 
the basis of matching and scoring: 
– text, attributes, locations, results of functions, user 
behavior, classifications, etc.
Approaches to Recommendations 
• Content-based 
– Attribute-based 
• e.g. income level, hobbies, location, experience 
– Classification-based 
• e.g. “medical//nursing//oncology”, “animal//dog//terrier” 
– Textual Similarity-based 
• e.g. Solr’s MoreLikeThis Request Handler & Search Handler 
– Concept-based 
• e.g. Solr => “software engineer”, “java”, “search”, “open source” 
• Collaborative Filtering 
– “Users who liked that also liked this…” 
• Hybrid Approaches
Collaborative Filtering 
What you SEND to Lucene/Solr: 
Document   “Users who bought this product” field 
doc1       user1, user4, user5 
doc2       user2, user3 
doc3       user4 
doc4       user4, user5 
doc5       user4, user1 
…          … 
How the content is INDEXED into Lucene/Solr (conceptually): 
Term       Documents 
user1      doc1, doc5 
user2      doc2 
user3      doc2 
user4      doc1, doc3, doc4, doc5 
user5      doc1, doc4 
…          …
Step 1: Find similar users who like the same documents 
q=documentid:("doc1" OR "doc4") 
Document   “Users who bought this product” field 
doc1       user1, user4, user5 
doc2       user2, user3 
doc3       user4 
doc4       user4, user5 
doc5       user4, user1 
…          … 
Matches: 
doc1: user1, user4, user5 
doc4: user4, user5 
Top-scoring results (most similar users): 
1) user4 (2 shared likes) 
2) user5 (2 shared likes) 
3) user1 (1 shared like) 
*Source: Solr in Action, chapter 16
Step 2: Search for docs “liked” by those similar users 
Most similar users: 
1) user4 (2 shared likes) 
2) user5 (2 shared likes) 
3) user1 (1 shared like) 
/solr/select/?q=userlikes:("user4"^2 OR "user5"^2 OR "user1"^1) 
Term       Documents 
user1      doc1, doc5 
user2      doc2 
user3      doc2 
user4      doc1, doc3, doc4, doc5 
user5      doc1, doc4 
…          … 
Top recommended documents: 
1) doc1 (matches user4, user5, user1) 
2) doc4 (matches user4, user5) 
3) doc5 (matches user4, user1) 
4) doc3 (matches user4) 
// doc2 does not match 
*Source: Solr in Action, chapter 16
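The two collaborative-filtering steps can be traced in memory with the same data. This is a minimal sketch under a simplifying assumption: a document's score is the plain sum of the boosts of the users it matches, which is not Lucene's actual scoring formula. The function names are hypothetical.

```python
from collections import Counter

# doc -> users who "liked" it (the multi-valued field from the slides)
doc_users = {
    "doc1": ["user1", "user4", "user5"],
    "doc2": ["user2", "user3"],
    "doc3": ["user4"],
    "doc4": ["user4", "user5"],
    "doc5": ["user4", "user1"],
}


def similar_users(liked_docs):
    """Step 1: count how many of the liked docs each user shares."""
    counts = Counter()
    for doc in liked_docs:
        counts.update(doc_users.get(doc, []))
    return counts


def recommend(user_weights):
    """Step 2: score each doc by the summed weights of its matching users."""
    scores = {}
    for doc, users in doc_users.items():
        score = sum(user_weights.get(u, 0) for u in users)
        if score:
            scores[doc] = score
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


weights = similar_users(["doc1", "doc4"])
recs = recommend(weights)
print(weights, recs)
```

The ordering matches the slide: doc1, doc4, doc5, doc3, with doc2 absent. A production system would likely also filter out documents the user has already liked, which this sketch does not do.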
Content-based Recommendations: 
More Like This (Query) 
solrconfig.xml: 
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler" /> 
Query: 
/solr/jobs/mlt/?df=jobdescription& 
fl=id,jobtitle& 
rows=3& 
q=J2EE& // recommendations based on top scoring doc 
mlt.fl=jobtitle,jobdescription& // inspect these fields for interesting terms 
mlt.interestingTerms=details& // return the interesting terms 
mlt.boost=true 
*Example from chapter 16 of Solr in Action
More Like This (Results) 
{"match":{"numFound":122,"start":0,"docs":[ 
{"id":"fc57931d42a7ccce3552c04f3db40af8dabc99dc", 
"jobtitle":"Senior Java / J2EE Developer"}] 
}, 
"response":{"numFound":2225,"start":0,"docs":[ 
{"id":"0e953179408d710679e5ddbd15ab0dfae52ffa6c", 
"jobtitle":"Sr Core Java Developer"}, 
{"id":"5ce796c758ee30ed1b3da1fc52b0595c023de2db", 
"jobtitle":"Applications Developer"}, 
{"id":"1e46dd6be1750fc50c18578b7791ad2378b90bdd", 
"jobtitle":"Java Architect/ Lead Java Developer - 
WJAV Java - Java in Pittsburgh PA"},]}, 
"interestingTerms":[ 
"jobdescription:j2ee",1.0, 
"jobdescription:java",0.68131137, 
"jobdescription:senior",0.52161527, 
"jobtitle:developer",0.44706684, 
"jobdescription:source",0.2417754, 
"jobdescription:code",0.17976432, 
"jobdescription:is",0.17765637, 
"jobdescription:client",0.17331646, 
"jobdescription:our",0.11985878, 
"jobdescription:for",0.07928475, 
"jobdescription:a",0.07875194, 
"jobdescription:to",0.07741922, 
"jobdescription:and",0.07479082]}} 
*Example from chapter 16 of Solr in Action
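The `interestingTerms` list arrives as a flat sequence of alternating terms and scores. The sketch below pairs them up and turns the stronger ones into a boosted query; the `min_score` threshold is a hypothetical choice, included because low-scoring entries in the response above are mostly stopwords such as "is" and "our".

```python
def pair_up(flat):
    """Turn [term, score, term, score, ...] into [(term, score), ...]."""
    return list(zip(flat[0::2], flat[1::2]))


def boosted_query(pairs, min_score=0.4):
    # Keep only terms at or above the threshold and boost each by its score.
    return " OR ".join(f"{term}^{score}" for term, score in pairs if score >= min_score)


# First five entries from the response shown above
interesting_terms = [
    "jobdescription:j2ee", 1.0,
    "jobdescription:java", 0.68131137,
    "jobdescription:senior", 0.52161527,
    "jobtitle:developer", 0.44706684,
    "jobdescription:source", 0.2417754,
]

pairs = pair_up(interesting_terms)
query = boosted_query(pairs)
print(query)
```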
More Like This (passing in external document) 
/solr/jobs/mlt/?df=jobdescription& 
fl=id,jobtitle& 
mlt.fl=jobtitle,jobdescription& 
mlt.interestingTerms=details& 
mlt.boost=true& 
stream.body=Solr is an open source enterprise search 
platform from the Apache Lucene project. Its major features 
include full-text search, hit highlighting, faceted search, dynamic 
clustering, database integration, and rich document (e.g., Word, 
PDF) handling. Providing distributed search and index 
replication, Solr is highly scalable. Solr is the most popular 
enterprise search engine. Solr 4 adds NoSQL features. 
*Example from chapter 16 of Solr in Action
More Like This (Results) 
{"response":{"numFound":2221,"start":0,"docs":[ 
{"id":"eff5ac098d056a7ea6b1306986c3ae511f2d0d89", 
"jobtitle":"Enterprise Search Architect…"}, 
{"id":"37abb52b6fe63d601e5457641d2cf5ae83fdc799", 
"jobtitle":"Sr. Java Developer"}, 
{"id":"349091293478dfd3319472e920cf65657276bda4", 
"jobtitle":"Java Lucene Software Engineer"},]}, 
"interestingTerms":[ 
"jobdescription:search",1.0, 
"jobdescription:solr",0.9155779, 
"jobdescription:features",0.36472517, 
"jobdescription:enterprise",0.30173126, 
"jobdescription:is",0.17626463, 
"jobdescription:the",0.102924034, 
"jobdescription:and",0.098939896]} } 
*Example from chapter 16 of Solr in Action
Understanding Our Users 
• Machine learning algorithms can help us understand what 
matters most to different groups of users. 
Example: Willingness to relocate for a job (miles per percentile) 
Software Engineers 
Restaurant Workers
Search & Recommendations are on a continuum... 
• Why limit yourself to JUST explicit search or JUST automated 
recommendations? 
• By augmenting your users’ explicit queries with information you know about 
them, you can personalize their search results. 
• Examples: 
– A known software engineer runs a blank keyword search in New York… 
• Why not show software engineering jobs higher in the results? 
– A new user runs a keyword-only search for “nurse”… 
• Why not use the user’s IP address to boost documents geographically 
closer?
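One way to blend explicit keywords with known user attributes is to keep the user's query as a required clause and append optional boosted clauses that only affect ranking. The sketch below is illustrative: the `personalize` function, the profile keys, the field names `jobtitle` and `city`, and the boost values are all assumptions, and it presumes the default OR operator of the standard query parser.

```python
def personalize(keywords, profile):
    """Combine explicit keywords with boosts derived from a known profile."""
    main = f"({keywords})" if keywords.strip() else "*:*"
    boosts = []
    if profile.get("jobtitle"):
        boosts.append(f'jobtitle:"{profile["jobtitle"]}"^10')
    if profile.get("city"):
        boosts.append(f'city:"{profile["city"]}"^5')
    if not boosts:
        return main
    # Required main clause, followed by optional clauses that only raise scores.
    return f"+{main} " + " ".join(boosts)


blank_search = personalize("", {"jobtitle": "software engineer"})
keyword_search = personalize("nurse", {"city": "Atlanta"})
print(blank_search)
print(keyword_search)
```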
Semantic Search Architecture
Using Clustering to find semantic links
Setting up Clustering in solrconfig.xml
Clustering Query 
/solr/clustering/?q=(solr OR lucene) 
&rows=100 
&carrot.title=titlefield 
&carrot.snippet=titlefield 
&LingoClusteringAlgorithm.desiredClusterCountBase=25 
//clustering & grouping don’t currently play nicely 
Allows you to dynamically identify “concepts” and their 
prevalence within a user’s top search results
Clustering Results 
Original Query: q=(solr OR lucene) 
// can be a user’s search, their job title, a list of skills, 
// or any other keyword rich data source 
Clusters Identified: 
Developer (22) 
Java Developer (13) 
Software (10) 
Senior Java Developer (9) 
Architect (6) 
Software Engineer (6) 
Web Developer (5) 
Search (3) 
Software Developer (3) 
Systems (3) 
Administrator (2) 
Hadoop Engineer (2) 
Java J2EE (2) 
Search Development (2) 
Software Architect (2) 
Solutions Architect (2) 
Stage 1: Identify Concepts
Stage 2: Use Semantic Links in your relevancy calculation 
content:("Developer"^22 OR "Java Developer"^13 OR "Software"^10 
OR "Senior Java Developer"^9 OR "Architect"^6 OR "Software Engineer"^6 
OR "Web Developer"^5 OR "Search"^3 OR "Software Developer"^3 
OR "Systems"^3 OR "Administrator"^2 OR "Hadoop Engineer"^2 
OR "Java J2EE"^2 OR "Search Development"^2 
OR "Software Architect"^2 OR "Solutions Architect"^2) 
// You can also add the user’s location or the original keywords to the 
// recommendations search if it helps results quality for your use-case.
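Turning Stage 1's cluster labels into the Stage 2 query is mechanical: each label becomes a quoted phrase boosted by its cluster size. A minimal sketch, where the function name `clusters_to_query` is hypothetical and only the first six clusters are listed for brevity:

```python
# (label, document count) pairs from the clustering results
clusters = [
    ("Developer", 22),
    ("Java Developer", 13),
    ("Software", 10),
    ("Senior Java Developer", 9),
    ("Architect", 6),
    ("Software Engineer", 6),
]


def clusters_to_query(field, labelled_counts):
    """Build a boosted phrase query, one clause per cluster label."""
    clauses = [f'"{label}"^{count}' for label, count in labelled_counts]
    return f"{field}:(" + " OR ".join(clauses) + ")"


query = clusters_to_query("content", clusters)
print(query)
```

Note the uppercase `OR`: Solr's standard query parser treats a lowercase `or` as an ordinary search term.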
Synonym Discovery Techniques 
• Our primary approach: 
Search Co-occurrences[1] + Point-wise Mutual Information[1] + PGMHD[2] 
• Strategy: Map/Reduce job which finds similar searches run by the same 
users 
John searched for “java developer” and “j2ee”. 
Jane searched for “registered nurse” and “r.n.” and “nurse”. 
Zeke searched for “java developer” and “scala” and “jvm”. 
• By mining tens of millions of search terms per day, we get a list of top 
related searches, using multiple statistical measures. 
• We also tie each search term to the top category of jobs (e.g., java developer, truck 
driver, etc.), so that we know in what context people search for each term. 
[1] K. Aljadda, M. Korayem, T. Grainger, C. Russell. "Crowdsourced Query Augmentation through Semantic Discovery of Domain-specific Jargon," in IEEE Big Data 2014. 
[2] K. Aljadda, M. Korayem, C. Ortiz, T. Grainger, J. Miller, W. York. "PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems," in IEEE Big Data 2014.
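The co-occurrence and point-wise mutual information measures can be illustrated on the three example users. This is a toy in-memory sketch, not the Map/Reduce pipeline or the PGMHD model described above; probabilities are estimated as the fraction of users who ran a search, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

searches_by_user = {
    "John": {"java developer", "j2ee"},
    "Jane": {"registered nurse", "r.n.", "nurse"},
    "Zeke": {"java developer", "scala", "jvm"},
}


def cooccurrence_stats(sessions):
    """Count users per term and users per unordered pair of terms."""
    term_counts, pair_counts = Counter(), Counter()
    for terms in sessions.values():
        term_counts.update(terms)
        pair_counts.update(frozenset(p) for p in combinations(sorted(terms), 2))
    return term_counts, pair_counts


def pmi(a, b, term_counts, pair_counts, n_users):
    """PMI = log( p(a, b) / (p(a) * p(b)) ), or -inf if never seen together."""
    joint = pair_counts[frozenset((a, b))] / n_users
    if joint == 0:
        return float("-inf")
    return math.log(joint / ((term_counts[a] / n_users) * (term_counts[b] / n_users)))


term_counts, pair_counts = cooccurrence_stats(searches_by_user)
n = len(searches_by_user)
score = pmi("java developer", "j2ee", term_counts, pair_counts, n)
print(score)
```

A positive PMI means two searches appear together more often than chance would predict; pairs never searched by the same user get negative infinity here.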
Examples of “related search terms” 
Example: “accounting” 
accountant 8880, 
accounts payable 5235, 
finance 3675, 
accounting clerk 3651, 
bookkeeper 3225, 
controller 2898, 
staff accountant 2866, 
accounts receivable 2842 
Example: “RN”: 
registered nurse 6588, 
rn registered nurse 4300, 
nurse 2492, 
nursing 912, 
lpn 707, 
healthcare 453, 
rn case manager 446, 
registered nurse rn 404, 
director of nursing 321, 
case manager 292
Related Keywords / 
Automatic Boolean Query Expansion
Categories of related terms... 
Synonyms: cpa => Certified Public Accountant 
rn => Registered Nurse 
r.n. => Registered Nurse 
Ambiguous Terms*: driver => driver (trucking) ~80% 
driver => driver (software) ~20% 
Related Terms: r.n. => nursing, bsn 
hadoop => mapreduce, hive, pig 
*disambiguation occurs based upon context and popularity
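Once synonyms and related terms are known, a keyword can be expanded into a boolean query automatically. The sketch below uses the examples from the slide; the `expand` function and the 0.5 boost for related terms are hypothetical, and disambiguation of terms like "driver" is left out.

```python
SYNONYMS = {
    "cpa": "certified public accountant",
    "rn": "registered nurse",
    "r.n.": "registered nurse",
}
RELATED = {
    "r.n.": ["nursing", "bsn"],
    "hadoop": ["mapreduce", "hive", "pig"],
}


def expand(term, related_boost=0.5):
    """Expand one keyword into an OR query of itself, its synonym, and related terms."""
    key = term.lower()
    clauses = [f'"{term}"']
    if key in SYNONYMS:
        clauses.append(f'"{SYNONYMS[key]}"')
    # Related terms are weaker evidence than synonyms, so they get a lower boost.
    clauses += [f'"{r}"^{related_boost}' for r in RELATED.get(key, [])]
    return "(" + " OR ".join(clauses) + ")"


print(expand("hadoop"))
print(expand("rn"))
```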
Semantic Search “under the hood”
Workforce Supply & Demand
Why Solr for Analytics? 
• Allows “ad-hoc” querying of data by keywords 
• Is good at on-the-fly aggregate calculations 
(facets + stats + functions + grouping) 
• Solr is horizontally scalable, and thus able to handle 
billions of documents 
• Insanely Fast queries, encouraging user exploration
Faceting Overview 
/solr/select/?q=…&facet=true 
//Field Faceting 
&facet.field=city 
//Range Faceting 
&facet.range=years_experience 
&facet.range.start=0 
&facet.range.end=10 
&facet.range.gap=1 
&facet.range.other=after 
//Query Faceting: 
&facet.query={!frange key="0 to 10 km" l=0 u=10 incl=false}geodist() 
&facet.query={!frange key="10 to 25 km" l=10 u=25 incl=false}geodist() 
&facet.query={!frange key="25 to 50 km" l=25 u=50 incl=false}geodist() 
&facet.query={!frange key="50+" l=50 incl=false}geodist() 
&sfield=location 
&pt=37.7770,-122.4200 
"facet_fields":{ 
"city":[ 
"new york, ny",2337, 
"los angeles, ca",1693, 
"chicago, il",1535, 
… ]} 
"facet_ranges":{ 
"years_experience":{ 
"counts":[ 
"0",1010035, 
"1",343831, 
… 
"9",121090 
], … 
"after":59462}} 
"facet_queries":{ 
"0 to 10 km":1187, 
"10 to 25 km":462, 
"25 to 50 km":794, 
"50+":105296 
}
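The distance buckets follow a regular pattern, so the `facet.query` parameters can be generated from a list of boundaries. The sketch below does that and also converts Solr's flat `[value, count, ...]` facet list into a dictionary. Both function names are hypothetical; the generated strings use `incl=false`, the function range parser's option for excluding the lower bound.

```python
def distance_facets(bounds):
    """Build one frange facet.query per bucket, plus an open-ended last bucket."""
    queries = []
    for lower, upper in zip(bounds, bounds[1:]):
        queries.append(
            f'{{!frange key="{lower} to {upper} km" l={lower} u={upper} incl=false}}geodist()'
        )
    last = bounds[-1]
    queries.append(f'{{!frange key="{last}+" l={last} incl=false}}geodist()')
    return queries


def facet_counts(flat):
    """Convert [value, count, value, count, ...] into {value: count}."""
    return dict(zip(flat[0::2], flat[1::2]))


queries = distance_facets([0, 10, 25, 50])
cities = facet_counts(["new york, ny", 2337, "los angeles, ca", 1693, "chicago, il", 1535])
print(queries)
print(cities)
```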
Supply of Candidates
Demand for Jobs
Supply over Demand (Labor Pressure)
Wait, how’d you do that?
Building Blocks… 
/solr/select/?q=…&facet=true&facet.field=month* 
/solr/select/?q=…&facet=true&facet.field=state 
/solr/select/?q=…&facet=true&facet.field=military_experience 
*string field in format 201305
Building Blocks… 
/solr/select/? 
q="construction worker"& 
fq=city:"las vegas, nv"& 
facet=true& 
facet.field=company 
/solr/select/? 
q="construction worker"& 
fq=city:"las vegas, nv"& 
facet=true& 
facet.field=lastjobtitle
Building Blocks… 
/solr/select/?q=...& 
facet=true&facet.field=experience_ranges 
/solr/select/?q=...&facet=true& 
facet.field=management_experience
Radius Faceting
Hiring Comparison per Market
Geo-spatial Analytics 
Query 1: 
/solr/select/?... 
&fq={!geofilt sfield=latlong pt=37.777,-122.420 d=80} 
&facet=true&facet.field=city 
"facet_fields":{ 
"city":[ 
"san francisco, ca",11713, 
"san jose, ca",3071, 
"oakland, ca",1482, 
"palo alto, ca",1318, 
"santa clara, ca",1212, 
"mountain view, ca",1045, 
"sunnyvale, ca",1004, 
"fremont, ca",726, 
"redwood city, ca",633, 
"berkeley, ca",599]} 
Query 2: 
/solr/select/?... 
&facet=true&facet.field=city 
&fq=( _query_:"{!geofilt sfield=latlong pt=37.7770,-122.4200 d=20}" //san francisco 
OR _query_:"{!geofilt sfield=latlong pt=37.338,-121.886 d=20}" //san jose 
… 
OR _query_:"{!geofilt sfield=latlong pt=37.870,-122.271 d=20}" //berkeley 
)
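Query 2 repeats one `geofilt` clause per city, which lends itself to generation from the city coordinates returned by Query 1. A minimal sketch, with hypothetical function names and the `latlong` field and 20 km radius taken from the slide:

```python
def geofilt(lat, lon, km, sfield="latlong"):
    """One geofilt local-params filter around a point."""
    return f"{{!geofilt sfield={sfield} pt={lat},{lon} d={km}}}"


def any_of(points, km=20):
    """OR together one nested geofilt query per (lat, lon) point."""
    clauses = [f'_query_:"{geofilt(lat, lon, km)}"' for lat, lon in points]
    return "(" + " OR ".join(clauses) + ")"


# San Francisco, San Jose, Berkeley (coordinates from the slide)
cities = [(37.777, -122.42), (37.338, -121.886), (37.87, -122.271)]
fq = any_of(cities)
print(fq)
```

Faceting on `city` with this filter then counts only documents within 20 km of at least one of the listed city centers.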
SOLR-2894: “Distributed Pivot Faceting” 
#1 Most requested Solr feature 
Status: This feature was developed primarily by 
the CareerBuilder search team and committed by 
Chris Hostetter to the latest released version of 
Solr (4.10).
SOLR-3583: “Stats within (pivot) facets” 
Status: We have submitted a patch (built on top of 
distributed pivot facets), but this will likely be replaced with 
SOLR-6350 + SOLR-6351 in the future.
SOLR-3583: “Stats within (pivot) facets” 
/solr/select?q=...& 
facet=true& 
facet.pivot=state,city& 
facet.stats.percentiles=true& 
facet.stats.percentiles.averages=true& 
facet.stats.percentiles.field=compensation& 
f.compensation.stats.percentiles.requested=10,25,50,75,90& 
f.compensation.stats.percentiles.lower.fence=1000& 
f.compensation.stats.percentiles.upper.fence=200000& 
f.compensation.stats.percentiles.gap=1000 
"facet_pivot":{ 
"state,city":[{ 
"field":"state", 
"value":"california", 
"count":1872280, 
"statistics":[ 
"compensation",[ 
"percentiles",[ 
"10.0","26000.0", 
"25.0","31000.0", 
"50.0","43000.0", 
"75.0","66000.0", 
"90.0","94000.0"], 
"percentiles_average",52613.72, 
"percentiles_count",1514592]], 
"pivot":[{ 
"field":"city", 
"value":"los angeles, ca", 
"count":134851, 
"statistics":{ 
"compensation":[ 
"percentiles",[ 
"10.0","26000.0", 
"25.0","31000.0", 
"50.0","45000.0", 
"75.0","70000.0", 
"90.0","95000.0"], 
"percentiles_average",54122.45, 
"percentiles_count",213481]}} 
… 
]}]}
Real-world Use Case 
(Screenshot callouts: Stats Pivot Faceting (Percentiles), Stats Pivot Faceting (Average), Another Pivot…, Field Facet)
Key Takeaways 
• Traditional search & recommendations are at two ends of a 
continuum between user-driven and automatic matching, and 
Solr is really good at giving you access to that full continuum. 
• Searching on text is one of many forms of matching. If you 
can migrate to searching on behaviors, entities, and concepts, 
you will see much better, more personalized results. 
• Solr is a highly-scalable platform for rapid matching across 
large amounts of unstructured and structured data. 
• Performing real-time analytics at scale is not only possible, 
but incredibly fast and flexible.
2014 Publications & Presentations 
Books: 
Solr in Action - A comprehensive guide to implementing scalable 
search using Apache Solr 
Research papers: 
● Towards a Job title Classification System 
● Augmenting Recommendation Systems Using a Model of Semantically-related Terms Extracted from 
User Behavior 
● sCooL: A system for academic institution name normalization 
● Crowdsourced Query Augmentation through Semantic Discovery of Domain-specific jargon 
● PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems 
● SKILL: A System for Skill Identification and Normalization (pending publication) 
Speaking Engagements: 
● WSDM 2014 Workshop: “Web-Scale Classification: Classifying Big Data from the Web” 
● Atlanta Solr Meetup 
● Atlanta Big Data Meetup 
● The Second International Symposium on Big Data and Data Analytics 
● Lucene/Solr Revolution 2014 
● RecSys 2014 
● IEEE Big Data Conference 2014
Contact Info 
▪ Trey Grainger 
trey.grainger@careerbuilder.com 
@treygrainger 
Other presentations: 
http://www.treygrainger.com http://solrinaction.com 
Meetup discount (42% off): solrmuau 
Yes, WE ARE HIRING @CareerBuilder. Come talk with me if you are interested…
Other Presentations:

More Related Content

What's hot

Building a Real-time Solr-powered Recommendation Engine
Building a Real-time Solr-powered Recommendation EngineBuilding a Real-time Solr-powered Recommendation Engine
Building a Real-time Solr-powered Recommendation Enginelucenerevolution
 
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...Lucidworks
 
A Multifaceted Look At Faceting - Ted Sullivan, Lucidworks
A Multifaceted Look At Faceting - Ted Sullivan, LucidworksA Multifaceted Look At Faceting - Ted Sullivan, Lucidworks
A Multifaceted Look At Faceting - Ted Sullivan, LucidworksLucidworks
 
Building a real time big data analytics platform with solr
Building a real time big data analytics platform with solrBuilding a real time big data analytics platform with solr
Building a real time big data analytics platform with solrTrey Grainger
 
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...lucenerevolution
 
Webinar: Modern Techniques for Better Search Relevance with Fusion
Webinar: Modern Techniques for Better Search Relevance with FusionWebinar: Modern Techniques for Better Search Relevance with Fusion
Webinar: Modern Techniques for Better Search Relevance with FusionLucidworks
 
Solr 6.0 Graph Query Overview
Solr 6.0 Graph Query OverviewSolr 6.0 Graph Query Overview
Solr 6.0 Graph Query OverviewKevin Watters
 
Solr Graph Query: Presented by Kevin Watters, KMW Technology
Solr Graph Query: Presented by Kevin Watters, KMW TechnologySolr Graph Query: Presented by Kevin Watters, KMW Technology
Solr Graph Query: Presented by Kevin Watters, KMW TechnologyLucidworks
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Search is the UI
Search is the UI Search is the UI
Search is the UI danielbeach
 
Integrating the Solr search engine
Integrating the Solr search engineIntegrating the Solr search engine
Integrating the Solr search engineth0masr
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache SolrAndy Jackson
 
Battle of the giants: Apache Solr vs ElasticSearch
Battle of the giants: Apache Solr vs ElasticSearchBattle of the giants: Apache Solr vs ElasticSearch
Battle of the giants: Apache Solr vs ElasticSearchRafał Kuć
 
Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5israelekpo
 
Deduplication Using Solr: Presented by Neeraj Jain, Stubhub
Deduplication Using Solr: Presented by Neeraj Jain, StubhubDeduplication Using Solr: Presented by Neeraj Jain, Stubhub
Deduplication Using Solr: Presented by Neeraj Jain, StubhubLucidworks
 
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis TechnologySimple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis TechnologyLucidworks
 
Searching Relational Data with Elasticsearch
Searching Relational Data with ElasticsearchSearching Relational Data with Elasticsearch
Searching Relational Data with Elasticsearchsirensolutions
 
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...Lucidworks
 
Solr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance studySolr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance studyCharlie Hull
 
Webinar: Search and Recommenders
Webinar: Search and RecommendersWebinar: Search and Recommenders
Webinar: Search and RecommendersLucidworks
 

What's hot (20)

Building a Real-time Solr-powered Recommendation Engine
Building a Real-time Solr-powered Recommendation EngineBuilding a Real-time Solr-powered Recommendation Engine
Building a Real-time Solr-powered Recommendation Engine
 
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
 
A Multifaceted Look At Faceting - Ted Sullivan, Lucidworks
A Multifaceted Look At Faceting - Ted Sullivan, LucidworksA Multifaceted Look At Faceting - Ted Sullivan, Lucidworks
A Multifaceted Look At Faceting - Ted Sullivan, Lucidworks
 
Building a real time big data analytics platform with solr
Building a real time big data analytics platform with solrBuilding a real time big data analytics platform with solr
Building a real time big data analytics platform with solr
 
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
 
Webinar: Modern Techniques for Better Search Relevance with Fusion
Webinar: Modern Techniques for Better Search Relevance with FusionWebinar: Modern Techniques for Better Search Relevance with Fusion
Webinar: Modern Techniques for Better Search Relevance with Fusion
 
Solr 6.0 Graph Query Overview
Solr 6.0 Graph Query OverviewSolr 6.0 Graph Query Overview
Solr 6.0 Graph Query Overview
 
Solr Graph Query: Presented by Kevin Watters, KMW Technology
Solr Graph Query: Presented by Kevin Watters, KMW TechnologySolr Graph Query: Presented by Kevin Watters, KMW Technology
Solr Graph Query: Presented by Kevin Watters, KMW Technology
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Search is the UI
Search is the UI Search is the UI
Search is the UI
 
Integrating the Solr search engine
Integrating the Solr search engineIntegrating the Solr search engine
Integrating the Solr search engine
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
Battle of the giants: Apache Solr vs ElasticSearch
Battle of the giants: Apache Solr vs ElasticSearchBattle of the giants: Apache Solr vs ElasticSearch
Battle of the giants: Apache Solr vs ElasticSearch
 
Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5
 
Deduplication Using Solr: Presented by Neeraj Jain, Stubhub
Deduplication Using Solr: Presented by Neeraj Jain, StubhubDeduplication Using Solr: Presented by Neeraj Jain, Stubhub
Deduplication Using Solr: Presented by Neeraj Jain, Stubhub
 
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis TechnologySimple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology
 
Searching Relational Data with Elasticsearch
Searching Relational Data with ElasticsearchSearching Relational Data with Elasticsearch
Searching Relational Data with Elasticsearch
 
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...
 
Solr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance studySolr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance study
 
Webinar: Search and Recommenders
Webinar: Search and RecommendersWebinar: Search and Recommenders
Webinar: Search and Recommenders
 

Scaling Recommendations, Semantic Search, & Data Analytics with Solr

  • 1. Scaling Recommendations, Semantic Search, & Data Analytics with Solr. Trey Grainger, Director of Engineering, Search & Analytics. Atlanta Solr Meetup, 2014.10.21, Atlanta Tech Village. Sponsored by:
  • 2. About Me Trey Grainger Director of Engineering, Search & Analytics • Joined CareerBuilder in 2007 as Software Engineer • MBA, Management of Technology – GA Tech • BA, Computer Science, Business, & Philosophy – Furman University • Mining Massive Datasets (in progress) - Stanford University • Fun outside of CB: • Author (Solr in Action), plus several research papers • Frequent conference speaker • Founder of Celiaccess.com, the gluten-free search engine • Lucene/Solr contributor
  • 3. Overview • Intro • CareerBuilder’s Search Infrastructure • Solr as a Recommendation Engine • Semantic Search with Solr • Solr-powered Data Analytics • Q & A
  • 5. My Search Team Joe Streeky Search Framework Development Manager Search Infrastructure Team Core Search Team Job Search Team Candidate Search Team Relevancy & Recommendations Team Applied Search Teams:
  • 6. Scaling Recommendations, Semantic Search, & Data Analytics with Solr
  • 7. About Me Joseph Streeky Manager, Search Framework Development • Joined CareerBuilder in 2005 as Software Engineer • BS, Computer Science – GA Tech • Natural Language Processing – Columbia University • Software Engineering for SaaS – University of California, Berkeley
  • 8. About Search @CareerBuilder • 2 million active jobs each month • 60 million actively searchable resumes • 450 globally distributed search servers (in the U.S., Europe, & the cloud) • Thousands of unique, dynamically generated search indexes • 1.5 billion search documents • 2-3 million searches an hour
  • 9. Our Search Infrastructure Feeding Stack Hadoop SQL Cassandra RabbitMQ Solr Processing Tier
  • 10. Our Search Infrastructure Query Load Balancer Solr Solr Solr Feeding Platform
  • 11.
  • 12. Our Search Platform • Generic Search API wrapping Solr + our domain stack • Goal: Abstract away search into a simple API so that any engineer can build search-based products with no prior search background • 3 Supported Methods (with rich syntax): – AddDocument – DeleteDocument – Search *users pass along their own dynamically-defined schemas on each call
  • 13. Scaling Recommendations, Semantic Search, & Data Analytics with Solr
  • 14. Business Case for Recommendations • For companies like CareerBuilder, recommendations can provide as much or even greater business value (i.e. views, sales, job applications) than user-driven search capabilities. • Recommendations create stickiness to pull users back to your company’s website, app, etc.
  • 15. Consider the information you know about your users • John lives in Boston but wants to move to New York or possibly another big city. He is currently a sales manager but wants to move towards business development. • Irene is a bartender in Dublin and is only interested in jobs within 10KM of her location in the food service industry. • Irfan is a software engineer in Atlanta and is interested in software engineering jobs at a Big Data company. He is happy to move across the U.S. for the right job. • Jane is a nurse educator in Boston seeking between $40K and $60K working in the state of Massachusetts
  • 16. Query for Jane: Jane is a nurse educator in Boston seeking between $40K and $60K working in the state of Massachusetts. http://localhost:8983/solr/jobs/select/?fl=jobtitle,city,state,salary&q=(jobtitle:"nurse educator"^25 OR jobtitle:(nurse educator)^10) AND ((city:"Boston" AND state:"MA")^15 OR state:"MA") AND _val_:"map(salary,40000,60000,10,0)" *Example from chapter 16 of Solr in Action
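The query on this slide has to be URL-encoded before it can be sent to Solr. The following is a minimal sketch of one way to assemble it, using only the Python standard library. The function name `build_jane_query` is hypothetical; the host, core name, field names, and boosts are taken from the slide.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_jane_query(base_url="http://localhost:8983/solr/jobs/select"):
    """Return the slide's query for Jane as a URL-encoded request string."""
    q = (
        # Exact phrase match on the title scores higher than loose term matches.
        '(jobtitle:"nurse educator"^25 OR jobtitle:(nurse educator)^10) '
        # Jobs in Boston, MA are preferred, but any job in MA is acceptable.
        'AND ((city:"Boston" AND state:"MA")^15 OR state:"MA") '
        # Function query: contributes 10 when salary is in range, else 0.
        'AND _val_:"map(salary,40000,60000,10,0)"'
    )
    params = {"fl": "jobtitle,city,state,salary", "q": q}
    return base_url + "?" + urlencode(params)


url = build_jane_query()
decoded = parse_qs(urlsplit(url).query)
print(decoded["fl"][0])
```

Decoding the URL again with `parse_qs` is a quick way to check that the quotes, carets, and parentheses survived encoding unchanged.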
  • 17. Search Results for Jane { ... "response":{"numFound":22,"start":0,"docs":[ {"jobtitle":"Nurse Educator", "city":"Braintree", "state":"MA", "salary":56183}, {"jobtitle":"Nurse Educator", "city":"Brighton", "state":"MA", "salary":71359}, {"jobtitle":"Clinical Educator (New England/ Boston)", "city":"Boston", "state":"MA", "salary":41503}, …]}} *Example documents available @ https://github.com/treygrainger/solr-in-action/blob/first-edition/example-docs/ch16/
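The query for Jane can be generated from a user profile rather than written by hand. The following is a minimal sketch of that idea, not the code behind the slides: `build_profile_query` is a hypothetical helper, and the boosts (25, 10, 15) and the `map()` salary function are copied from slide 16.

```python
from urllib.parse import urlencode

def build_profile_query(title, city, state, min_salary, max_salary):
    """Turn known user attributes into a boosted Solr query string."""
    return (
        # Exact phrase match on the title scores highest, loose word match lower.
        f'(jobtitle:"{title}"^25 OR jobtitle:({title})^10) '
        # Same city is preferred, but anywhere in the state still matches.
        f'AND ((city:"{city}" AND state:"{state}")^15 OR state:"{state}") '
        # map() adds 10 to the score when salary falls in the range, else 0.
        f'AND _val_:"map(salary,{min_salary},{max_salary},10,0)"'
    )

jane_q = build_profile_query("nurse educator", "Boston", "MA", 40000, 60000)
params = {"fl": "jobtitle,city,state,salary", "q": jane_q}
url = "http://localhost:8983/solr/jobs/select/?" + urlencode(params)
print(url)
```

`urlencode` takes care of escaping the quotes, spaces, and carets, which is easy to get wrong when pasting the query into a browser.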
  • 18. What did we just do? • We built a recommendation engine! • What is a recommendation engine? – A system that uses known information (or information derived from that known information) to automatically suggest relevant content • Our example was just an attribute-based recommendation… we’ll see that behavior-based recommendation (i.e. collaborative filtering) is also possible.
  • 19. Redefining “Search Engine” • “Lucene is a high-performance, full-featured text search engine library…” Yes, but really… • Lucene is a high-performance, fully-featured token matching and scoring library… which can perform full-text searching.
  • 20. Redefining “Search Engine” or, in machine learning speak: • A Lucene index is a multi-dimensional sparse matrix… with very fast and powerful lookup and vector multiplication capabilities. • Think of each field as a matrix containing each term mapped to each document
  • 21. The Lucene Inverted Index (traditional text example)
    What you SEND to Lucene/Solr:
      Document   Content Field
      doc1       once upon a time, in a land far, far away
      doc2       the cow jumped over the moon.
      doc3       the quick brown fox jumped over the lazy dog.
      doc4       the cat in the hat
      doc5       The brown cow said “moo” once.
      …          …
    How the content is INDEXED into Lucene/Solr (conceptually):
      Term    Documents
      a       doc1 [2x]
      brown   doc3 [1x], doc5 [1x]
      cat     doc4 [1x]
      cow     doc2 [1x], doc5 [1x]
      …       …
      once    doc1 [1x], doc5 [1x]
      over    doc2 [1x], doc3 [1x]
      the     doc2 [2x], doc3 [2x], doc4 [2x], doc5 [1x]
      …       …
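The term-to-document mapping on slide 21 can be reproduced in a few lines. This is only a conceptual sketch: the lowercase, strip-punctuation tokenizer is an assumption standing in for a real Lucene analyzer, and a dictionary of dictionaries stands in for Lucene's on-disk postings format.

```python
import re
from collections import Counter, defaultdict

docs = {
    "doc1": "once upon a time, in a land far, far away",
    "doc2": "the cow jumped over the moon.",
    "doc3": "the quick brown fox jumped over the lazy dog.",
    "doc4": "the cat in the hat",
    "doc5": 'The brown cow said "moo" once.',
}

def tokenize(text):
    # Assumed analysis chain: lowercase, then split on anything non-alphanumeric.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(documents):
    """Map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term, freq in Counter(tokenize(text)).items():
            index[term][doc_id] = freq
    return index

index = build_inverted_index(docs)
for term in ("a", "brown", "the"):
    print(term, index[term])
```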
  • 22. Match Text Queries to Text Fields
    /solr/select/?q=jobcontent:(software engineer)
    Job Content Field:
      Term         Documents
      …            …
      engineer     doc1, doc3, doc4, doc5
      …
      mechanical   doc2, doc4, doc6
      …            …
      software     doc1, doc3, doc4, doc7, doc8
      …            …
    Matching documents:
      engineer only:           doc5
      software AND engineer:   doc1, doc3, doc4
      software only:           doc7, doc8
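Matching the query on slide 22 comes down to set operations over the postings lists. Here is a small sketch using plain Python sets; `match` is a hypothetical helper, and real Lucene also scores each match rather than only collecting document ids.

```python
postings = {
    "engineer": {"doc1", "doc3", "doc4", "doc5"},
    "mechanical": {"doc2", "doc4", "doc6"},
    "software": {"doc1", "doc3", "doc4", "doc7", "doc8"},
}

def match(terms, operator="OR"):
    """Combine postings lists: union for OR, intersection for AND."""
    sets = [postings.get(term, set()) for term in terms]
    if not sets:
        return set()
    return set.union(*sets) if operator == "OR" else set.intersection(*sets)

either = match(["software", "engineer"], "OR")
both = match(["software", "engineer"], "AND")
print(sorted(either))
print(sorted(both))
print(sorted(postings["engineer"] - postings["software"]))  # engineer only
```

Documents in `both` match more query tokens, which is why they would typically rank above the documents that match only one term.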
  • 23. Beyond Text Searching • Lucene/Solr is a search matching engine • When Lucene/Solr search text, they are matching tokens in the query with tokens in the index • Anything that can be searched upon can form the basis of matching and scoring: – text, attributes, locations, results of functions, user behavior, classifications, etc.
  • 24. Approaches to Recommendations • Content-based – Attribute-based • e.g. income level, hobbies, location, experience – Classification-based • e.g. “medical//nursing//oncology”, “animal//dog//terrier” – Textual Similarity-based • e.g. Solr’s MoreLikeThis Request Handler & Search Handler – Concept-based • e.g. Solr => “software engineer”, “java”, “search”, “open source” • Collaborative Filtering • “Users who liked that also liked this…” • Hybrid Approaches
  • 25. Collaborative Filtering
    What you SEND to Lucene/Solr:
      Document   “Users who bought this product” field
      doc1       user1, user4, user5
      doc2       user2, user3
      doc3       user4
      doc4       user4, user5
      doc5       user4, user1
      …          …
    How the content is INDEXED into Lucene/Solr (conceptually):
      Term    Documents
      user1   doc1, doc5
      user2   doc2
      user3   doc2
      user4   doc1, doc3, doc4, doc5
      user5   doc1, doc4
      …       …
  • 26. Step 1: Find similar users who like the same documents
    q=documentid:("doc1" OR "doc4")
      Document   “Users who bought this product” field
      doc1       user1, user4, user5
      doc2       user2, user3
      doc3       user4
      doc4       user4, user5
      doc5       user4, user1
      …          …
    Matches: doc1 (user1, user4, user5); doc4 (user4, user5)
    Top-scoring results (most similar users): 1) user4 (2 shared likes) 2) user5 (2 shared likes) 3) user1 (1 shared like)
    *Source: Solr in Action, chapter 16
  • 27. Step 2: Search for docs “liked” by those similar users
    Most similar users: 1) user4 (2 shared likes) 2) user5 (2 shared likes) 3) user1 (1 shared like)
    /solr/select/?q=userlikes:("user4"^2 OR "user5"^2 OR "user1"^1)
      Term    Documents
      user1   doc1, doc5
      user2   doc2
      user3   doc2
      user4   doc1, doc3, doc4, doc5
      user5   doc1, doc4
      …       …
    Top recommended documents: 1) doc1 (matches user4, user5, user1) 2) doc4 (matches user4, user5) 3) doc5 (matches user4, user1) 4) doc3 (matches user4) // doc2 does not match
    *Source: Solr in Action, chapter 16
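The two collaborative filtering steps can be walked through in memory with the data from slide 25. This is a simplified sketch: it scores a document by summing the boosts of the matching users, which is an assumption that approximates, but does not reproduce, Lucene's actual scoring. The helper names are hypothetical.

```python
from collections import Counter

# Document -> users who liked it (the field sent to Solr on slide 25).
likes = {
    "doc1": ["user1", "user4", "user5"],
    "doc2": ["user2", "user3"],
    "doc3": ["user4"],
    "doc4": ["user4", "user5"],
    "doc5": ["user4", "user1"],
}

def similar_users(liked_docs):
    """Step 1: rank users by how many of the given documents they also liked."""
    counts = Counter(user for doc in liked_docs for user in likes[doc])
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

def recommend(user_weights):
    """Step 2: score each document by the summed weights of matching users."""
    scores = Counter()
    for doc, users in likes.items():
        for user, weight in user_weights:
            if user in users:
                scores[doc] += weight
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

users = similar_users(["doc1", "doc4"])
query = "userlikes:(" + " OR ".join(f'"{u}"^{w}' for u, w in users) + ")"
print(query)
print(recommend(users))
```

As on the slide, documents the user already liked (doc1, doc4) still appear in the output; a production query would likely filter those out.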
  • 28. Content-based Recommendations: More Like This (Query)
    solrconfig.xml: <requestHandler name="/mlt" class="solr.MoreLikeThisHandler" />
    Query:
      /solr/jobs/mlt/?df=jobdescription&
      fl=id,jobtitle&
      rows=3&
      q=J2EE&                              // recommendations based on top scoring doc
      mlt.fl=jobtitle,jobdescription&      // inspect these fields for interesting terms
      mlt.interestingTerms=details&        // return the interesting terms
      mlt.boost=true
    *Example from chapter 16 of Solr in Action
  • 29. More Like This (Results) {"match":{"numFound":122,"start":0,"docs":[ {"id":"fc57931d42a7ccce3552c04f3db40af8dabc99dc", "jobtitle":"Senior Java / J2EE Developer"}] }, "response":{"numFound":2225,"start":0,"docs":[ {"id":"0e953179408d710679e5ddbd15ab0dfae52ffa6c", "jobtitle":"Sr Core Java Developer"}, {"id":"5ce796c758ee30ed1b3da1fc52b0595c023de2db", "jobtitle":"Applications Developer"}, {"id":"1e46dd6be1750fc50c18578b7791ad2378b90bdd", "jobtitle":"Java Architect/ Lead Java Developer - WJAV Java - Java in Pittsburgh PA"},]}, "interestingTerms":[ "jobdescription:j2ee",1.0, "jobdescription:java",0.68131137, "jobdescription:senior",0.52161527, "jobtitle:developer",0.44706684, "jobdescription:source",0.2417754, "jobdescription:code",0.17976432, "jobdescription:is",0.17765637, "jobdescription:client",0.17331646, "jobdescription:our",0.11985878, "jobdescription:for",0.07928475, "jobdescription:a",0.07875194, "jobdescription:to",0.07741922, "jobdescription:and",0.07479082]}} *Example from chapter 16 of Solr in Action
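The More Like This request from slide 28 can be assembled programmatically. A minimal sketch follows, assuming a local Solr at `localhost:8983` (the host used on slide 16) with the `/mlt` handler registered; `mlt_url` is a hypothetical helper, and every parameter is taken from the slide.

```python
from urllib.parse import urlencode

def mlt_url(base, seed_query):
    """Build a MoreLikeThisHandler request seeded by the top hit for seed_query."""
    params = {
        "df": "jobdescription",
        "fl": "id,jobtitle",
        "rows": 3,
        "q": seed_query,                       # top-scoring doc becomes the seed
        "mlt.fl": "jobtitle,jobdescription",   # fields mined for interesting terms
        "mlt.interestingTerms": "details",     # return the terms and their weights
        "mlt.boost": "true",
    }
    return base + "?" + urlencode(params)

url = mlt_url("http://localhost:8983/solr/jobs/mlt", "J2EE")
print(url)
```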
  • 30. More Like This (passing in external document) /solr/jobs/mlt/?df=jobdescription& fl=id,jobtitle& mlt.fl=jobtitle,jobdescription& mlt.interestingTerms=details& mlt.boost=true& stream.body=Solr is an open source enterprise search platform from the Apache Lucene project. Its major features include full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Providing distributed search and index replication, Solr is highly scalable. Solr is the most popular enterprise search engine. Solr 4 adds NoSQL features. *Example from chapter 16 of Solr in Action
  • 31. More Like This (Results) {"response":{"numFound":2221,"start":0,"docs":[ {"id":"eff5ac098d056a7ea6b1306986c3ae511f2d0d89", "jobtitle":"Enterprise Search Architect…"}, {"id":"37abb52b6fe63d601e5457641d2cf5ae83fdc799", "jobtitle":"Sr. Java Developer"}, {"id":"349091293478dfd3319472e920cf65657276bda4", "jobtitle":"Java Lucene Software Engineer"},]}, "interestingTerms":[ "jobdescription:search",1.0, "jobdescription:solr",0.9155779, "jobdescription:features",0.36472517, "jobdescription:enterprise",0.30173126, "jobdescription:is",0.17626463, "jobdescription:the",0.102924034, "jobdescription:and",0.098939896]} } *Example from chapter 16 of Solr in Action
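The `interestingTerms` array in these responses is a flat list that alternates term and weight. One way to reuse it, sketched below, is to pair the entries up and turn the stronger ones into a boosted query. The 0.2 cutoff is an arbitrary assumption chosen here to drop stopword-like terms such as "is" and "the", and both helper names are hypothetical.

```python
def parse_interesting_terms(flat):
    """Pair up the alternating [term, weight, term, weight, ...] list."""
    return list(zip(flat[0::2], flat[1::2]))

def to_boosted_query(pairs, min_weight=0.2):
    """Keep terms at or above min_weight and emit field:term^boost clauses."""
    clauses = [f"{term}^{weight:.2f}" for term, weight in pairs if weight >= min_weight]
    return " OR ".join(clauses)

flat = [
    "jobdescription:search", 1.0,
    "jobdescription:solr", 0.9155779,
    "jobdescription:features", 0.36472517,
    "jobdescription:enterprise", 0.30173126,
    "jobdescription:is", 0.17626463,
    "jobdescription:the", 0.102924034,
    "jobdescription:and", 0.098939896,
]

pairs = parse_interesting_terms(flat)
boosted = to_boosted_query(pairs)
print(boosted)
```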
  • 32. Understanding Our Users • Machine learning algorithms can help us understand what matters most to different groups of users. Example: Willingness to relocate for a job (miles per percentile) Software Engineers Restaurant Workers
  • 33. Search & Recommendations are on a continuum... • Why limit yourself to JUST explicit search or JUST automated recommendations? • By augmenting your users’ explicit queries with information you know about them, you can personalize their search results. • Examples: – A known software engineer runs a blank keyword search in New York… • Why not show software engineering jobs higher in the results? – A new user runs a keyword-only search for nurse • Why not use the user’s IP address to boost documents geographically closer?
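One way to blend explicit search with what is known about a user is to leave the keywords untouched and add boosts on top. The sketch below assumes Solr's eDisMax parser with its `bq` (boost query) and `bf` (boost function) parameters. The profile keys, the `jobtitle` and `location` field names, the boost of 10, and the `recip(geodist(),2,200,20)` constants are all illustrative assumptions, and resolving an IP address to coordinates is left out entirely.

```python
def personalize(keywords, profile):
    """Return Solr params that keep the user's query but add profile boosts."""
    params = {"defType": "edismax", "q": keywords.strip() or "*:*"}

    title = profile.get("job_title")
    if title:
        # Boost, but do not require, jobs matching the user's known title.
        params["bq"] = [f'jobtitle:"{title}"^10']

    point = profile.get("lat_lon")
    if point:
        # Closer documents get a higher function score.
        params.update({
            "sfield": "location",
            "pt": point,
            "bf": "recip(geodist(),2,200,20)",
        })
    return params

blank_search = personalize("", {"job_title": "software engineer"})
nurse_search = personalize("nurse", {"lat_lon": "40.71,-74.00"})
print(blank_search)
print(nurse_search)
```

Because the boosts only affect ranking, a user with an empty profile gets exactly the same results as a plain keyword search.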
  • 34. Scaling Recommendations, Semantic Search, & Data Analytics with Solr
  • 36. Using Clustering to find semantic links
  • 37. Setting up Clustering in solrconfig.xml
  • 38. Clustering Query /solr/clustering/?q=(solr OR lucene) &rows=100 &carrot.title=titlefield &carrot.snippet=titlefield &LingoClusteringAlgorithm.desiredClusterCountBase=25 //clustering & grouping don’t currently play nicely Allows you to dynamically identify “concepts” and their prevalence within a user’s top search results
  • 39. Clustering Results Stage 1: Identify Concepts Original Query: q=(solr OR lucene) // can be a user’s search, their job title, a list of skills, // or any other keyword rich data source Clusters Identified: Developer (22) Java Developer (13) Software (10) Senior Java Developer (9) Architect (6) Software Engineer (6) Web Developer (5) Search (3) Software Developer (3) Systems (3) Administrator (2) Hadoop Engineer (2) Java J2EE (2) Search Development (2) Software Architect (2) Solutions Architect (2)
  • 40. Stage 2: Use Semantic Links in your relevancy calculation content:("Developer"^22 OR "Java Developer"^13 OR "Software"^10 OR "Senior Java Developer"^9 OR "Architect"^6 OR "Software Engineer"^6 OR "Web Developer"^5 OR "Search"^3 OR "Software Developer"^3 OR "Systems"^3 OR "Administrator"^2 OR "Hadoop Engineer"^2 OR "Java J2EE"^2 OR "Search Development"^2 OR "Software Architect"^2 OR "Solutions Architect"^2) // You can also add the user’s location or the original keywords to the // recommendations search if it helps results quality for your use-case.
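Stage 2 is a mechanical transformation of the Stage 1 output: each cluster label becomes a phrase and its document count becomes the boost. A short sketch follows, using the first few clusters from slide 39; `clusters_to_query` is a hypothetical helper and `content` is the field name used on slide 40.

```python
clusters = [
    ("Developer", 22),
    ("Java Developer", 13),
    ("Software", 10),
    ("Senior Java Developer", 9),
]

def clusters_to_query(field, clusters):
    """Turn (label, count) clusters into a boosted OR query on one field."""
    clauses = " OR ".join(f'"{label}"^{count}' for label, count in clusters)
    return f"{field}:({clauses})"

semantic_query = clusters_to_query("content", clusters)
print(semantic_query)
```

Using the raw cluster size as the boost is the simplest choice; normalizing the counts may work better when cluster sizes vary widely.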
  • 41. Synonym Discovery Techniques • Our primary approach: Search Co-occurrences[1] + Point-wise Mutual Information[1] + PGMHD[2] • Strategy: Map/Reduce job which finds similar searches run by the same users John searched for “java developer” and “j2ee”. Jane searched for “registered nurse” and “r.n.” and “nurse”. Zeke searched for “java developer” and “scala” and “jvm”. • By mining tens of millions of search terms per day, we get a list of top related searches, using multiple statistical measures. • We also tie each search term to the top category of jobs (e.g. java developer, truck driver, etc.), so that we know in what context people search for each term. [1] K. Aljadda, M. Korayem, T. Grainger, C. Russell. "Crowdsourced Query Augmentation through Semantic Discovery of Domain-specific Jargon," in IEEE Big Data 2014. [2] K. Aljadda, M. Korayem, C. Ortiz, T. Grainger, J. Miller, W. York. "PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems," in IEEE Big Data 2014
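The co-occurrence and point-wise mutual information part of this approach can be illustrated on the three example users. This is a toy, single-machine sketch, not the Map/Reduce job or the PGMHD model from the papers: it counts how many users searched each term and each pair of terms, then computes PMI as log(P(a,b) / (P(a)·P(b))).

```python
import math
from collections import Counter
from itertools import combinations

searches_by_user = {
    "John": {"java developer", "j2ee"},
    "Jane": {"registered nurse", "r.n.", "nurse"},
    "Zeke": {"java developer", "scala", "jvm"},
}

def pmi_scores(searches_by_user):
    """PMI for every pair of terms searched by at least one common user."""
    n_users = len(searches_by_user)
    term_counts = Counter()
    pair_counts = Counter()
    for terms in searches_by_user.values():
        term_counts.update(terms)
        # Sort so each pair has one canonical (a, b) ordering.
        pair_counts.update(combinations(sorted(terms), 2))

    scores = {}
    for (a, b), together in pair_counts.items():
        p_ab = together / n_users
        p_a = term_counts[a] / n_users
        p_b = term_counts[b] / n_users
        scores[(a, b)] = math.log(p_ab / (p_a * p_b))
    return scores

scores = pmi_scores(searches_by_user)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

With only three users the numbers are not meaningful, but the shape of the computation is the same: pairs that co-occur more often than their individual popularity predicts get a higher score.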
  • 42. Examples of “related search terms” Example: “accounting” accountant 8880, accounts payable 5235, finance 3675, accounting clerk 3651, bookkeeper 3225, controller 2898, staff accountant 2866, accounts receivable 2842 Example: “RN”: registered nurse 6588, rn registered nurse 4300, nurse 2492, nursing 912, lpn 707, healthcare 453, rn case manager 446, registered nurse rn 404, director of nursing 321, case manager 292
  • 43. Related Keywords / Automatic Boolean Query Expansion
  • 44. Categories of related terms... Synonyms: cpa => Certified Public Accountant rn => Registered Nurse r.n. => Registered Nurse Ambiguous Terms*: driver => driver (trucking) ~80% driver => driver (software) ~20% Related Terms: r.n. => nursing, bsn hadoop => mapreduce, hive, pig *disambiguation occurs based upon context and popularity
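Once related terms are known, expanding a keyword into a boolean query is a lookup plus string assembly. The sketch below uses the synonyms and related terms listed on slide 44. Giving related terms a lower boost (0.5) than synonyms is an assumption made for illustration, `expand` is a hypothetical helper, and disambiguation of terms like "driver" is not covered.

```python
synonyms = {
    "cpa": ["certified public accountant"],
    "rn": ["registered nurse"],
    "r.n.": ["registered nurse"],
}
related_terms = {
    "r.n.": ["nursing", "bsn"],
    "hadoop": ["mapreduce", "hive", "pig"],
}

def expand(keyword, related_boost=0.5):
    """Expand one keyword into an OR query of synonyms and related terms."""
    key = keyword.lower()
    clauses = [f'"{key}"']
    clauses += [f'"{term}"' for term in synonyms.get(key, [])]
    clauses += [f'"{term}"^{related_boost}' for term in related_terms.get(key, [])]
    return "(" + " OR ".join(clauses) + ")"

print(expand("RN"))
print(expand("hadoop"))
print(expand("plumber"))
```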
  • 46. Scaling Recommendations, Semantic Search, & Data Analytics with Solr
  • 48. Why Solr for Analytics? • Allows “ad-hoc” querying of data by keywords • Is good at on-the-fly aggregate calculations (facets + stats + functions + grouping) • Is horizontally scalable, and thus able to handle billions of documents • Insanely fast queries, encouraging user exploration
  • 49. Faceting Overview
    Request:
      /solr/select/?q=…&facet=true
      //Field Faceting
      &facet.field=city
      //Range Faceting
      &facet.range=years_experience &facet.range.start=0 &facet.range.end=10 &facet.range.gap=1 &facet.range.other=after
      //Query Faceting:
      &facet.query={!frange key="0 to 10 km" l=0 u=10 incl=false}geodist()
      &facet.query={!frange key="10 to 25 km" l=10 u=25 incl=false}geodist()
      &facet.query={!frange key="25 to 50 km" l=25 u=50 incl=false}geodist()
      &facet.query={!frange key="50+" l=50 incl=false}geodist()
      &sfield=location &pt=37.7770,-122.4200
    Response:
      "facet_fields":{ "city":[ "new york, ny",2337, "los angeles, ca",1693, "chicago, il",1535, … ]}
      "facet_ranges":{ "years_experience":{ "counts":[ "0",1010035, "1",343831, … "9",121090 ], … "after":59462}}
      "facet_queries":{ "0 to 10 km":1187, "10 to 25 km":462, "25 to 50 km":794, "50+":105296 },
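The distance buckets used for query faceting follow a regular pattern, so they can be generated from a list of boundaries. A small sketch follows, assuming the `frange` parser's `l`, `u`, and `incl` local parameters and the `geodist()` function. `distance_facets` is a hypothetical helper, and `*:*` stands in for the main query, which the slide elides.

```python
from urllib.parse import urlencode

def distance_facets(bounds_km):
    """Build one frange facet.query per distance bucket, plus an open-ended one."""
    queries = []
    lower = 0
    for upper in bounds_km:
        key = f"{lower} to {upper} km"
        queries.append(f'{{!frange key="{key}" l={lower} u={upper} incl=false}}geodist()')
        lower = upper
    queries.append(f'{{!frange key="{lower}+" l={lower} incl=false}}geodist()')
    return queries

facet_queries = distance_facets([10, 25, 50])

# A list of tuples lets the same parameter name repeat in the URL.
params = [
    ("q", "*:*"),
    ("facet", "true"),
    ("sfield", "location"),
    ("pt", "37.7770,-122.4200"),
] + [("facet.query", fq) for fq in facet_queries]

url = "/solr/select/?" + urlencode(params)
print("\n".join(facet_queries))
```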
  • 53. Supply over Demand (Labor Pressure)
  • 54. Wait, how’d you do that?
  • 56. Building Blocks… /solr/select/? q="construction worker"& fq=city:"las vegas, nv"& facet=true& facet.field=company /solr/select/? q="construction worker"& fq=city:"las vegas, nv"& facet=true& facet.field=lastjobtitle
  • 57. Building Blocks… /solr/select/? q=...& facet=true&facet.field=experience_ranges /solr/select/?q=...&facet=true& facet.field=management_experience
  • 60. Geo-spatial Analytics
    Query 1:
      /solr/select/?... fq={!geofilt sfield=latlong pt=37.777,-122.420 d=80} &facet=true&facet.field=city
      "facet_fields":{ "city":[ "san francisco, ca",11713, "san jose, ca",3071, "oakland, ca",1482, "palo alto, ca",1318, "santa clara, ca",1212, "mountain view, ca",1045, "sunnyvale, ca",1004, "fremont, ca",726, "redwood city, ca",633, "berkeley, ca",599]}
    Query 2:
      /solr/select/?... &facet=true&facet.field=city& fq=(
        _query_:"{!geofilt sfield=latlong pt=37.7770,-122.4200 d=20}" //san francisco
        OR _query_:"{!geofilt sfield=latlong pt=37.338,-121.886 d=20}" //san jose
        …
        OR _query_:"{!geofilt sfield=latlong pt=37.870,-122.271 d=20}" //berkeley
      )
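The second query is built from the first query's facet results: each top city becomes its own 20 km `geofilt` clause. A minimal sketch of that assembly step follows, using only the three coordinates that are visible on the slide (the cities behind the "…" are left out rather than guessed). `city_radius_filter` is a hypothetical helper.

```python
# City -> "lat,lon" as shown on the slide; kept as strings to preserve precision.
city_points = {
    "san francisco": "37.7770,-122.4200",
    "san jose": "37.338,-121.886",
    "berkeley": "37.870,-122.271",
}

def city_radius_filter(points, field="latlong", d_km=20):
    """OR together one nested geofilt query per city center."""
    clauses = [
        f'_query_:"{{!geofilt sfield={field} pt={pt} d={d_km}}}"'
        for pt in points.values()
    ]
    return "(" + " OR ".join(clauses) + ")"

fq = city_radius_filter(city_points)
print(fq)
```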
  • 61. SOLR-2894: “Distributed Pivot Faceting” #1 Most requested Solr feature 56 Status: This feature was developed primarily by the CareerBuilder search team and committed by Chris Hostetter to the latest released version of Solr (4.10).
  • 62. SOLR-3583: “Stats within (pivot) facets” Status: We have submitted a patch (built on top of distributed pivot facets), but this will likely be replaced with SOLR-6350 + SOLR-6351 in the future.
  • 63. SOLR-3583: “Stats within (pivot) facets” /solr/select?q=...& facet=true& facet.pivot=state,city& facet.stats.percentiles=true& facet.stats.percentiles.averages=true& facet.stats.percentiles.field=compensation& f.compensation.stats.percentiles.requested=10,25,50,75,90& f.compensation.stats.percentiles.lower.fence=1000& f.compensation.stats.percentiles.upper.fence=200000& f.compensation.stats.percentiles.gap=1000 "facet_pivot":{ "state,city":[{ "field":"state", "value":"california", "count":1872280, "statistics":[ "compensation",[ "percentiles",[ "10.0","26000.0", "25.0","31000.0", "50.0","43000.0", "75.0","66000.0", "90.0","94000.0"], "percentiles_average",52613.72, "percentiles_count",1514592]], "pivot":[{ "field":"city", "value":"los angeles, ca", "count":134851, "statistics":{ "compensation":[ "percentiles",[ "10.0","26000.0", "25.0","31000.0", "50.0","45000.0", "75.0","70000.0", "90.0","95000.0"], "percentiles_average",54122.45, "percentiles_count",213481]}} … ]}]}
  • 64. Real-world Use Case [Figure labels: Stats Pivot Faceting (Percentiles); Stats Pivot Faceting (Average); Another Pivot…; Field Facet]
  • 65. Key Takeaways • Traditional search & recommendations are at two ends of a continuum between user-driven and automatic matching, and Solr is really good at giving you access to that full continuum. • Searching on text is one of many forms of matching. If you can migrate to searching on behaviors, entities, and concepts, you will see much better, more personalized results. • Solr is a highly-scalable platform for rapid matching across large amounts of unstructured and structured data. • Performing real-time analytics at scale is not only possible, but incredibly fast and flexible.
  • 66. 2014 Publications & Presentations Books: Solr in Action - A comprehensive guide to implementing scalable search using Apache Solr Research papers: ● Towards a Job Title Classification System ● Augmenting Recommendation Systems Using a Model of Semantically-related Terms Extracted from User Behavior ● sCooL: A system for academic institution name normalization ● Crowdsourced Query Augmentation through Semantic Discovery of Domain-specific Jargon ● PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems ● SKILL: A System for Skill Identification and Normalization (pending publication) Speaking Engagements: ● WSDM 2014 Workshop: “Web-Scale Classification: Classifying Big Data from the Web” ● Atlanta Solr Meetup ● Atlanta Big Data Meetup ● The Second International Symposium on Big Data and Data Analytics ● Lucene/Solr Revolution 2014 ● RecSys 2014 ● IEEE Big Data Conference 2014
  • 67. Contact Info ▪ Trey Grainger trey.grainger@careerbuilder.com @treygrainger Other presentations: http://www.treygrainger.com http://solrinaction.com Meetup discount (42% off): solrmuau Yes, WE ARE HIRING @CareerBuilder. Come talk with me if you are interested…