Personalized Search
Building a prototype to infer the user's interest
Tom Burgmans
Technology Product Owner Search
Wolters Kluwer
April 28th 2022
Make search better through personalization: a controversial topic
• “If you know the users, you could improve their search”
• “There is too little traffic to build user profiles”
• “Our competitors claim to have it too…”
• “Our users don’t need it, they work on different cases every day”
• “Our customers expect our search to learn what they need”
• “Our search is already personal…via subscription filters”
• “This could destroy the user’s trust”
Wikipedia says: “Personalized search is web based search results that are tailored specifically to an individual's interests by incorporating information about the individual beyond the specific query provided.”
What could make search “personal”
Search
bonus depreciation
Activity
• Past queries
• Document interactions
• Saved preferences
• Filter actions
Demographics
• Nationality
• Gender
• Age
• Profession
Context
• Location
• Date/time
• Device
• Weather
Social
• Network connections
• Loyalty
• Shared information
Factors to infer the user’s interest
What could make search “personal” (this PoC)
Search
bonus depreciation
Activity
• Past queries
• Document interactions
• Saved preferences
• Filter actions
Demographics
• Nationality
• Gender
• Age
• Profession
Context
• Location
• Date/time
• Device
• Weather
Social
• Network connections
• Loyalty
• Shared information
Factors to infer the user’s interest
Hypothesis
The basis for Personalized Search is a recommendation engine built on an index of user activity, which makes it possible to infer the user’s interest via collaborative filtering.
[Figure: what a user likes, what users with similar interests like, the resulting “Recommended” items, and “Everybody else”]
Collaborative filtering & anomaly detection
• Find users with similar interests
• Recommend items with an unusually high presence in this group compared to the rest
[Figure: the likes of the users with similar interests]
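The two steps above (find users with similar interests, then recommend items that are unusually common in that group) can be sketched as follows. This is a minimal illustration, not the prototype's implementation: the `likes` data, the `recommend` function, and the simple share ratio used as the "unusualness" measure are all assumptions made for the example.

```python
from collections import Counter


def recommend(target, likes, min_overlap=1, top_n=3):
    """Recommend items that are over-represented among users similar to `target`.

    likes: dict mapping a user id to the set of items that user liked (hypothetical data).
    """
    target_items = likes[target]
    others = [u for u in likes if u != target]
    # Foreground: other users who share at least `min_overlap` liked items.
    foreground = [u for u in others if len(likes[u] & target_items) >= min_overlap]
    if not foreground:
        return []
    fg_counts = Counter(item for u in foreground for item in likes[u])
    bg_counts = Counter(item for u in others for item in likes[u])
    scored = []
    for item, fg_count in fg_counts.items():
        if item in target_items:
            continue  # do not recommend what the user already likes
        fg_share = fg_count / len(foreground)
        bg_share = bg_counts[item] / len(others)
        scored.append((item, fg_share / bg_share))  # > 1 means over-represented
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


likes = {
    "ann": {"doc_a", "doc_b"},
    "bob": {"doc_a", "doc_b", "doc_c"},
    "cat": {"doc_a", "doc_c"},
    "dan": {"doc_d"},
    "eve": {"doc_d", "doc_e"},
}
print(recommend("ann", likes))
```

Here "bob" and "cat" form the foreground for "ann", and "doc_c" is liked by all of them but by only half of all other users, so it is the one recommendation.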
Collaborative filtering & selecting the foreground
• Find only users with very similar interests
• Balance between similarity and volume
[Figure: users ordered from “very similar” to “not so similar”, split into foreground and background]
What is the ideal time window to infer the user’s interest?
• year
• “session”
• last N actions
[Figure: timeline of actions 1 to 5]
Example of extremely temporary personalization
[Figure: sequence of actions 1 to 5]
How to apply the inferred user’s interest?
• As a boost: the personal recommendations boost the results of the query.
• As a filter: the personal recommendations filter the results of the query.
• As a separate result set: the query returns its results, and the personal recommendations return a second set of results.
• As alerts: the personal recommendations generate alerts.
• As query suggestions: the personal recommendations feed the autocomplete of the query.
• As interesting items: the personal recommendations are shown as results, without a query.
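As a rough illustration of the boost and filter variants, the sketch below builds Solr request parameters from a list of recommended practice areas. `bq`, `fq` and `defType=edismax` are standard Solr parameters; the field name `practice_area`, the function name, and the default weight are assumptions for this example and do not come from the prototype.

```python
def personalize_params(query, practice_areas, mode="boost", weight=2.0):
    """Build Solr request parameters that apply recommended practice areas.

    mode="boost" adds an additive boost query (bq); mode="filter" adds a filter query (fq).
    The field name `practice_area` is hypothetical.
    """
    params = {"q": query, "defType": "edismax"}
    if not practice_areas:
        return params  # nothing inferred: fall back to the plain query
    clause = " OR ".join(f'practice_area:"{pa}"' for pa in practice_areas)
    if mode == "boost":
        params["bq"] = f"({clause})^{weight}"
    elif mode == "filter":
        params["fq"] = clause
    else:
        raise ValueError(f"unknown mode: {mode}")
    return params


print(personalize_params("bonus depreciation", ["tax"]))
print(personalize_params("bonus depreciation", ["tax", "accounting"], mode="filter"))
```

A boost keeps all results and only changes their order, whereas a filter removes everything outside the inferred interest, so a wrong inference costs more in the filter variant.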
Goals of this PoC
• Technology: How to build it? What can we do on our existing technology stack?
• Need: In what cases to apply it? Where is the need for personalization? What form is needed?
• Data: What data do we need? Define ‘additional’ data sources; cleaning of noise
• Verification: How to measure success? Can we test/tune it offline? How to A/B test online?
Finding a technology for anomaly detection
Elastic: Significant Terms
JLH score = (fg_percentage - bg_percentage) * (fg_percentage / bg_percentage)
[Figure: foreground and background sets]
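The JLH formula above can be tried out with a few lines of Python. This is a sketch of the formula as written on the slide, not Elastic's code; the function name, the argument names, and the choice to return 0 for terms that are not over-represented in the foreground are assumptions for this example.

```python
def jlh_score(fg_count, fg_size, bg_count, bg_size):
    """JLH score: (fg_percentage - bg_percentage) * (fg_percentage / bg_percentage).

    fg_count / fg_size: documents containing the term in the foreground set.
    bg_count / bg_size: documents containing the term in the background set.
    """
    if fg_size == 0 or bg_size == 0 or bg_count == 0:
        return 0.0  # avoid division by zero (a choice made for this sketch)
    fg_percentage = fg_count / fg_size
    bg_percentage = bg_count / bg_size
    if fg_percentage <= bg_percentage:
        return 0.0  # not over-represented in the foreground
    return (fg_percentage - bg_percentage) * (fg_percentage / bg_percentage)


# A term in 50 of 100 foreground documents but only 100 of 10,000 background documents.
print(jlh_score(50, 100, 100, 10_000))
```

The first factor rewards a large absolute difference and the second a large relative difference, so a term must be both common in the foreground and rare in the background to score high.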
Finding a technology for anomaly detection
Solr: significantTerms stream source (SignificantTermsStream.java)
score = (log(freq_foreground) + 1) * (log((numDocs + 1) / (freq_background + 1)) + 1)
[Figure: foreground and background sets]
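The score formula above can be evaluated directly. The sketch below follows the formula exactly as it appears on the slide and is not a copy of Solr's source; the function name and the guard for a zero foreground frequency are assumptions for this example.

```python
import math


def significant_terms_score(freq_foreground, freq_background, num_docs):
    """(log(freq_foreground) + 1) * (log((numDocs + 1) / (freq_background + 1)) + 1)."""
    if freq_foreground <= 0:
        return 0.0  # log is undefined here; returning 0 is a choice made for this sketch
    foreground_part = math.log(freq_foreground) + 1
    rarity_part = math.log((num_docs + 1) / (freq_background + 1)) + 1
    return foreground_part * rarity_part


# Same foreground frequency, but the second term is much rarer in the background.
common = significant_terms_score(freq_foreground=20, freq_background=5_000, num_docs=10_000)
rare = significant_terms_score(freq_foreground=20, freq_background=50, num_docs=10_000)
print(common, rare)
```

The shape is similar to TF-IDF: the first factor grows with the foreground frequency and the second with the term's rarity in the whole collection.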
Finding a technology for anomaly detection
Solr: relatedness() JSON Facet function, a.k.a. Semantic Knowledge Graph (RelatednessAgg.java)
bg_prob = bg_count / bg_size
fg_prob = fg_count / fg_size
Z-score = (fg_count - fg_size * bg_prob) / sqrt(fg_size * bg_prob * (1 - bg_prob))
[Figure: foreground and background sets]
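The z-score above compares the observed foreground count with the count expected if the foreground behaved like the background. A minimal sketch of just this formula (the function name and the zero-denominator guard are assumptions for this example; any further normalization inside Solr is not shown):

```python
import math


def z_score(fg_count, fg_size, bg_count, bg_size):
    """Z-score of the foreground count against the background probability."""
    bg_prob = bg_count / bg_size
    expected = fg_size * bg_prob  # expected foreground count under the background rate
    denominator = math.sqrt(fg_size * bg_prob * (1 - bg_prob))
    if denominator == 0:
        return 0.0  # term is in none or all background documents; a choice for this sketch
    return (fg_count - expected) / denominator


# Term appears in 10% of the background; 50 of 100 foreground documents contain it.
print(z_score(fg_count=50, fg_size=100, bg_count=1_000, bg_size=10_000))
```

With these numbers the expected count is 10 and the standard deviation is 3, so an observed count of 50 lies about 13.3 standard deviations above expectation.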
Selecting similar users
Wrapping mlt into mltplus
• MLT: standard Solr query parser to find all similar documents
• MLTPLUS: home-made query parser that keeps only the documents that are very similar
[Figure: users ordered from “very similar” to “not so similar”]
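The slides do not show how MLTPLUS decides what counts as "very similar". One plausible approach, sketched here purely as an assumption, is to keep only the hits whose similarity score is close to the top score, with a minimum group size so the foreground does not become too small. The function name, the `min_ratio` cutoff and the `min_users` fallback are all hypothetical.

```python
def keep_very_similar(hits, min_ratio=0.8, min_users=2):
    """Keep only hits scoring at least `min_ratio` times the best score.

    hits: list of (user_id, similarity_score) pairs, e.g. from a more-like-this query.
    Falls back to the top `min_users` hits so the group keeps some volume.
    """
    if not hits:
        return []
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    top_score = ranked[0][1]
    kept = [hit for hit in ranked if hit[1] >= min_ratio * top_score]
    if len(kept) < min_users:
        kept = ranked[:min_users]  # trade some similarity for volume
    return kept


hits = [("u3", 4.0), ("u1", 10.0), ("u4", 1.0), ("u2", 9.0)]
print(keep_very_similar(hits))
```

A relative cutoff adapts to the score range of each query, which matters because more-like-this scores are not comparable across different source documents.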
Design recommendation engine
Source: Wolters Kluwer Navigator 2021 customer usage data.
DWH usage data:
• userid (encrypted)
• Time stamp
• Behavior:
  - # of doc views
  - Viewed documents: publications, doc types, practice areas
  - The strongly liked documents: favorites, prints, downloads, email
  - Queries
  - The applied filters
Recommendation engine, 88,208 users:
• Indexes the year’s activity per user: documents, publications, practice areas, document types, queries
• Recommends these given a user’s year
Recommendation engine, 2,582,900 sessions:
• Indexes the activity per session: documents, publications, practice areas, document types, queries
• Recommends these given a user’s (previous) session or given a user’s last N actions
Personalization prototype
[Figure: a demo app calls the Personalization Service, which queries the recommendation engine (88,208 users; 2,582,900 sessions)]
Inputs:
• user’s (last) year ID
• user’s (last) session ID
• user’s current actions
• content of personal folder
Variants:
• Long-term personalization
• Short-term personalization
• Very short-term personalization
Recommended:
• documents
• publications
• practice areas
• document types
• queries
What is the need for personalization based on a year’s history?
How consistent is the user’s interest through the year?
• 32% of users have more than 20 actions per year; 68% have fewer than 20.
• Of the 32% most active users, 66% have (relatively) consistently viewed documents from the same 2 practice areas.
• So for about 21% of all users (66% of 32%), recommending practice areas (PAs) based on the past year’s activity might make sense.
What is the need for personalization based on the last session?
How much does the last session predict the interest of the next one?
• For 42% of the users, the previous session has a 0.9-1.0 cosine overlap with the current one w.r.t. practice area (PA) interest.
• This rises to 66% when the time between sessions gets shorter.
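The cosine overlap between two sessions can be computed by treating each session as a vector of view counts per practice area. The sketch below is an assumption about how such a measure could be computed, not the analysis code behind the figures above; the function name and sample data are hypothetical.

```python
import math


def cosine_overlap(previous_session, current_session):
    """Cosine similarity between two sessions' practice-area view counts.

    Each session is a dict mapping a practice area to a number of document views.
    """
    areas = set(previous_session) | set(current_session)
    dot = sum(previous_session.get(a, 0) * current_session.get(a, 0) for a in areas)
    norm_previous = math.sqrt(sum(v * v for v in previous_session.values()))
    norm_current = math.sqrt(sum(v * v for v in current_session.values()))
    if norm_previous == 0 or norm_current == 0:
        return 0.0  # an empty session predicts nothing
    return dot / (norm_previous * norm_current)


previous_session = {"tax": 3, "labor": 1}
same_mix = {"tax": 6, "labor": 2}       # same proportions, more views
different = {"corporate": 4}            # no practice area in common
print(cosine_overlap(previous_session, same_mix))
print(cosine_overlap(previous_session, different))
```

Cosine similarity ignores session length and looks only at the mix of practice areas, so a short and a long session with the same proportions score close to 1.0.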
How to measure success?
• Offline:
• Manual tests by a domain expert
• Build a model that predicts the likelihood that the previous session predicts the interest of the current one.
• Replay user sessions and estimate how much shorter they could have been if personalization had been applied.
• Online:
• Measure clicks on explicit recommendations
• A/B test (when recommendations are applied as boosts/filters)
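For the online A/B test, one common way to compare the click-through rate of a control group with that of a personalized group is a two-proportion z-test. The slides do not say which statistic was or will be used, so the test choice, the function name, and the numbers below are assumptions for illustration only.

```python
import math


def two_proportion_z(clicks_a, sessions_a, clicks_b, sessions_b):
    """Z statistic for the difference in click-through rate between groups A and B."""
    rate_a = clicks_a / sessions_a
    rate_b = clicks_b / sessions_b
    pooled = (clicks_a + clicks_b) / (sessions_a + sessions_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / sessions_a + 1 / sessions_b))
    if standard_error == 0:
        return 0.0  # no clicks at all, or only clicks: nothing to compare
    return (rate_b - rate_a) / standard_error


# Hypothetical numbers: control (A) versus personalized boosts (B).
z = two_proportion_z(clicks_a=100, sessions_a=1_000, clicks_b=150, sessions_b=1_000)
print(z)
```

A value above roughly 1.96 corresponds to a difference that is significant at the usual 5% level for a two-sided test.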
Lessons learned
Quality logs needed
• The accuracy of the recommendations depends on the purity/completeness of the usage logs.
• Preferred: front-end usage logging
We’re not done yet
• Improvement opportunities:
  - Use front-end logs
  - Better data cleaning
  - Query normalization
  - Add more signals like filters & autocomplete clicks
  - Testing strategies
Challenge: testing
• Offline testing of personalization is hard, but slightly easier for the short-term variant.
• Online testing takes time.
What works best
• Short-term personalization is applicable to more users than long-term personalization (may differ per business case).
Yes we can
• We can indeed infer the user’s interest via collaborative filtering of user actions, using Solr and usage data from a large Wolters Kluwer product.
