The document discusses using negative space — hidden or missing data — to improve machine learning and algorithmic systems by connecting related concepts that may not be explicitly linked. It provides examples of how analyzing relationships between terms in a semantic knowledge graph can lead to more diverse and less biased recommendations and search results. The talk argues that simulating hypothetical user interactions could help identify potential issues with algorithm changes before exposing real users to those changes.
Measuring Relevance in the Negative Space
1. Measuring Relevance in the Negative Space
Trey Grainger
Chief Algorithms Officer, Lucidworks
@treygrainger
2. Trey Grainger
Chief Algorithms Officer
• Previously: SVP of Engineering @ Lucidworks; Director of Engineering @ CareerBuilder
• Georgia Tech – MBA, Management of Technology
• Furman University – BA, Computer Science, Business, & Philosophy
• Stanford University – Information Retrieval & Web Search
Other fun projects:
• Co-author of Solr in Action, plus numerous research publications
• Advisor to Presearch, the decentralized search engine
• Lucene / Solr contributor
About Me
3. Agenda
• Fraudulent AI
• Adversarial Machine Learning
• Cancer
• War
• Bikinis
• Brainwashing
• Alt-right
• White Supremacism
• Time Travel
• Avengers Endgame Spoilers
• Negative Space
• Dark Data
• Pornography
• Global Warming
• Algorithmic Bias
• Diet & Exercise
• Self-crashing Cars
• Racism
• Sexism
4. Who are we?
• 230 customers across the Fortune 1000
• 400+ employees
• Offices in San Francisco, CA (HQ); Raleigh-Durham, NC; Cambridge, UK; Bangalore, India; Hong Kong
• Company behind the Search & AI Conference
• Develop & support Apache Solr: employ about 40% of the active committers on the Solr project, and contribute over 70% of Solr’s open source codebase
9. Goals of this Talk
1. Help identify patterns for uncovering overlooked
data hidden in plain sight
2. Point out current failures and dangers of
overlooking this negative space.
3. Discuss applications to my field (information
retrieval) and how my company is working to
overcome some of these failures in our own
technology.
12. Negative Space in Data Science
• Definition: “The missing or hidden data that gives shape to the data you do have”
• If you think of your data within a vector space, then it’s very analogous to negative space in art (art is just usually projected onto two dimensions)
• “Negative” is a polysemous word. It can mean “undesirable/bad” or it can mean “taken away/not there”.
• This talk intentionally uses both senses to make the point that not leveraging missing or hidden data often leads to bad/undesirable outcomes.
19. [Platform diagram] Data — system-, human-, and application-generated — feeds Content and Signals into an Index, powering Query, Rule Matching, Natural Language, Machine Learning, Facet/Topic/Cluster analysis, and Boosted Results, in support of Search & Discovery, Customer Analytics, and Digital Commerce.
36. Watson: “You appeared to [see a good deal] which was quite invisible to me.”
Sherlock: “Not invisible but unnoticed, Watson. You did not know
where to look, and so you missed all that was important.”
The Adventures of Sherlock Holmes, Adventure III: “A Case of Identity”, Sir Arthur Conan Doyle
37. [Hidden-image puzzle] Candidate features: head? pipe? coat collar? back of hat? hat? smoke? nose? → abstract concept of a detective with a pipe → specific hypothesis from experience (leveraging the social cue that this is probably a well-known answer) → final answer + conceptual context: a detective’s (deerstalker) hat!
75. Significance of Feedback Loops
User searches → user sees results → user takes an action → users’ actions inform system improvements
Southern Data Science
76. Signal Boosting
User searches → user sees results → user takes an action → users’ actions inform system improvements

Query logs:
User   | Query | Results
Alonzo | pizza | doc10, doc22, doc12, …
Elena  | soup  | doc84, doc2, doc17, …
Ming   | pizza | doc10, doc22, doc12, …
…      | …     | …

Signals:
User   | Action   | Document
Alonzo | click    | doc22
Elena  | click    | doc17
Ming   | click    | doc12
Alonzo | purchase | doc22
Ming   | click    | doc22
Ming   | purchase | doc22
Elena  | click    | doc2
…      | …        | …

Aggregated signal boosts:
Query | Document | Signal Boost
pizza | doc22    | 54,321
pizza | doc12    | 987
soup  | doc17    | 1,234
soup  | doc2     | 2,345
…     | …        | …

pizza ⌕  →  query: pizza, boost: doc22^54321, boost: doc12^987

ƒ(x) = Σ(click * click_weight * time_decay) +
       Σ(purchase * purchase_weight * time_decay) +
       other_factors
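The aggregation formula above can be sketched in a few lines of Python. The weights and decay half-life below are hypothetical placeholders (the talk does not specify values), and the signal-tuple shape is illustrative:

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical weights and half-life -- real values would be tuned per application.
CLICK_WEIGHT = 1.0
PURCHASE_WEIGHT = 25.0
HALF_LIFE_DAYS = 30.0

def time_decay(signal_time, now):
    """Exponential decay: a signal's contribution halves every HALF_LIFE_DAYS."""
    age_days = (now - signal_time).total_seconds() / 86400.0
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

def aggregate_boosts(signals, now):
    """signals: iterable of (query, action, doc_id, timestamp) tuples.
    Returns {(query, doc_id): boost}, following the slide's formula:
    Σ(click * click_weight * time_decay) + Σ(purchase * purchase_weight * time_decay)."""
    boosts = defaultdict(float)
    action_weights = {"click": CLICK_WEIGHT, "purchase": PURCHASE_WEIGHT}
    for query, action, doc_id, ts in signals:
        boosts[(query, doc_id)] += action_weights.get(action, 0.0) * time_decay(ts, now)
    return dict(boosts)

now = datetime(2019, 6, 1)
signals = [
    ("pizza", "click", "doc22", now - timedelta(days=1)),
    ("pizza", "purchase", "doc22", now - timedelta(days=1)),
    ("pizza", "click", "doc12", now - timedelta(days=60)),
]
boosts = aggregate_boosts(signals, now)
```

The resulting per-(query, document) boosts are what get applied at query time (e.g. `boost: doc22^54321` above).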
81. • 200%+ increase in click-through rates
• 91% lower TCO
• 50,000 fewer support tickets
• Increased customer satisfaction
82. Signal Boosting
• Benefits: dramatically improves relevance (increased conversions; the most popular documents / answers rise to the top)
• Risks:
• Reinforces current biases: Documents already at the top are more likely to be clicked on / purchased / interacted with, and therefore diversity is harder to achieve
• Solution: Learning to Rank: Learn relevance patterns and feature weights from aggregate behavior instead of overfitting to specific documents
• Subject to Manipulation: Once users realize their behaviors (searches, clicks, etc.) influence the ranking, they can manipulate the engine with fake actions to boost or bury content through adversarial actions.
• Solutions:
• Session-filtering: limit to one action, per type, per user. Further limit by IP address, browser fingerprint, etc. if necessary
• Quantity vs. Quality Weighting: For users acting on lots of queries or documents, reduce the weight of each action in proportion to their total actions. The more actions taken per user, the less each counts toward the aggregate.
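The session-filtering idea above can be sketched as a simple de-duplication pass; the tuple shape and names are illustrative assumptions, not the talk's implementation:

```python
def session_filter(signals):
    """Keep at most one signal per (user, action type, document) combination,
    so repeated clicks from one user can't inflate a document's boost."""
    seen = set()
    filtered = []
    for user, action, doc in signals:
        key = (user, action, doc)
        if key not in seen:
            seen.add(key)
            filtered.append((user, action, doc))
    return filtered

raw = [
    ("alonzo", "click", "doc22"),
    ("alonzo", "click", "doc22"),   # duplicate click-spam: dropped
    ("alonzo", "purchase", "doc22"),
    ("ming", "click", "doc22"),
]
clean = session_filter(raw)
```

In production this key would typically also fold in IP address or browser fingerprint, per the bullet above.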
83. Learning to Rank (LTR)
● It applies machine learning techniques to discover the combination of features that provides the best ranking.
● It requires a labeled set of documents with relevancy scores for a given set of queries.
● Features used for ranking are usually more computationally expensive than the ones used for matching.
● It typically re-ranks a subset of the matched documents (e.g. top 1000).
85. # Supply User Relevancy Judgements
nano contrib/ltr/example/user_queries.txt
#Format: query | doc id | relevancy judgement | source
# Train and Upload Model
./train_and_upload_demo_model.py -c config.json
86. # Re-run Searches using Machine-learned Ranking Model
http://localhost:8984/solr/techproducts/browse?q=ipod
&rq={!ltr model=exampleModel reRankDocs=100 efi.user_query=$q}
87. Collaborative Filtering (Recommendations)
User searches → user sees results → user takes an action → users’ actions inform system improvements

Query logs:
User   | Query | Results
Alonzo | pizza | doc10, doc22, doc12, …
Elena  | soup  | doc84, doc2, doc17, …
Ming   | pizza | doc10, doc22, doc12, …
…      | …     | …

Signals:
User   | Action   | Document
Alonzo | click    | doc22
Elena  | click    | doc17
Ming   | click    | doc12
Alonzo | purchase | doc22
Ming   | click    | doc22
Ming   | purchase | doc12
Elena  | click    | doc2
…      | …        | …

User-item weights (via Matrix Factorization):
User   | Item  | Weight
Alonzo | doc22 | 1.0
Alonzo | doc12 | 0.4
…      | …     | …
Ming   | doc12 | 0.9
Ming   | doc22 | 0.6
…      | …     | …

pizza ⌕
Recommendations for Alonzo:
• doc22: “Pepperoni Pizza”
• doc12: “Cheese Pizza”
…
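A toy sketch of the matrix-factorization step, using SVD from numpy on the user-item weights in the table above. This is illustrative only: real recommenders factor huge, sparse matrices with a latent rank far smaller than the item count, whereas this 2×2 example reconstructs the matrix exactly:

```python
import numpy as np

# Toy user-item weight matrix (rows: Alonzo, Ming; cols: doc22, doc12),
# taken from the slide's table.
users = ["Alonzo", "Ming"]
items = ["doc22", "doc12"]
R = np.array([[1.0, 0.4],    # Alonzo
              [0.6, 0.9]])   # Ming

# Factor R into user-side and item-side latent factors via SVD.
k = 2  # latent dimensions; real systems use k << number of items
U, s, Vt = np.linalg.svd(R, full_matrices=False)
user_factors = U[:, :k] * s[:k]
item_factors = Vt[:k, :]

# Predicted preference = dot product of user and item factors;
# rank each user's items by predicted weight.
scores = user_factors @ item_factors
for user, row in zip(users, scores):
    ranked = [items[i] for i in np.argsort(-row)]
    print(f"Recommendations for {user}: {ranked}")
```

The factors generalize: a user's predicted weight for an item they never touched comes from the dot product of their latent vectors, which is what lets the system recommend unseen content.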
88. Collaborative Filtering
• Benefits: crowd-sources related content discovery based on real user
interactions, with no a priori understanding of the content required
• Risks:
• Reinforces biases: People interact with what they are recommended, so those
same items get recommended to the next person ad infinitum
• Solutions:
• Combine with Content-based Features: Multi-modal recommendations enable
mixing in non-behavior-based matches and overcome the cold-start problem
• Only Count Explicit Actions: If content is on “autoplay”, don’t assume an
interaction is positive. Only count explicit clicks, likes, dislikes, etc.
• Inject Conceptual Diversity: Use techniques like concept clustering or the
Semantic Knowledge Graph to determine key conceptual differences between
content, and ensure results coming back represent diverse viewpoints and not
just identical ones.
• Subject to Manipulation: Same concerns as Signal Boosting
• Solutions: Same solutions as Signal Boosting (Session-filtering,
Quantity vs. Quality Weighting)
93. Scoring of Node Relationships (Edge Weights)
Foreground vs. Background Analysis
Every term is scored against its context. The more
commonly the term appears within its foreground
context versus its background context, the more
relevant it is to the specified foreground context.
        countFG(x) - totalDocsFG * probBG(x)
z = --------------------------------------------------------
     sqrt(totalDocsFG * probBG(x) * (1 - probBG(x)))
{ "type": "keywords", "values": [
  { "value": "hive", "relatedness": 0.9773, "popularity": 369 },
  { "value": "java", "relatedness": 0.9236, "popularity": 15653 },
  { "value": ".net", "relatedness": 0.5294, "popularity": 17683 },
  { "value": "bee", "relatedness": 0.0, "popularity": 0 },
  { "value": "teacher", "relatedness": -0.2380, "popularity": 9923 },
  { "value": "registered nurse", "relatedness": -0.3802, "popularity": 27089 } ] }
We are essentially boosting terms which are more related to some known feature
(and ignoring terms which are equally likely to appear in the background corpus).
Foreground Query: "Hadoop" (scored via the Semantic Knowledge Graph)
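The z-score above translates directly into code. A minimal sketch, with hypothetical document counts (the real implementation computes these from inverted-index statistics):

```python
import math

def relatedness(count_fg, total_docs_fg, count_bg, total_docs_bg):
    """z-score from the slide: how far a term's foreground count deviates from
    what its background probability predicts, normalized by the binomial stddev.
        z = (countFG(x) - totalDocsFG * probBG(x))
            / sqrt(totalDocsFG * probBG(x) * (1 - probBG(x)))"""
    prob_bg = count_bg / total_docs_bg
    expected_fg = total_docs_fg * prob_bg
    return (count_fg - expected_fg) / math.sqrt(
        total_docs_fg * prob_bg * (1 - prob_bg))

# Hypothetical counts: "hive" appears in 60 of 100 "Hadoop" docs (foreground)
# but only 500 of 100,000 docs overall (background) -> strongly related.
z_hive = relatedness(60, 100, 500, 100_000)
# A term occurring at exactly its background rate scores zero.
z_neutral = relatedness(5, 1_000, 500, 100_000)
```

Terms that appear in the foreground no more often than the background predicts score near zero and are ignored; terms that are over-represented get boosted, which is what produces the `relatedness` values in the JSON response above.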
94. Techniques like the
Semantic Knowledge Graph
can be used to score
“diversity” across content,
which can aid in reducing
the bias of Signals and
Collaborative Filtering.
95. So, can we go back in time and fix our mistakes?
98. What if we could use the negative space
to view alternate futures…
…and then make only the specific choices
that will achieve the desired outcomes
99. Imagine if we could simulate user interactions with
changes before having to expose
real users to those changes?
100. User searches → user sees results → user takes an action → users’ actions inform system improvements

Query logs:
User   | Query | Results
Alonzo | pizza | doc10, doc22, doc12, …
Elena  | soup  | doc84, doc2, doc17, …
Ming   | pizza | doc10, doc22, doc12, …
…      | …     | …

Signals:
User   | Action   | Document
Alonzo | click    | doc22
Elena  | click    | doc17
Ming   | click    | doc10
Alonzo | purchase | doc22
Ming   | click    | doc22
Ming   | purchase | doc22
Elena  | click    | doc2
…      | …        | …

We DO have historical user behavior, but it’s biased toward the current
algorithm: the click and purchase counts are all higher for docs that are
already ranked higher, since they’re seen more often…
101. User searches → user sees results → user takes an action → users’ actions inform system improvements

Query logs:
User   | Query | Results
Alonzo | pizza | doc10, doc22, doc12, …
Elena  | soup  | doc84, doc2, doc17, …
Ming   | pizza | doc10, doc22, doc12, …
…      | …     | …

Signals:
User   | Action   | Document
Alonzo | click    | doc22
Elena  | click    | doc17
Ming   | click    | doc10
Alonzo | purchase | doc22
Ming   | click    | doc22
Ming   | purchase | doc22
Elena  | click    | doc2
…      | …        | …

What other data do we have available that we’re not leveraging?
102. Query logs:
User   | Query | Results
Alonzo | pizza | doc10, doc22, doc12, …
Elena  | soup  | doc84, doc2, doc17, …
Ming   | pizza | doc10, doc22, doc12, …
…      | …     | …

Signals:
User   | Action   | Document
Alonzo | click    | doc22
Elena  | click    | doc17
Ming   | click    | doc10
Alonzo | purchase | doc22
Ming   | click    | doc22
Ming   | purchase | doc22
Elena  | click    | doc2
…      | …        | …

What we already know:
• What the user searched
• What the user interacted with (click, purchase)
• What results were returned to the user
What would we ideally like to know?
• Which documents are relevant (the user liked them)
• Which documents are irrelevant (the user didn’t like them)
• What is the ideal ranking of documents?
Can we use the Negative Space to connect the dots?
105. From this click-skip graph, we
can generate a ground truth
data set mapping known
queries to an ideal ranking
of documents.
106. How to Measure Relevance?
A = Retrieved Documents, C = Relevant Documents, B = A ∩ C (retrieved AND relevant)
Precision = B / A
Recall = B / C
Problem:
Assume Prec = 90% and Rec = 100%, but the 10% irrelevant documents were ranked
at the top of the retrieved results. Is that OK?
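A minimal sketch of these two set-based metrics, with hypothetical document IDs, which also illustrates the problem above: precision and recall are blind to ranking position:

```python
def precision_recall(retrieved, relevant):
    """Precision = |B| / |A|, Recall = |B| / |C|, where B is the overlap
    between the retrieved set (A) and the relevant set (C)."""
    a, c = set(retrieved), set(relevant)
    b = a & c
    return len(b) / len(a), len(b) / len(c)

# 9 of 10 retrieved docs are relevant, and all 9 relevant docs were retrieved...
retrieved = [f"doc{i}" for i in range(10)]
relevant = [f"doc{i}" for i in range(9)]
p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.9 1.0

# ...and both numbers stay identical whether the one irrelevant doc
# was ranked last or first, since sets discard ordering entirely.
p2, r2 = precision_recall(list(reversed(retrieved)), relevant)
```

This position-blindness is exactly why rank-aware metrics like Discounted Cumulative Gain are needed.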
107. Discounted Cumulative Gain
Ideal ranking:
Rank | Relevancy
1    | 0.95
2    | 0.85
3    | 0.80
4    | 0.65
Given ranking:
Rank | Relevancy
1    | 0.95
2    | 0.65
3    | 0.80
4    | 0.85
• Position is considered in quantifying relevancy.
• A labeled dataset is required.
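DCG, and its normalized form NDCG, can be sketched directly from the tables above (using the standard log2 rank discount):

```python
import math

def dcg(relevances):
    """Discounted Cumulative Gain: each relevancy is discounted by the log
    of its rank, so mis-orderings near the top of the list cost the most."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(given):
    """Normalize by the DCG of the ideal (descending-relevancy) ordering,
    yielding a score in (0, 1] where 1.0 means a perfect ranking."""
    return dcg(given) / dcg(sorted(given, reverse=True))

given = [0.95, 0.65, 0.80, 0.85]   # the "given" ranking from the slide
score = ndcg(given)
```

Here the given ranking scores just under 1.0: it placed the most relevant document first but swapped the 0.85 and 0.65 documents, and the rank discount penalizes that swap.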
108. Relevance Backtesting Simulation

Query logs:
User   | Query | Results
Alonzo | pizza | doc10, doc22, doc12, …
Elena  | soup  | doc84, doc2, doc17, …
Ming   | pizza | doc10, doc22, doc12, …
…      | …     | …

Signals:
User   | Action   | Document
Alonzo | click    | doc22
Elena  | click    | doc17
Ming   | click    | doc10
Alonzo | purchase | doc22
Ming   | click    | doc22
Ming   | purchase | doc22
Elena  | click    | doc2
…      | …        | …
110. Did we cover our Agenda?
• Fraudulent AI
• Adversarial Machine Learning
• Cancer
• War
• Bikinis
• Brainwashing
• Alt-right
• White Supremacism
• Time Travel
• Avengers Endgame Spoilers
• Negative Space
• Dark Data
• Pornography
• Global Warming
• Algorithmic Bias
• Diet & Exercise
• Self-crashing Cars
• Racism
• Sexism
111. Goals of this Talk
1. Help identify patterns for uncovering overlooked
data hidden in plain sight
2. Point out current failures and dangers of
overlooking this negative space.
3. Discuss applications to my field (information
retrieval) and how my company is working to
overcome some of these failures in our own
technology.