The document discusses using negative space — hidden or missing data — to improve machine learning and algorithmic systems by connecting related concepts that may not be explicitly linked. It provides examples of how analyzing relationships between terms in a semantic knowledge graph can lead to more diverse and less biased recommendations and search results. The talk argues that simulating hypothetical user interactions could help identify potential issues with algorithm changes before exposing real users to those changes.
Measuring Relevance in the Negative Space
1. Measuring Relevance in the Negative Space
Trey Grainger
Chief Algorithms Officer, Lucidworks
@treygrainger
2. Trey Grainger
Chief Algorithms Officer
• Previously: SVP of Engineering @ Lucidworks; Director of Engineering @ CareerBuilder
• Georgia Tech – MBA, Management of Technology
• Furman University – BA, Computer Science, Business, & Philosophy
• Stanford University – Information Retrieval & Web Search
Other fun projects:
• Co-author of Solr in Action, plus numerous research publications
• Advisor to Presearch, the decentralized search engine
• Lucene / Solr contributor
About Me
3. Agenda
• Fraudulent AI
• Adversarial Machine Learning
• Cancer
• War
• Bikinis
• Brainwashing
• Alt-right
• White Supremacism
• Time Travel
• Avengers Endgame Spoilers
• Negative Space
• Dark Data
• Pornography
• Global Warming
• Algorithmic Bias
• Diet & Exercise
• Self-crashing Cars
• Racism
• Sexism
4. Who are we?
• 230 customers across the Fortune 1000
• 400+ employees
• Offices in San Francisco, CA (HQ); Raleigh-Durham, NC; Cambridge, UK; Bangalore, India; Hong Kong
• Company behind the Search & AI Conference
• Develop & support Apache Solr: employ about 40% of the active committers on the Solr project, and contribute over 70% of Solr’s open source codebase
9. Goals of this Talk
1. Help identify patterns for uncovering overlooked
data hidden in plain sight
2. Point out current failures and dangers of
overlooking this negative space.
3. Discuss applications to my field (information
retrieval) and how my company is working to
overcome some of these failures in our own
technology.
12. Negative Space in Data Science
• Definition: “The missing or hidden data that gives shape to the data you do have”
• If you think of your data within a vector space, then it’s very analogous to negative space in art (art is just usually projected onto two dimensions)
• “Negative” is a polysemous word. It can mean “undesirable/bad” or it can mean “taken away/not there”.
• This talk intentionally uses both senses to make the point that not leveraging missing or hidden data often leads to bad/undesirable outcomes.
19. [Platform diagram] Data — system-, human-, and application-generated — feeds Content and Signals into an Index, powering Query, Rule Matching, Natural Language, Machine Learning, Facet/Topic/Cluster analysis, and Boosted Results, in support of Search & Discovery, Customer Analytics, and Digital Commerce.
36. Watson: “You appeared to [see a good deal] which was quite invisible to me.”
Sherlock: “Not invisible but unnoticed, Watson. You did not know
where to look, and so you missed all that was important.”
The Adventures of Sherlock Holmes, Adventure III: “A Case of Identity”, Sir Arthur Conan Doyle
37. [Hidden-image puzzle] Candidate features: head? pipe? coat collar? back of hat? hat? smoke? nose? → abstract concept of a detective with a pipe → specific hypothesis from experience (leveraging the social cue that this is probably a well-known answer) → final answer + conceptual context: a detective’s (deerstalker) hat!
75. Significance of Feedback Loops
User searches → user sees results → user takes an action → users’ actions inform system improvements
Southern Data Science
76. Signal Boosting
User searches → user sees results → user takes an action → users’ actions inform system improvements

Query logs:
User   | Query | Results
Alonzo | pizza | doc10, doc22, doc12, …
Elena  | soup  | doc84, doc2, doc17, …
Ming   | pizza | doc10, doc22, doc12, …
…      | …     | …

Signals:
User   | Action   | Document
Alonzo | click    | doc22
Elena  | click    | doc17
Ming   | click    | doc12
Alonzo | purchase | doc22
Ming   | click    | doc22
Ming   | purchase | doc22
Elena  | click    | doc2
…      | …        | …

Aggregated signal boosts:
Query | Document | Signal Boost
pizza | doc22    | 54,321
pizza | doc12    | 987
soup  | doc17    | 1,234
soup  | doc2     | 2,345
…     | …        | …

pizza ⌕  →  query: pizza, boost: doc22^54321, boost: doc12^987

ƒ(x) = Σ(click * click_weight * time_decay) +
       Σ(purchase * purchase_weight * time_decay) +
       other_factors
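The aggregation formula above can be sketched in a few lines of Python. The weights and decay half-life below are hypothetical placeholders (the talk does not specify values), and the signal-tuple shape is illustrative:

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical weights and half-life -- real values would be tuned per application.
CLICK_WEIGHT = 1.0
PURCHASE_WEIGHT = 25.0
HALF_LIFE_DAYS = 30.0

def time_decay(signal_time, now):
    """Exponential decay: a signal's contribution halves every HALF_LIFE_DAYS."""
    age_days = (now - signal_time).total_seconds() / 86400.0
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

def aggregate_boosts(signals, now):
    """signals: iterable of (query, action, doc_id, timestamp) tuples.
    Returns {(query, doc_id): boost}, following the slide's formula:
    Σ(click * click_weight * time_decay) + Σ(purchase * purchase_weight * time_decay)."""
    boosts = defaultdict(float)
    action_weights = {"click": CLICK_WEIGHT, "purchase": PURCHASE_WEIGHT}
    for query, action, doc_id, ts in signals:
        boosts[(query, doc_id)] += action_weights.get(action, 0.0) * time_decay(ts, now)
    return dict(boosts)

now = datetime(2019, 6, 1)
signals = [
    ("pizza", "click", "doc22", now - timedelta(days=1)),
    ("pizza", "purchase", "doc22", now - timedelta(days=1)),
    ("pizza", "click", "doc12", now - timedelta(days=60)),
]
boosts = aggregate_boosts(signals, now)
```

The resulting per-(query, document) boosts are what get applied at query time (e.g. `boost: doc22^54321` above).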
81. • 200%+ increase in click-through rates
• 91% lower TCO
• 50,000 fewer support tickets
• Increased customer satisfaction
82. Signal Boosting
• Benefits: dramatically improves relevance (increased conversions; the most popular documents / answers rise to the top)
• Risks:
• Reinforces current biases: Documents already at the top are more likely to be clicked on / purchased / interacted with, and therefore diversity is harder to achieve
• Solution: Learning to Rank: Learn relevance patterns and feature weights from aggregate behavior instead of overfitting to specific documents
• Subject to Manipulation: Once users realize their behaviors (searches, clicks, etc.) influence the ranking, they can manipulate the engine with fake actions to boost or bury content through adversarial actions.
• Solutions:
• Session-filtering: limit to one action, per type, per user. Further limit by IP address, browser fingerprint, etc. if necessary
• Quantity vs. Quality Weighting: For users acting on lots of queries or documents, reduce the weight of each action in proportion to their total actions. The more actions taken per user, the less each counts toward the aggregate.
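The session-filtering idea above can be sketched as a simple de-duplication pass; the tuple shape and names are illustrative assumptions, not the talk's implementation:

```python
def session_filter(signals):
    """Keep at most one signal per (user, action type, document) combination,
    so repeated clicks from one user can't inflate a document's boost."""
    seen = set()
    filtered = []
    for user, action, doc in signals:
        key = (user, action, doc)
        if key not in seen:
            seen.add(key)
            filtered.append((user, action, doc))
    return filtered

raw = [
    ("alonzo", "click", "doc22"),
    ("alonzo", "click", "doc22"),   # duplicate click-spam: dropped
    ("alonzo", "purchase", "doc22"),
    ("ming", "click", "doc22"),
]
clean = session_filter(raw)
```

In production this key would typically also fold in IP address or browser fingerprint, per the bullet above.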
83. Learning to Rank (LTR)
● It applies machine learning techniques to discover the combination of features that provides the best ranking.
● It requires a labeled set of documents with relevancy scores for a given set of queries.
● Features used for ranking are usually more computationally expensive than the ones used for matching.
● It typically re-ranks a subset of the matched documents (e.g. top 1000).
85. # Supply User Relevancy Judgements
nano contrib/ltr/example/user_queries.txt
#Format: query | doc id | relevancy judgement | source
# Train and Upload Model
./train_and_upload_demo_model.py -c config.json
86. # Re-run Searches using Machine-learned Ranking Model
http://localhost:8984/solr/techproducts/browse?q=ipod
&rq={!ltr model=exampleModel reRankDocs=100 efi.user_query=$q}
87. Collaborative Filtering (Recommendations)
User searches → user sees results → user takes an action → users’ actions inform system improvements

Query logs:
User   | Query | Results
Alonzo | pizza | doc10, doc22, doc12, …
Elena  | soup  | doc84, doc2, doc17, …
Ming   | pizza | doc10, doc22, doc12, …
…      | …     | …

Signals:
User   | Action   | Document
Alonzo | click    | doc22
Elena  | click    | doc17
Ming   | click    | doc12
Alonzo | purchase | doc22
Ming   | click    | doc22
Ming   | purchase | doc12
Elena  | click    | doc2
…      | …        | …

User-item weights (via Matrix Factorization):
User   | Item  | Weight
Alonzo | doc22 | 1.0
Alonzo | doc12 | 0.4
…      | …     | …
Ming   | doc12 | 0.9
Ming   | doc22 | 0.6
…      | …     | …

pizza ⌕
Recommendations for Alonzo:
• doc22: “Pepperoni Pizza”
• doc12: “Cheese Pizza”
…
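A toy sketch of the matrix-factorization step, using SVD from numpy on the user-item weights in the table above. This is illustrative only: real recommenders factor huge, sparse matrices with a latent rank far smaller than the item count, whereas this 2×2 example reconstructs the matrix exactly:

```python
import numpy as np

# Toy user-item weight matrix (rows: Alonzo, Ming; cols: doc22, doc12),
# taken from the slide's table.
users = ["Alonzo", "Ming"]
items = ["doc22", "doc12"]
R = np.array([[1.0, 0.4],    # Alonzo
              [0.6, 0.9]])   # Ming

# Factor R into user-side and item-side latent factors via SVD.
k = 2  # latent dimensions; real systems use k << number of items
U, s, Vt = np.linalg.svd(R, full_matrices=False)
user_factors = U[:, :k] * s[:k]
item_factors = Vt[:k, :]

# Predicted preference = dot product of user and item factors;
# rank each user's items by predicted weight.
scores = user_factors @ item_factors
for user, row in zip(users, scores):
    ranked = [items[i] for i in np.argsort(-row)]
    print(f"Recommendations for {user}: {ranked}")
```

The factors generalize: a user's predicted weight for an item they never touched comes from the dot product of their latent vectors, which is what lets the system recommend unseen content.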
88. Collaborative Filtering
• Benefits: crowd-sources related content discovery based on real user
interactions, with no a priori understanding of the content required
• Risks:
• Reinforces biases: People interact with what they are recommended, so those
same items get recommended to the next person ad infinitum
• Solutions:
• Combine with Content-based Features: Multi-modal recommendations enable
mixing in non-behavior-based matches and overcome the cold-start problem
• Only Count Explicit Actions: If content is on “autoplay”, don’t assume an
interaction is positive. Only count explicit clicks, likes, dislikes, etc.
• Inject Conceptual Diversity: Use techniques like concept clustering or the
Semantic Knowledge Graph to determine key conceptual differences between
content, and ensure results coming back represent diverse viewpoints and not
just identical ones.
• Subject to Manipulation: Same concerns as Signal Boosting
• Solutions: Same solutions as Signal Boosting (Session-filtering,
Quantity vs. Quality Weighting)
93. Scoring of Node Relationships (Edge Weights)
Foreground vs. Background Analysis
Every term is scored against its context. The more
commonly the term appears within its foreground
context versus its background context, the more
relevant it is to the specified foreground context.
        countFG(x) - totalDocsFG * probBG(x)
z = --------------------------------------------------------
     sqrt(totalDocsFG * probBG(x) * (1 - probBG(x)))
{ "type": "keywords", "values": [
  { "value": "hive", "relatedness": 0.9773, "popularity": 369 },
  { "value": "java", "relatedness": 0.9236, "popularity": 15653 },
  { "value": ".net", "relatedness": 0.5294, "popularity": 17683 },
  { "value": "bee", "relatedness": 0.0, "popularity": 0 },
  { "value": "teacher", "relatedness": -0.2380, "popularity": 9923 },
  { "value": "registered nurse", "relatedness": -0.3802, "popularity": 27089 } ] }
We are essentially boosting terms which are more related to some known feature
(and ignoring terms which are equally likely to appear in the background corpus).
Foreground Query: "Hadoop" (scored via the Semantic Knowledge Graph)
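The z-score above translates directly into code. A minimal sketch, with hypothetical document counts (the real implementation computes these from inverted-index statistics):

```python
import math

def relatedness(count_fg, total_docs_fg, count_bg, total_docs_bg):
    """z-score from the slide: how far a term's foreground count deviates from
    what its background probability predicts, normalized by the binomial stddev.
        z = (countFG(x) - totalDocsFG * probBG(x))
            / sqrt(totalDocsFG * probBG(x) * (1 - probBG(x)))"""
    prob_bg = count_bg / total_docs_bg
    expected_fg = total_docs_fg * prob_bg
    return (count_fg - expected_fg) / math.sqrt(
        total_docs_fg * prob_bg * (1 - prob_bg))

# Hypothetical counts: "hive" appears in 60 of 100 "Hadoop" docs (foreground)
# but only 500 of 100,000 docs overall (background) -> strongly related.
z_hive = relatedness(60, 100, 500, 100_000)
# A term occurring at exactly its background rate scores zero.
z_neutral = relatedness(5, 1_000, 500, 100_000)
```

Terms that appear in the foreground no more often than the background predicts score near zero and are ignored; terms that are over-represented get boosted, which is what produces the `relatedness` values in the JSON response above.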
94. Techniques like the
Semantic Knowledge Graph
can be used to score
“diversity” across content,
which can aid in reducing
the bias of Signals and
Collaborative Filtering.
95. So, can we go back in time and fix our mistakes?
98. What if we could use the negative space
to view alternate futures…
…and then make only the specific choices
that will achieve the desired outcomes
99. Imagine if we could simulate user interactions with
changes before having to expose
real users to those changes?
100. User searches → user sees results → user takes an action → users’ actions inform system improvements

Query logs:
User   | Query | Results
Alonzo | pizza | doc10, doc22, doc12, …
Elena  | soup  | doc84, doc2, doc17, …
Ming   | pizza | doc10, doc22, doc12, …
…      | …     | …

Signals:
User   | Action   | Document
Alonzo | click    | doc22
Elena  | click    | doc17
Ming   | click    | doc10
Alonzo | purchase | doc22
Ming   | click    | doc22
Ming   | purchase | doc22
Elena  | click    | doc2
…      | …        | …

We DO have historical user behavior, but it’s biased toward the current
algorithm: the click and purchase counts are all higher for docs that are
already ranked higher, since they’re seen more often…
101. User searches → user sees results → user takes an action → users’ actions inform system improvements

Query logs:
User   | Query | Results
Alonzo | pizza | doc10, doc22, doc12, …
Elena  | soup  | doc84, doc2, doc17, …
Ming   | pizza | doc10, doc22, doc12, …
…      | …     | …

Signals:
User   | Action   | Document
Alonzo | click    | doc22
Elena  | click    | doc17
Ming   | click    | doc10
Alonzo | purchase | doc22
Ming   | click    | doc22
Ming   | purchase | doc22
Elena  | click    | doc2
…      | …        | …

What other data do we have available that we’re not leveraging?
102. Query logs:
User   | Query | Results
Alonzo | pizza | doc10, doc22, doc12, …
Elena  | soup  | doc84, doc2, doc17, …
Ming   | pizza | doc10, doc22, doc12, …
…      | …     | …

Signals:
User   | Action   | Document
Alonzo | click    | doc22
Elena  | click    | doc17
Ming   | click    | doc10
Alonzo | purchase | doc22
Ming   | click    | doc22
Ming   | purchase | doc22
Elena  | click    | doc2
…      | …        | …

What we already know:
• What the user searched
• What the user interacted with (click, purchase)
• What results were returned to the user
What would we ideally like to know?
• Which documents are relevant (the user liked them)
• Which documents are irrelevant (the user didn’t like them)
• What is the ideal ranking of documents?
Can we use the Negative Space to connect the dots?
105. From this click-skip graph, we
can generate a ground truth
data set mapping known
queries to an ideal ranking
of documents.
106. How to Measure Relevance?
A = Retrieved Documents, C = Relevant Documents, B = A ∩ C (retrieved AND relevant)
Precision = B / A
Recall = B / C
Problem:
Assume Prec = 90% and Rec = 100%, but the 10% irrelevant documents were ranked
at the top of the retrieved results. Is that OK?
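A minimal sketch of these two set-based metrics, with hypothetical document IDs, which also illustrates the problem above: precision and recall are blind to ranking position:

```python
def precision_recall(retrieved, relevant):
    """Precision = |B| / |A|, Recall = |B| / |C|, where B is the overlap
    between the retrieved set (A) and the relevant set (C)."""
    a, c = set(retrieved), set(relevant)
    b = a & c
    return len(b) / len(a), len(b) / len(c)

# 9 of 10 retrieved docs are relevant, and all 9 relevant docs were retrieved...
retrieved = [f"doc{i}" for i in range(10)]
relevant = [f"doc{i}" for i in range(9)]
p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.9 1.0

# ...and both numbers stay identical whether the one irrelevant doc
# was ranked last or first, since sets discard ordering entirely.
p2, r2 = precision_recall(list(reversed(retrieved)), relevant)
```

This position-blindness is exactly why rank-aware metrics like Discounted Cumulative Gain are needed.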
107. Discounted Cumulative Gain
Ideal ranking:
Rank | Relevancy
1    | 0.95
2    | 0.85
3    | 0.80
4    | 0.65
Given ranking:
Rank | Relevancy
1    | 0.95
2    | 0.65
3    | 0.80
4    | 0.85
• Position is considered in quantifying relevancy.
• A labeled dataset is required.
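DCG, and its normalized form NDCG, can be sketched directly from the tables above (using the standard log2 rank discount):

```python
import math

def dcg(relevances):
    """Discounted Cumulative Gain: each relevancy is discounted by the log
    of its rank, so mis-orderings near the top of the list cost the most."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(given):
    """Normalize by the DCG of the ideal (descending-relevancy) ordering,
    yielding a score in (0, 1] where 1.0 means a perfect ranking."""
    return dcg(given) / dcg(sorted(given, reverse=True))

given = [0.95, 0.65, 0.80, 0.85]   # the "given" ranking from the slide
score = ndcg(given)
```

Here the given ranking scores just under 1.0: it placed the most relevant document first but swapped the 0.85 and 0.65 documents, and the rank discount penalizes that swap.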
108. Relevance Backtesting Simulation

Query logs:
User   | Query | Results
Alonzo | pizza | doc10, doc22, doc12, …
Elena  | soup  | doc84, doc2, doc17, …
Ming   | pizza | doc10, doc22, doc12, …
…      | …     | …

Signals:
User   | Action   | Document
Alonzo | click    | doc22
Elena  | click    | doc17
Ming   | click    | doc10
Alonzo | purchase | doc22
Ming   | click    | doc22
Ming   | purchase | doc22
Elena  | click    | doc2
…      | …        | …
110. Did we cover our Agenda?
• Fraudulent AI
• Adversarial Machine Learning
• Cancer
• War
• Bikinis
• Brainwashing
• Alt-right
• White Supremacism
• Time Travel
• Avengers Endgame Spoilers
• Negative Space
• Dark Data
• Pornography
• Global Warming
• Algorithmic Bias
• Diet & Exercise
• Self-crashing Cars
• Racism
• Sexism
111. Goals of this Talk
1. Help identify patterns for uncovering overlooked
data hidden in plain sight
2. Point out current failures and dangers of
overlooking this negative space.
3. Discuss applications to my field (information
retrieval) and how my company is working to
overcome some of these failures in our own
technology.