Gaps in the algorithm
What machine learning can teach us about the limits of our knowledge
SAScon 2017
Will Critchlow - @willcritchlow
The rise of ML has taken an
already-complex system and made
it incomprehensible
We might believe we know
what works.
But experiments show that’s not
really true
Computers might already be better
than us.
By exploring their limits, we learn
more about our own, and about the
underlying algorithm
This is the sequel to a talk I’ve given
a couple of times in the US
...and once in Leeds... if you didn’t see those, you can catch up here:
See the full video of my San Diego talk in DistilledU
If you did see one of them, have a
nap for a few minutes
Or check your email
Information
retrieval
PageRank
Original
research
TWEAKS
The “classical” algorithm is full of tweaks
When Amit left, this thread was fascinating.
Particularly this comment from a user called Kevin Lacker (@lacker):
The algorithm became far too complex to
approximate in your head:
High-dimensional
Non-linear
Discontinuous
It’s not even easy in two dimensions (authority and relevance):
Imagine choosing between a more-relevant page with less authority...
...and a less-relevant page with more authority.
It’s only getting worse under Sundar Pichai
Aided by the new head of search John
Giannandrea and ML experts like Jeff Dean
If you haven’t already seen it, you should
read the story of how Jeff Dean & three
engineers took just a month to beat a
decade’s worth of work by hundreds of
engineers by attacking Translate with ML.
Audiences generally still think
they’re pretty good at this
You’re probably thinking something similar to yourself right now.
I’ve now run an
in-person experiment a
few times.
I show two pages that
rank for a particular
search along with
various metrics for each
page.
Then I ask the audience
to stand up and predict
which page ranks better
for a given query.
I get people to sit down as they get
them wrong.
By the time we’ve done 2 or 3
almost everyone is sitting.
Wake up
Behind this chart is a lot of story...
It starts with a train.
This is the Thameslink. I commute into London on it.
It’s also where I allow myself to write code.
It all started because I wanted to learn ML
keras.io
I quickly found working in Keras was easier
In order to work on a problem area I knew well,
I decided to build a system to predict rankings:
The question we really want to answer is:
“How good is this page for this query?”
We want to train our
model on Google data
But we don’t actually
know how close
together these different
results are.
And we certainly don’t
know if position #3 is
the same relevance to
this query as #3 is to a
totally different query.
So I decided to train on
the problem “does page
A outrank page B for
query X”?
I.e. is it A then B or B
then A?
We have tons more data
to train this model on -
every pair of URLs for
every query we look at.
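A minimal sketch of how one ranked result list turns into many training pairs. The query, the URLs and the function name `ranked_pairs` are made up for illustration; they are not taken from the talk.

```python
from itertools import combinations

def ranked_pairs(query, ranked_urls):
    """Yield (query, higher_ranked_url, lower_ranked_url) for every pair of results."""
    # combinations() preserves input order, so the first URL in each
    # pair always ranks above the second.
    for higher, lower in combinations(ranked_urls, 2):
        yield (query, higher, lower)

# Hypothetical top-4 results for one query.
serp = ["a.example/page", "b.example/page", "c.example/page", "d.example/page"]
pairs = list(ranked_pairs("flats to rent", serp))
print(len(pairs))  # 4 results give 6 pairs; a full top 10 would give 45
```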
And it’s ultimately
equivalent to “how do
we improve page A?”
In mathematical terms, we express each page as a set
of features:
{'DA': '67', 'lrd': '254', 'tld': '1', 'h1_tgtg': '0.478', 'links_on_page': '200', ...}
Combine the two sets of features into one big vector.
Label it as (1,0) if A outranks B and (0,1) if B outranks A.
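A sketch of the labelling step described above, using a subset of the feature names from the slide. Showing each pair in a random order is an assumption of this sketch (so that a model cannot simply learn "the first page wins"); the talk does not say how the ordering was chosen.

```python
import random

# Illustrative subset of the feature names shown on the slide.
FEATURES = ["DA", "lrd", "tld", "links_on_page"]

def make_example(higher, lower, rng=random):
    """Build one training example from a pair where `higher` outranks `lower`."""
    if rng.random() < 0.5:
        a, b, label = higher, lower, (1, 0)   # A outranks B
    else:
        a, b, label = lower, higher, (0, 1)   # B outranks A
    # One big vector: all of A's features followed by all of B's.
    vector = [float(a[f]) for f in FEATURES] + [float(b[f]) for f in FEATURES]
    return vector, label

# Made-up metric values, stored as strings as on the slide.
page_hi = {"DA": "67", "lrd": "254", "tld": "1", "links_on_page": "200"}
page_lo = {"DA": "40", "lrd": "12", "tld": "1", "links_on_page": "90"}
vector, label = make_example(page_hi, page_lo, random.Random(1))
print(len(vector), label)
```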
Note: we’re doing no spam detection
We’re working only with Google’s top 10
To run the model, we input a pair of pages with their associated metrics.
We get back a probability of page A outranking page B.
[Diagram: new input, model, probability-weighted predictions]
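To make the input and output shapes concrete, here is a toy stand-in for the model. The real model in the talk was a neural network built in Keras; this sketch swaps in a plain logistic function with made-up weights, purely to show a pair of pages going in and a probability coming out.

```python
import math

def outrank_probability(features_a, features_b, weights, bias=0.0):
    """Return (P(A outranks B), P(B outranks A)) from a toy logistic model."""
    # Score the difference between the two pages, feature by feature.
    score = bias + sum(w * (a - b) for w, a, b in zip(weights, features_a, features_b))
    p_a = 1.0 / (1.0 + math.exp(-score))
    return p_a, 1.0 - p_a

# Hypothetical weights and metrics (two features per page).
weights = [0.02, 0.001]
p_a, p_b = outrank_probability([67.0, 254.0], [40.0, 12.0], weights)
print(round(p_a, 3), round(p_b, 3))
```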
Why? What are we doing here?
If we could do this perfectly, then
we could tweak the values of our
page (call that A`) and compare
A to A`
We’d get to simulate changes to
see impacts without making them
This is the holy grail
And when we get close, the gaps
will tell us where the unknowns in
the algorithm lie
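A sketch of the "compare A to A`" idea: copy the page's features, tweak one, and ask the model which version it prefers. The weights, the feature name `h1_match` and both functions are hypothetical stand-ins for a trained model, not real ranking factors.

```python
import math

# Made-up weights standing in for a trained model.
WEIGHTS = {"DA": 0.02, "h1_match": 1.5}

def p_first_outranks(first, second):
    """Toy model: probability that page `first` outranks page `second`."""
    score = sum(w * (first[name] - second[name]) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-score))

def simulate_change(page, **changes):
    """Compare a tweaked copy of the page (A`) against the original (A)."""
    tweaked = {**page, **changes}
    return p_first_outranks(tweaked, page)

page_a = {"DA": 67, "h1_match": 0.478}
# A value above 0.5 means the toy model prefers the tweaked page A`.
print(simulate_change(page_a, h1_match=0.9))
```

The output is only as trustworthy as the model behind it, which is why the accuracy figures later in the talk matter.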
There’s a lot of dead-ends before
we get anywhere near that though
Let’s go stumbling through the trees
The first thing to realise is that data
pipelines are hard.
Really hard.
There’s a reason that most of Google’s rules of ML are about data.
Here’s what we did:
1. Raw rankings data
2. Pull in API data
3. Crawl the page
4. Process on-page data
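The four stages above can be sketched as a chain of small functions that each enrich one row. Every function here is a stub with canned values (no real API calls or crawling), and all names are hypothetical.

```python
def pull_api_metrics(row):
    # Stand-in for a call to a link-metrics API; returns fixed dummy values.
    return {**row, "DA": 50, "lrd": 100}

def crawl_page(row):
    # Stand-in for fetching the URL; returns a canned HTML snippet.
    return {**row, "html": "<h1>Flats to rent</h1><a href='/x'>x</a>"}

def process_on_page(row):
    # Derive simple on-page features from the crawled HTML.
    html = row["html"]
    return {**row, "links_on_page": html.count("<a "), "has_h1": "<h1" in html}

PIPELINE = [pull_api_metrics, crawl_page, process_on_page]

def run_pipeline(raw_rankings):
    """Pass each raw rankings row through every stage in order."""
    rows = []
    for row in raw_rankings:
        for stage in PIPELINE:
            row = stage(row)
        rows.append(row)
    return rows

raw = [{"query": "flats to rent", "url": "https://a.example/", "rank": 1}]
rows = run_pipeline(raw)
print(sorted(rows[0].keys()))
```

Each stage returns a new dict rather than mutating its input, which makes it easier to re-run a single stage when something upstream changes.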
Google just released a useful tool for exploring and
checking your data
This is what it looks like on our data
(Running on their web version)
So I took this big dataset, restricted
it to property keywords, and gave it
a shot
I have an ongoing argument with @tomanthonySEO about how much
the keyword grouping matters...
OVER 90%
accuracy
Now hold on a second. That sounds implausible.
I was accidentally telling it the
answer.
I had included the rank in the
features.
Remember how I said that data pipelines are hard?
So I fixed that problem and re-ran it
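One way to guard against this kind of leak is an explicit check before training. This is a sketch under assumptions: the key names in `LEAKY` and both helper functions are hypothetical, not the fix actually used.

```python
# Fields that reveal the answer and must never reach the model.
LEAKY = {"rank", "position"}

def check_no_leakage(feature_names):
    """Raise if any answer-revealing field is among the feature names."""
    leaked = LEAKY & set(feature_names)
    if leaked:
        raise ValueError(f"answer leaked into features: {sorted(leaked)}")

def to_features(row):
    """Strip answer-revealing fields from a row, then double-check the result."""
    features = {k: v for k, v in row.items() if k not in LEAKY}
    check_no_leakage(features)
    return features

print(to_features({"DA": 67, "lrd": 254, "rank": 3}))
```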
OVER 80%
accuracy
Now hold on a second. That still sounds implausible.
One of the problems with deep
learning is that the models are far
from human understanding
There is not really any concept of “explain how you got this answer”
So I tried a much simpler model on the same data
A “decision tree classifier” from scikit-learn
You read these decision trees like flowcharts
The first # refers to the two URLs in the comparison
The name refers to the feature in question
...and the inequality should be self-explanatory
Then at the “leaf” node, you select the category
that got more of the samples
(the 2nd in this case - which means that B outranks A)
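Reading a tree like a flowchart can be written out in a few lines. The tree below is entirely made up, and it assumes the prefix `0_` means page A and `1_` means page B; it is hand-rolled rather than taken from scikit-learn's internal tree format.

```python
# A tiny hand-written tree. Leaves hold sample counts per class:
# [A outranks B, B outranks A].
TREE = {
    "feature": "1_DA", "threshold": 55.5,
    "yes": {"samples": [310, 120]},
    "no": {
        "feature": "0_lrd", "threshold": 100.0,
        "yes": {"samples": [40, 260]},
        "no": {"samples": [150, 140]},
    },
}

def predict(tree, example):
    """Walk the tree and return (prediction, list of decisions taken)."""
    node = tree
    path = []
    while "samples" not in node:
        answer = example[node["feature"]] <= node["threshold"]
        path.append(f'{node["feature"]} <= {node["threshold"]}: {answer}')
        node = node["yes"] if answer else node["no"]
    # At the leaf, pick the category that got more of the samples.
    counts = node["samples"]
    winner = "A outranks B" if counts[0] >= counts[1] else "B outranks A"
    return winner, path

example = {"0_DA": 40, "0_lrd": 30, "1_DA": 70, "1_lrd": 900}
winner, path = predict(TREE, example)
print(winner, path)
```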
So you might end up taking a path like this:
ALSO OVER 80%
accuracy
This is getting silly.
I eventually figured out what was going on.
There are a small number of domains that rank well for
essentially every property-related search in the UK.
My model was just learning:
domain A > domain B > domain C
The model was essentially just identifying URLs
Zoopla vs.
findaproperty
Rightmove vs.
primelocation
etc
So we started splitting the data
better so that it never saw the
same domains that it was trained
on
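A sketch of a domain-aware split: whole domains are assigned to either the training side or the test side, so the model is never tested on a domain it has already seen. Dropping pairs that straddle the two sides is a choice made in this sketch and costs data; the talk does not say how such pairs were handled.

```python
import random
from urllib.parse import urlparse

def split_by_domain(pairs, test_fraction=0.5, seed=0):
    """Split (url_a, url_b, label) pairs so no domain appears on both sides."""
    domains = sorted({urlparse(u).netloc for a, b, _ in pairs for u in (a, b)})
    random.Random(seed).shuffle(domains)
    n_test = max(1, int(len(domains) * test_fraction))
    test_domains = set(domains[:n_test])
    train, test = [], []
    for a, b, label in pairs:
        in_test = [urlparse(u).netloc in test_domains for u in (a, b)]
        if all(in_test):
            test.append((a, b, label))
        elif not any(in_test):
            train.append((a, b, label))
        # Pairs with one domain on each side are dropped.
    return train, test

# Hypothetical pairs across four made-up domains.
urls = [f"https://d{i}.example/page" for i in range(4)]
pairs = [(urls[i], urls[j], (1, 0)) for i in range(4) for j in range(i + 1, 4)]
train, test = split_by_domain(pairs)
print(len(train), len(test))
```

For data with a single group per row, scikit-learn's `GroupShuffleSplit` does a similar job; pairs need the extra handling above because each row carries two domains.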
Our current state-of-the-art is 65-66% accuracy on
large diverse keyword sets.
Decision trees are nowhere near as good on this data.
We are still only using fairly naive on-page metrics.
Known factors Unknown factors
The better our model gets, the more we can
constrain how much of an impact other things must
be having - advanced on-page ML, usage data etc
We expect to see progress from more advanced on-page analysis - we
have a theory that link signals get you into the consideration set, but
increasingly don’t reorder it:
See Tom Capper’s SearchLove San Diego talk in DistilledU
That was all very complicated.
In practice, we are running
real-world split-tests.
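SEO split-tests divide pages, not users, into a control group and a variant group. As a rough illustration of that one step (not a description of any particular platform), a deterministic hash can assign each URL to a bucket. The salt and function name are hypothetical, and the measurement side of a test is not covered here.

```python
import hashlib

def bucket(url, salt="test-1"):
    """Deterministically assign a page to 'control' or 'variant'."""
    digest = hashlib.sha256(f"{salt}:{url}".encode("utf-8")).hexdigest()
    # Use the parity of the hash so the same URL always lands in the same bucket.
    return "variant" if int(digest, 16) % 2 else "control"

# Made-up URLs for illustration.
pages = [f"https://shop.example/category/{i}" for i in range(1000)]
assignments = {url: bucket(url) for url in pages}
print(sum(1 for b in assignments.values() if b == "variant"), "pages in variant")
```

Changing the salt per test reshuffles the groups, so the same pages are not always the ones being changed.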
This is a difficult thing to do, so we’ve built a platform to help:
In keeping with the theme of this
presentation, I want to share some
scary results
It turns out that you are probably recommending a ton of changes that
are making no difference, or even making things worse...
1. Adding ALT attributes
2. Adding structured data
3. Setting exact match title tags
4. Writing more emotive meta copy
Established wisdom and correlation studies would suggest ALT
attributes on images might be good for SEO
Result: null test. No measurable change in performance.
2. Adding structured data
Surprisingly often, also a null test result
3. Setting exact match title tags
Title tag before: Which TV should I buy? - Argos
Title tag after: Which TV to buy? - Argos
What happens when you match title tags to the greatest search volume?
Organic sessions decreased by an average of 8%
4. Writing more emotive meta copy
What happens when you try to write more engaging titles & meta?
Maybe not quite this engaging
Still nope.
Don’t worry.
We’ve also had some great results.
Some that we have
talked about before
1. Adding structured data
2. Using JS to show content
3. Removing SEO category text
Category pages have lots of images and not much text
Adding structured data to category pages
Organic sessions increased by 11%
2. Using JS to show content
We can render Javascript!
What happens if your content is only visible with Javascript?
[Screenshots: JavaScript enabled vs. JavaScript disabled]
Making it visible increased organic sessions by ~ 6.2%
Read more on our blog: early results from split-testing JS for SEO
3. Removing SEO category text
How does SEO text on category pages perform?
E-commerce site number 1 ~ 3.1% increase in organic sessions
E-commerce site number 2 - No effect/negative effect
And a bunch that we haven’t written up yet:
Including:
● Replacing en-gb words & spellings with en-us on British company’s US site
○ Status: statistically significant positive uplift
● Fresh content: more recent update dates across large long-tail set of pages
○ Status: statistically significant positive uplift
● Change on-page targeting to higher volume query structure
○ Status: statistically significant positive uplift
All of this is why we have been
investing so much in
split-testing
Check out www.distilledodn.com
if you haven’t already.
We will be happy to demo for
you.
We’re now serving well over a
billion requests / month, and
recently published information
covering everything from
response times to our +£100k /
month split test.
Let’s recap
1. Even in a world of 200+ “classical” ranking factors, humans were bad at
understanding the algorithm
2. Machine learning will make this worse, and is accelerating under Sundar
3. By applying our own machine learning, we can model the algorithm and find
the gaps in our understanding
4. We can apply what we learn by split-testing on our own sites:
a. It is very likely that if you are not split-testing, you are recommending
changes that have no effect
b. And (obviously worse) you are very likely recommending changes that
damage your visibility
Questions: @willcritchlow
Image credits
● Sundar Pichai
● Go
● Jeff Dean
● Train
● Wake up
● Statue of Liberty
● Sleeping cat
● Complexity
● Holy Grail
● Wilderness
● Pipeline
● Houses
● Head in hands
● Rope bridge
● Spider
● Cheating
● Celebration
● Split rock
● Science
● Jolly Roger
● Thumbs up
● Spam

More Related Content

What's hot

SEO by Hypothesis
SEO by HypothesisSEO by Hypothesis
SEO by HypothesisTom Anthony
 
SEO Tests on Big Sites & Small - What Etsy, Pinterest and Others Can Teach Us
SEO Tests on Big Sites & Small - What Etsy, Pinterest and Others Can Teach UsSEO Tests on Big Sites & Small - What Etsy, Pinterest and Others Can Teach Us
SEO Tests on Big Sites & Small - What Etsy, Pinterest and Others Can Teach UsTom Anthony
 
SearchLove London 2016 | Dom Woodman | How to Get Insight From Your Logs
SearchLove London 2016 | Dom Woodman | How to Get Insight From Your LogsSearchLove London 2016 | Dom Woodman | How to Get Insight From Your Logs
SearchLove London 2016 | Dom Woodman | How to Get Insight From Your LogsDistilled
 
SearchLove London 2016 | Amy Harrison | Stand out to YOUR Crowd: A Simple Fra...
SearchLove London 2016 | Amy Harrison | Stand out to YOUR Crowd: A Simple Fra...SearchLove London 2016 | Amy Harrison | Stand out to YOUR Crowd: A Simple Fra...
SearchLove London 2016 | Amy Harrison | Stand out to YOUR Crowd: A Simple Fra...Distilled
 
SearchLove London 2016 | Larry Kim | Hacking RankBrain and Other Machine Lear...
SearchLove London 2016 | Larry Kim | Hacking RankBrain and Other Machine Lear...SearchLove London 2016 | Larry Kim | Hacking RankBrain and Other Machine Lear...
SearchLove London 2016 | Larry Kim | Hacking RankBrain and Other Machine Lear...Distilled
 
How to spot a bad link
How to spot a bad linkHow to spot a bad link
How to spot a bad linkLinkRisk
 
SearchLove London 2016 | Tom Anthony | SEO Split-Testing - How You can Run Te...
SearchLove London 2016 | Tom Anthony | SEO Split-Testing - How You can Run Te...SearchLove London 2016 | Tom Anthony | SEO Split-Testing - How You can Run Te...
SearchLove London 2016 | Tom Anthony | SEO Split-Testing - How You can Run Te...Distilled
 
SearchLove Boston 2017 | Paul Madden | You, Google and Links: It's Complicated
SearchLove Boston 2017 | Paul Madden | You, Google and Links: It's ComplicatedSearchLove Boston 2017 | Paul Madden | You, Google and Links: It's Complicated
SearchLove Boston 2017 | Paul Madden | You, Google and Links: It's ComplicatedDistilled
 
SearchLove San Diego 2017 | Michael King | Machine Doing
SearchLove San Diego 2017 | Michael King | Machine DoingSearchLove San Diego 2017 | Michael King | Machine Doing
SearchLove San Diego 2017 | Michael King | Machine DoingDistilled
 
SearchLove San Diego 2017 | Emily Grossman | The New Mobile
SearchLove San Diego 2017 | Emily Grossman | The New MobileSearchLove San Diego 2017 | Emily Grossman | The New Mobile
SearchLove San Diego 2017 | Emily Grossman | The New MobileDistilled
 
19 Lessons I learned from a year of SEO split testing
19 Lessons I learned from a year of SEO split testing19 Lessons I learned from a year of SEO split testing
19 Lessons I learned from a year of SEO split testingDominic Woodman
 
Next Era of SEO: A Guide to SEO Split-Testing
Next Era of SEO: A Guide to SEO Split-TestingNext Era of SEO: A Guide to SEO Split-Testing
Next Era of SEO: A Guide to SEO Split-TestingTom Anthony
 
3 New Techniques for the Modern Age of SEO
3 New Techniques for the Modern Age of SEO3 New Techniques for the Modern Age of SEO
3 New Techniques for the Modern Age of SEOTom Anthony
 
SearchLove London 2016 | Bridget Randolph | The Changing Landscape of Mobile ...
SearchLove London 2016 | Bridget Randolph | The Changing Landscape of Mobile ...SearchLove London 2016 | Bridget Randolph | The Changing Landscape of Mobile ...
SearchLove London 2016 | Bridget Randolph | The Changing Landscape of Mobile ...Distilled
 
Entity Disambiguation - the Semantic XRay
Entity Disambiguation - the Semantic XRayEntity Disambiguation - the Semantic XRay
Entity Disambiguation - the Semantic XRayJason Darrell
 
Judith lewis - LAC 2017 - Walking hand in hand: Combining PR and SEO
Judith lewis - LAC 2017 - Walking hand in hand: Combining PR and SEOJudith lewis - LAC 2017 - Walking hand in hand: Combining PR and SEO
Judith lewis - LAC 2017 - Walking hand in hand: Combining PR and SEOiGB Affiliate
 
SearchLove San Diego 2017 | Tom Capper | Does Google Still Need Links?
SearchLove San Diego 2017 | Tom Capper | Does Google Still Need Links?SearchLove San Diego 2017 | Tom Capper | Does Google Still Need Links?
SearchLove San Diego 2017 | Tom Capper | Does Google Still Need Links?Distilled
 
SearchLove London | Dave Sottimano, 'Using Data to Win Arguments'
SearchLove London | Dave Sottimano, 'Using Data to Win Arguments' SearchLove London | Dave Sottimano, 'Using Data to Win Arguments'
SearchLove London | Dave Sottimano, 'Using Data to Win Arguments' Distilled
 
SearchLove Boston 2016 | Larry Kim | Hacking RankBrain: Four Strategies You’l...
SearchLove Boston 2016 | Larry Kim | Hacking RankBrain: Four Strategies You’l...SearchLove Boston 2016 | Larry Kim | Hacking RankBrain: Four Strategies You’l...
SearchLove Boston 2016 | Larry Kim | Hacking RankBrain: Four Strategies You’l...Distilled
 
SEO Success Factors - SMX Advanced 2014
SEO Success Factors - SMX Advanced 2014SEO Success Factors - SMX Advanced 2014
SEO Success Factors - SMX Advanced 2014Matthew Brown
 

What's hot (20)

SEO by Hypothesis
SEO by HypothesisSEO by Hypothesis
SEO by Hypothesis
 
SEO Tests on Big Sites & Small - What Etsy, Pinterest and Others Can Teach Us
SEO Tests on Big Sites & Small - What Etsy, Pinterest and Others Can Teach UsSEO Tests on Big Sites & Small - What Etsy, Pinterest and Others Can Teach Us
SEO Tests on Big Sites & Small - What Etsy, Pinterest and Others Can Teach Us
 
SearchLove London 2016 | Dom Woodman | How to Get Insight From Your Logs
SearchLove London 2016 | Dom Woodman | How to Get Insight From Your LogsSearchLove London 2016 | Dom Woodman | How to Get Insight From Your Logs
SearchLove London 2016 | Dom Woodman | How to Get Insight From Your Logs
 
SearchLove London 2016 | Amy Harrison | Stand out to YOUR Crowd: A Simple Fra...
SearchLove London 2016 | Amy Harrison | Stand out to YOUR Crowd: A Simple Fra...SearchLove London 2016 | Amy Harrison | Stand out to YOUR Crowd: A Simple Fra...
SearchLove London 2016 | Amy Harrison | Stand out to YOUR Crowd: A Simple Fra...
 
SearchLove London 2016 | Larry Kim | Hacking RankBrain and Other Machine Lear...
SearchLove London 2016 | Larry Kim | Hacking RankBrain and Other Machine Lear...SearchLove London 2016 | Larry Kim | Hacking RankBrain and Other Machine Lear...
SearchLove London 2016 | Larry Kim | Hacking RankBrain and Other Machine Lear...
 
How to spot a bad link
How to spot a bad linkHow to spot a bad link
How to spot a bad link
 
SearchLove London 2016 | Tom Anthony | SEO Split-Testing - How You can Run Te...
SearchLove London 2016 | Tom Anthony | SEO Split-Testing - How You can Run Te...SearchLove London 2016 | Tom Anthony | SEO Split-Testing - How You can Run Te...
SearchLove London 2016 | Tom Anthony | SEO Split-Testing - How You can Run Te...
 
SearchLove Boston 2017 | Paul Madden | You, Google and Links: It's Complicated
SearchLove Boston 2017 | Paul Madden | You, Google and Links: It's ComplicatedSearchLove Boston 2017 | Paul Madden | You, Google and Links: It's Complicated
SearchLove Boston 2017 | Paul Madden | You, Google and Links: It's Complicated
 
SearchLove San Diego 2017 | Michael King | Machine Doing
SearchLove San Diego 2017 | Michael King | Machine DoingSearchLove San Diego 2017 | Michael King | Machine Doing
SearchLove San Diego 2017 | Michael King | Machine Doing
 
SearchLove San Diego 2017 | Emily Grossman | The New Mobile
SearchLove San Diego 2017 | Emily Grossman | The New MobileSearchLove San Diego 2017 | Emily Grossman | The New Mobile
SearchLove San Diego 2017 | Emily Grossman | The New Mobile
 
19 Lessons I learned from a year of SEO split testing
19 Lessons I learned from a year of SEO split testing19 Lessons I learned from a year of SEO split testing
19 Lessons I learned from a year of SEO split testing
 
Next Era of SEO: A Guide to SEO Split-Testing
Next Era of SEO: A Guide to SEO Split-TestingNext Era of SEO: A Guide to SEO Split-Testing
Next Era of SEO: A Guide to SEO Split-Testing
 
3 New Techniques for the Modern Age of SEO
3 New Techniques for the Modern Age of SEO3 New Techniques for the Modern Age of SEO
3 New Techniques for the Modern Age of SEO
 
SearchLove London 2016 | Bridget Randolph | The Changing Landscape of Mobile ...
SearchLove London 2016 | Bridget Randolph | The Changing Landscape of Mobile ...SearchLove London 2016 | Bridget Randolph | The Changing Landscape of Mobile ...
SearchLove London 2016 | Bridget Randolph | The Changing Landscape of Mobile ...
 
Entity Disambiguation - the Semantic XRay
Entity Disambiguation - the Semantic XRayEntity Disambiguation - the Semantic XRay
Entity Disambiguation - the Semantic XRay
 
Judith lewis - LAC 2017 - Walking hand in hand: Combining PR and SEO
Judith lewis - LAC 2017 - Walking hand in hand: Combining PR and SEOJudith lewis - LAC 2017 - Walking hand in hand: Combining PR and SEO
Judith lewis - LAC 2017 - Walking hand in hand: Combining PR and SEO
 
SearchLove San Diego 2017 | Tom Capper | Does Google Still Need Links?
SearchLove San Diego 2017 | Tom Capper | Does Google Still Need Links?SearchLove San Diego 2017 | Tom Capper | Does Google Still Need Links?
SearchLove San Diego 2017 | Tom Capper | Does Google Still Need Links?
 
SearchLove London | Dave Sottimano, 'Using Data to Win Arguments'
SearchLove London | Dave Sottimano, 'Using Data to Win Arguments' SearchLove London | Dave Sottimano, 'Using Data to Win Arguments'
SearchLove London | Dave Sottimano, 'Using Data to Win Arguments'
 
SearchLove Boston 2016 | Larry Kim | Hacking RankBrain: Four Strategies You’l...
SearchLove Boston 2016 | Larry Kim | Hacking RankBrain: Four Strategies You’l...SearchLove Boston 2016 | Larry Kim | Hacking RankBrain: Four Strategies You’l...
SearchLove Boston 2016 | Larry Kim | Hacking RankBrain: Four Strategies You’l...
 
SEO Success Factors - SMX Advanced 2014
SEO Success Factors - SMX Advanced 2014SEO Success Factors - SMX Advanced 2014
SEO Success Factors - SMX Advanced 2014
 

Similar to Gaps in the algorithm

Uncovering 'not provided' keyword data
Uncovering 'not provided' keyword data Uncovering 'not provided' keyword data
Uncovering 'not provided' keyword data Clayton Wood
 
Lessons from SEO split-testing
Lessons from SEO split-testingLessons from SEO split-testing
Lessons from SEO split-testingWill Critchlow
 
Amazon Search Summit - the need for split testing in SEO
Amazon Search Summit - the need for split testing in SEOAmazon Search Summit - the need for split testing in SEO
Amazon Search Summit - the need for split testing in SEOWill Critchlow
 
Knowing Ranking Factors won't be enough!
Knowing Ranking Factors won't be enough!Knowing Ranking Factors won't be enough!
Knowing Ranking Factors won't be enough!Mark Orr
 
Module 9: Natural Language Processing Part 2
Module 9:  Natural Language Processing Part 2Module 9:  Natural Language Processing Part 2
Module 9: Natural Language Processing Part 2Sara Hooker
 
Module 1.3 data exploratory
Module 1.3  data exploratoryModule 1.3  data exploratory
Module 1.3 data exploratorySara Hooker
 
Data Science: The Product Manager's Primer
Data Science: The Product Manager's PrimerData Science: The Product Manager's Primer
Data Science: The Product Manager's PrimerProduct School
 
Analytics for SEO
Analytics for SEOAnalytics for SEO
Analytics for SEOIan Lurie
 
BrightonSEO: How to generate 8 million SEO test ideas - Will Critchlow
BrightonSEO: How to generate 8 million SEO test ideas - Will CritchlowBrightonSEO: How to generate 8 million SEO test ideas - Will Critchlow
BrightonSEO: How to generate 8 million SEO test ideas - Will CritchlowWill Critchlow
 
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...Daniel Zivkovic
 
SearchLove London 2018 - Dom Woodman - A year of SEO split testing changed ho...
SearchLove London 2018 - Dom Woodman - A year of SEO split testing changed ho...SearchLove London 2018 - Dom Woodman - A year of SEO split testing changed ho...
SearchLove London 2018 - Dom Woodman - A year of SEO split testing changed ho...Distilled
 
AWS Certified Machine Learning Specialty
AWS Certified Machine Learning Specialty AWS Certified Machine Learning Specialty
AWS Certified Machine Learning Specialty Adnan Rashid
 
Think like a developer debugging seo - be wizard 2013 rimini
Think like a developer  debugging seo - be wizard 2013 riminiThink like a developer  debugging seo - be wizard 2013 rimini
Think like a developer debugging seo - be wizard 2013 riminiDavid Sottimano
 
How To Solve Problems You've Not Seen In SEO Before | Spring Brighton SEO 202...
How To Solve Problems You've Not Seen In SEO Before | Spring Brighton SEO 202...How To Solve Problems You've Not Seen In SEO Before | Spring Brighton SEO 202...
How To Solve Problems You've Not Seen In SEO Before | Spring Brighton SEO 202...Mat Davis
 
Neo4j Presentation
Neo4j PresentationNeo4j Presentation
Neo4j PresentationMax De Marzi
 
Thinking about graphs
Thinking about graphsThinking about graphs
Thinking about graphsNeo4j
 
SearchLove San Diego - Dom Woodman - A Year of SEO Split Testing Changed How ...
SearchLove San Diego - Dom Woodman - A Year of SEO Split Testing Changed How ...SearchLove San Diego - Dom Woodman - A Year of SEO Split Testing Changed How ...
SearchLove San Diego - Dom Woodman - A Year of SEO Split Testing Changed How ...Distilled
 
Split Testing for SEO - 9 Months of Learning
Split Testing for SEO - 9 Months of LearningSplit Testing for SEO - 9 Months of Learning
Split Testing for SEO - 9 Months of LearningDominic Woodman
 

Similar to Gaps in the algorithm (20)

Uncovering 'not provided' keyword data
Uncovering 'not provided' keyword data Uncovering 'not provided' keyword data
Uncovering 'not provided' keyword data
 
Lessons from SEO split-testing
Lessons from SEO split-testingLessons from SEO split-testing
Lessons from SEO split-testing
 
Amazon Search Summit - the need for split testing in SEO
Amazon Search Summit - the need for split testing in SEOAmazon Search Summit - the need for split testing in SEO
Amazon Search Summit - the need for split testing in SEO
 
Knowing Ranking Factors won't be enough!
Knowing Ranking Factors won't be enough!Knowing Ranking Factors won't be enough!
Knowing Ranking Factors won't be enough!
 
Module 9: Natural Language Processing Part 2
Module 9:  Natural Language Processing Part 2Module 9:  Natural Language Processing Part 2
Module 9: Natural Language Processing Part 2
 
Scalding at Etsy
Scalding at EtsyScalding at Etsy
Scalding at Etsy
 
Python Homework Help
Python Homework HelpPython Homework Help
Python Homework Help
 
Module 1.3 data exploratory
Module 1.3  data exploratoryModule 1.3  data exploratory
Module 1.3 data exploratory
 
Data Science: The Product Manager's Primer
Data Science: The Product Manager's PrimerData Science: The Product Manager's Primer
Data Science: The Product Manager's Primer
 
Analytics for SEO
Analytics for SEOAnalytics for SEO
Analytics for SEO
 
BrightonSEO: How to generate 8 million SEO test ideas - Will Critchlow
BrightonSEO: How to generate 8 million SEO test ideas - Will CritchlowBrightonSEO: How to generate 8 million SEO test ideas - Will Critchlow
BrightonSEO: How to generate 8 million SEO test ideas - Will Critchlow
 
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
 
SearchLove London 2018 - Dom Woodman - A year of SEO split testing changed ho...
SearchLove London 2018 - Dom Woodman - A year of SEO split testing changed ho...SearchLove London 2018 - Dom Woodman - A year of SEO split testing changed ho...
SearchLove London 2018 - Dom Woodman - A year of SEO split testing changed ho...
 
AWS Certified Machine Learning Specialty
AWS Certified Machine Learning Specialty AWS Certified Machine Learning Specialty
AWS Certified Machine Learning Specialty
 
Think like a developer debugging seo - be wizard 2013 rimini
Think like a developer  debugging seo - be wizard 2013 riminiThink like a developer  debugging seo - be wizard 2013 rimini
Think like a developer debugging seo - be wizard 2013 rimini
 
How To Solve Problems You've Not Seen In SEO Before | Spring Brighton SEO 202...
How To Solve Problems You've Not Seen In SEO Before | Spring Brighton SEO 202...How To Solve Problems You've Not Seen In SEO Before | Spring Brighton SEO 202...
How To Solve Problems You've Not Seen In SEO Before | Spring Brighton SEO 202...
 
Neo4j Presentation
Neo4j PresentationNeo4j Presentation
Neo4j Presentation
 
Thinking about graphs
Thinking about graphsThinking about graphs
Thinking about graphs
 
SearchLove San Diego - Dom Woodman - A Year of SEO Split Testing Changed How ...
SearchLove San Diego - Dom Woodman - A Year of SEO Split Testing Changed How ...SearchLove San Diego - Dom Woodman - A Year of SEO Split Testing Changed How ...
SearchLove San Diego - Dom Woodman - A Year of SEO Split Testing Changed How ...
 
Split Testing for SEO - 9 Months of Learning
Split Testing for SEO - 9 Months of LearningSplit Testing for SEO - 9 Months of Learning
Split Testing for SEO - 9 Months of Learning
 

Gaps in the algorithm

  • 1. Gaps in the algorithm: What machine learning can teach us about the limits of our knowledge. SAScon 2017. Will Critchlow - @willcritchlow
  • 2. The rise of ML has taken an already-complex system and made it incomprehensible
  • 3. We might believe we know what works. But experiments show that’s not really true
  • 4. Computers might already be better than us. By exploring their limits, we learn more about our own, and about the underlying algorithm
  • 5. This is the sequel to a talk I’ve given a couple of times in the US ...and once in Leeds... if you didn’t see those, you can catch up here:
  • 6. See the full video of my San Diego talk in DistilledU
  • 7. If you did see one of them, have a nap for a few minutes. Or check your email.
  • 9. When Amit left, this thread was fascinating. Particularly this comment from a user called Kevin Lacker (@lacker):
  • 10. The algorithm became far too complex to approximate in your head: high-dimension, non-linear, discontinuous.
  • 13. It’s not even easy in two dimensions (axes: authority and relevance). Imagine choosing between a more-relevant page with less authority and a less-relevant page with more authority.
  • 14. It’s only getting worse under Sundar Pichai
  • 15. Aided by the new head of search John Giannandrea and ML experts like Jeff Dean
  • 16. If you haven’t already seen it, you should read the story of how Jeff Dean & three engineers took just a month to beat a decade’s worth of work by hundreds of engineers by attacking Translate with ML.
  • 17. Audiences generally still think they’re pretty good at this. You’re probably thinking something similar to yourself right now.
  • 18. I’ve now run an in-person experiment a few times.
  • 19. I show two pages that rank for a particular search along with various metrics for each page.
  • 20. Then I ask the audience to stand up and predict which page ranks better for a given query.
  • 21. I get people to sit down as they get them wrong. By the time we’ve done 2 or 3 almost everyone is sitting.
  • 31. Behind this chart is a lot of story...
  • 32. It starts with a train.
  • 33. This is the Thameslink. I commute into London on it. It’s also where I allow myself to write code.
  • 34. It all started because I wanted to learn ML
  • 35. I quickly found working in Keras (keras.io) was easier
  • 36. In order to work on a problem area I knew well, I decided to build a system to predict rankings:
  • 37. The question we really want to answer is: “How good is this page for this query?”
  • 38. We want to train our model on Google data
  • 39. But we don’t actually know how close together these different results are.
  • 40. And we certainly don’t know if position #3 is the same relevance to this query as #3 is to a totally different query.
  • 41. So I decided to train on the problem “does page A outrank page B for query X?”, i.e. is it A then B, or B then A?
  • 42. We have tons more data to train this model on: every pair of URLs for every query we look at.
  • 43. And it’s ultimately equivalent to “how do we improve page A?”
  • 44. In mathematical terms, we express each page as a set of features: {'DA': '67', 'lrd': '254', 'tld': '1', 'h1_tgtg': '0.478', 'links_on_page': '200', ...}. Combine the two sets of features into one big vector. Label it as (1,0) if A outranks B and (0,1) if B outranks A.
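A minimal sketch of the pairing described on slides 41 to 44, assuming one query's results arrive as a list of feature dicts sorted best-ranked first. The first page's values are the ones shown on the slide; the other two pages, the helper names `to_vector` and `make_pairs`, and the choice to emit every pair in both orders are illustrative assumptions, not details from the talk.

```python
from itertools import combinations

# Feature order is fixed so every vector lines up column by column.
FEATURES = ["DA", "lrd", "tld", "h1_tgtg", "links_on_page"]


def to_vector(page):
    """Turn one page's feature dict (string values, as on the slide) into floats."""
    return [float(page[name]) for name in FEATURES]


def make_pairs(ranked_pages):
    """Build (vector, label) examples from one query's results, best-ranked first.

    Each pair is emitted in both orders so the two labels stay balanced:
    (1, 0) means the first page in the vector outranks the second,
    (0, 1) means the second outranks the first.
    """
    examples = []
    for better, worse in combinations(ranked_pages, 2):
        examples.append((to_vector(better) + to_vector(worse), (1, 0)))
        examples.append((to_vector(worse) + to_vector(better), (0, 1)))
    return examples


# Results for a single query, already sorted by rank.
# Only the first page's values come from the slide; the rest are made up.
results = [
    {"DA": "67", "lrd": "254", "tld": "1", "h1_tgtg": "0.478", "links_on_page": "200"},
    {"DA": "52", "lrd": "90", "tld": "1", "h1_tgtg": "0.310", "links_on_page": "120"},
    {"DA": "40", "lrd": "15", "tld": "0", "h1_tgtg": "0.650", "links_on_page": "80"},
]
pairs = make_pairs(results)
```

Three results give three unordered pairs, so this yields six training examples, which is why pairwise training produces far more data than one label per URL.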
  • 45. Note: we’re doing no spam detection. We’re working only with Google’s top 10.
  • 46. To run the model, we input a pair of pages with their associated metrics.
  • 48. We get back a probability of page A outranking page B (diagram: new input, model, probability-weighted predictions).
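The talk does not say how the model turns its outputs into a probability. One common approach for a two-class output that matches the (1,0) / (0,1) labels is a softmax, sketched here in plain Python as an assumption; the raw output values are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw model outputs into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical raw outputs for one (A, B) pair: index 0 is "A outranks B",
# index 1 is "B outranks A", matching the (1, 0) / (0, 1) labels.
p_a_wins, p_b_wins = softmax([1.2, 0.4])
```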
  • 49. Why? What are we doing here?
  • 50. If we could do this perfectly, then we could tweak the values of our page (call that A`) and compare A to A`. We’d get to simulate changes to see impacts without making them. This is the holy grail.
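A rough sketch of the A versus A` comparison, assuming any callable that returns the probability of its first page outranking its second. `toy_model`, its weights, and the feature name `h1_match` are hypothetical stand-ins so the example runs; they are not the trained model from the talk.

```python
import math


def toy_model(page_a, page_b):
    """Hypothetical stand-in for a trained model: P(page_a outranks page_b).

    It is a logistic function of made-up weighted feature differences,
    used only so that the example runs end to end.
    """
    weights = {"DA": 0.05, "lrd": 0.002, "h1_match": 1.5}
    score = sum(w * (page_a[k] - page_b[k]) for k, w in weights.items())
    return 1.0 / (1.0 + math.exp(-score))


def simulate_change(model, page, **changes):
    """Return P(A' outranks A), where A' is `page` with `changes` applied."""
    tweaked = {**page, **changes}
    return model(tweaked, page)


page_a = {"DA": 67.0, "lrd": 254.0, "h1_match": 0.478}
# What if we improved the H1 match without touching anything else?
p_improved = simulate_change(toy_model, page_a, h1_match=0.9)
# With no changes, A' equals A, so the model should sit on the fence.
p_unchanged = simulate_change(toy_model, page_a)
```

A result well above 0.5 would suggest the change helps; how far to trust it depends entirely on how accurate the underlying model is.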
  • 51. And when we get close, the gaps will tell us where the unknowns in the algorithm lie
  • 52. There are a lot of dead ends before we get anywhere near that, though. Let’s go stumbling through the trees.
  • 53. The first thing to realise is that data pipelines are hard. Really hard. There’s a reason that most of Google’s rules of ML are about data. Here’s what we did:
  • 58. Raw rankings data; pull in API data; crawl the page; process on-page data
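As an illustration of the "process on-page data" stage only, here is a small sketch using Python's standard `html.parser` to count links and pull out the H1 text. The class name, the choice of metrics, and the sample HTML are assumptions; the talk does not describe how its on-page features were computed.

```python
from html.parser import HTMLParser


class OnPageFeatures(HTMLParser):
    """Collect a few simple on-page metrics from raw HTML."""

    def __init__(self):
        super().__init__()
        self.links_on_page = 0
        self.h1_text = []
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        # Count only anchors that actually link somewhere.
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.links_on_page += 1
        elif tag == "h1":
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.h1_text.append(data.strip())


def extract_features(html):
    parser = OnPageFeatures()
    parser.feed(html)
    parser.close()
    return {
        "links_on_page": parser.links_on_page,
        "h1": " ".join(part for part in parser.h1_text if part),
    }


# Hypothetical page body; in a real pipeline this would come from the crawl.
sample = (
    "<html><body><h1>Flats to rent in <em>Leeds</em></h1>"
    "<a href='/a'>A</a><a name='x'>no href</a><a href='/b'>B</a>"
    "</body></html>"
)
features = extract_features(sample)
```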
  • 59. Google just released a useful tool for exploring and checking your data
  • 60. This is what it looks like on our data (Running on their web version)
  • 61. So I took this big dataset, restricted it to property keywords, and gave it a shot. I have an ongoing argument with @tomanthonySEO about how much the keyword grouping matters...
  • 62. OVER 90% accuracy. Now hold on a second. That sounds implausible.
  • 63. I was accidentally telling it the answer. I had included the rank in the features. Remember how I said that data pipelines are hard?
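One cheap guard against this kind of label leakage is to keep an explicit list of columns that encode the answer and check the feature set against it before training. This is a sketch, assuming rows are plain dicts; the column names in `LEAKY_COLUMNS` are hypothetical, since the talk only says the rank ended up among the features.

```python
# Columns that encode the answer and must never reach the model as inputs.
# The names are hypothetical; the talk only says the rank was included by mistake.
LEAKY_COLUMNS = {"rank", "position"}


def split_features_and_target(row, target="rank"):
    """Separate the label from the inputs and drop anything derived from it."""
    features = {k: v for k, v in row.items() if k not in LEAKY_COLUMNS}
    return features, row[target]


def assert_no_leak(feature_names):
    """Fail loudly if a known label column has slipped into the feature set."""
    leaked = LEAKY_COLUMNS.intersection(feature_names)
    if leaked:
        raise ValueError(f"label leaked into features: {sorted(leaked)}")


row = {"DA": 67, "lrd": 254, "links_on_page": 200, "rank": 3}
features, rank = split_features_and_target(row)
assert_no_leak(features)  # passes: "rank" has been removed
```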
  • 64. So I fixed that problem and re-ran it
  • 65. OVER 80% accuracy. Now hold on a second. That still sounds implausible.
  • 66. One of the problems with deep learning is that the models are far from human understanding. There is not really any concept of “explain how you got this answer”.
  • 67. So I tried a much simpler model on the same data: a “decision tree classifier” from scikit-learn
  • 68. You read these decision trees like flowcharts. The first # refers to the two URLs in the comparison
  • 69. The name refers to the feature in question
  • 70. ...and the inequality should be self-explanatory
  • 71. Then at the “leaf” node, you select the category that got more of the samples (the 2nd in this case - which means that B outranks A)
  • 72. So you might end up taking a path like this:
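The figure from the talk is not reproduced here. To make the flowchart reading concrete, the sketch below walks a small hand-written tree in plain Python; the tree, its thresholds, and its sample counts are hypothetical and are not the tree from the talk or output from scikit-learn.

```python
# Internal node: (page_index, feature, threshold, if_true, if_false)
# Leaf: (samples_where_a_wins, samples_where_b_wins)
# page_index 0 is page A and 1 is page B, mirroring the leading number
# on each node in the slides. All values below are made up.
TREE = (
    0, "DA", 50.0,
    (1, "lrd", 100.0, (40, 10), (12, 38)),  # branch taken when A's DA <= 50
    (30, 5),                                 # leaf reached when A's DA > 50
)


def predict(node, pages):
    """Walk the tree for a pair of pages and return (prediction, path taken)."""
    path = []
    while len(node) == 5:
        index, feature, threshold, if_true, if_false = node
        passed = pages[index][feature] <= threshold
        path.append(f"{index}_{feature} <= {threshold}: {passed}")
        node = if_true if passed else if_false
    # At the leaf, pick whichever category holds more training samples.
    a_wins, b_wins = node
    label = "A outranks B" if a_wins >= b_wins else "B outranks A"
    return label, path


page_a = {"DA": 45.0, "lrd": 20.0}
page_b = {"DA": 60.0, "lrd": 300.0}
label, path = predict(TREE, (page_a, page_b))
```

Printing `path` lists each inequality that was tested and whether it held, which is the same information you read off the flowchart by eye.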
  • 73. ALSO OVER 80% accuracy. This is getting silly.
  • 74. I eventually figured out what was going on. There are a small number of domains that rank well for essentially every property-related search in the UK. My model was just learning: domain A > domain B > domain C
  • 75. The model was essentially just identifying URLs: Zoopla vs. findaproperty, Rightmove vs. primelocation, etc.
  • 76. So we started splitting the data better, so that the model was never evaluated on domains it had seen during training
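A minimal sketch of a domain-aware split, assuming each example is a `(url_a, url_b, label)` tuple. Whole domains are assigned to either the training or the test side, and pairs that straddle the two are dropped. The function names, the `www.` stripping, the decision to discard mixed pairs, and the example URLs are all assumptions rather than the author's method.

```python
import random
from urllib.parse import urlparse


def domain_of(url):
    """Hostname with any leading 'www.' removed (a simplification)."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def split_by_domain(pairs, test_fraction=0.25, seed=0):
    """Split (url_a, url_b, label) rows so no domain appears on both sides.

    Pairs that mix a training domain with a test domain are discarded,
    which costs data but keeps the evaluation honest.
    """
    domains = sorted({domain_of(u) for a, b, _ in pairs for u in (a, b)})
    random.Random(seed).shuffle(domains)
    n_test = max(1, int(len(domains) * test_fraction))
    test_domains = set(domains[:n_test])

    train, test = [], []
    for a, b, label in pairs:
        in_test = [domain_of(a) in test_domains, domain_of(b) in test_domains]
        if all(in_test):
            test.append((a, b, label))
        elif not any(in_test):
            train.append((a, b, label))
    return train, test


# Hypothetical URLs on eight distinct domains, paired every possible way.
urls = [f"https://site{i}.example/page" for i in range(8)]
pairs = [(a, b, (1, 0)) for i, a in enumerate(urls) for b in urls[i + 1:]]
train, test = split_by_domain(pairs, test_fraction=0.5)
```

With this kind of split, a model that has only memorised "domain A beats domain B" gets no help at test time, so the measured accuracy reflects what it learned from the features.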
  • 77. Our current state-of-the-art is 65-66% accuracy on large diverse keyword sets. Decision trees are nowhere near as good on this data. We are still only using fairly naive on-page metrics.
  • 80. (Diagram: known factors vs. unknown factors.) The better our model gets, the more we can constrain how much of an impact other things must be having: advanced on-page ML, usage data, etc. We expect to see progress from more advanced on-page analysis. We have a theory that link signals get you into the consideration set, but increasingly don’t reorder it:
  • 81. See Tom Capper’s SearchLove San Diego talk in DistilledU
  • 82. That was all very complicated. In practice, we are running real-world split-tests. This is a difficult thing to do, so we’ve built a platform to help:
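As a rough illustration of what an SEO split test involves (not a description of the platform itself): pages, rather than users, are divided into a control group and a variant group, the change is applied only to the variant pages, and organic sessions are compared. The hashing scheme, function names, and session counts below are hypothetical.

```python
import hashlib


def bucket_for(url, experiment="title-test"):
    """Deterministically assign a page to 'control' or 'variant'.

    Hashing the URL together with an experiment name keeps the assignment
    stable across requests and independent between experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{url}".encode("utf-8")).hexdigest()
    return "variant" if int(digest, 16) % 2 else "control"


def relative_change(control_sessions, variant_sessions):
    """Percentage difference in mean sessions per page, variant vs. control."""
    control_mean = sum(control_sessions) / len(control_sessions)
    variant_mean = sum(variant_sessions) / len(variant_sessions)
    return 100.0 * (variant_mean - control_mean) / control_mean


# Hypothetical session counts per page over the same period.
change = relative_change([100, 120, 80], [92, 110, 74])
```

A naive comparison like this ignores pre-existing differences between the two groups, seasonality, and statistical significance, which is part of why running these tests properly is hard.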
  • 84. In keeping with the theme of this presentation, I want to share some scary results. It turns out that you are probably recommending a ton of changes that are making no difference, or even making things worse...
  • 85. 1. Adding ALT attributes 2. Adding structured data 3. Setting exact match title tags 4. Writing more emotive meta copy
  • 86. Established wisdom and correlation studies would suggest ALT attributes on images might be good for SEO
  • 87. Result: null test. No measurable change in performance.
  • 88. 1. Adding ALT attributes 2. Adding structured data 3. Setting exact match title tags 4. Writing more emotive meta copy
  • 89. Surprisingly often, also a null test result
  • 90. 1. Adding ALT attributes 2. Adding structured data 3. Setting exact match title tags 4. Writing more emotive meta copy
  • 91. What happens when you match title tags to the greatest search volume? Title tag before: “Which TV should I buy? - Argos”. Title tag after: “Which TV to buy? - Argos”.
  • 92. Organic sessions decreased by an average of 8%
  • 93. 1. Adding ALT attributes 2. Adding structured data 3. Setting exact match title tags 4. Writing more emotive meta copy
  • 95. What happens when you try to write more engaging titles & meta? Maybe not quite this engaging.
  • 98. Don’t worry. We’ve also had some great results.
  • 99. Some that we have talked about before
  • 100. 1. Adding structured data 2. Using JS to show content 3. Removing SEO category text
  • 101. Category pages have lots of images and not much text
  • 102. Adding structured data to category pages
  • 104. 1. Adding structured data 2. Using JS to show content 3. Removing SEO category text
  • 105. We can render JavaScript!
  • 106. What happens if your content is only visible with JavaScript? (Comparison: JavaScript enabled vs. JavaScript disabled.)
  • 107. Making it visible increased organic sessions by ~6.2%
  • 108. Read more on our blog: early results from split-testing JS for SEO
  • 109. 1. Adding structured data 2. Using JS to show content 3. Removing SEO category text
  • 110. How does SEO text on category pages perform?
  • 111. E-commerce site number 1: ~3.1% increase in organic sessions
  • 112. E-commerce site number 2: no effect / negative effect
  • 114. And a bunch that we haven’t written up yet, including: ● Replacing en-gb words & spellings with en-us on a British company’s US site (status: statistically significant positive uplift) ● Fresh content: more recent update dates across a large long-tail set of pages (status: statistically significant positive uplift) ● Changing on-page targeting to a higher-volume query structure (status: statistically significant positive uplift)
  • 115. All of this is why we have been investing so much in split-testing. Check out www.distilledodn.com if you haven’t already. We will be happy to demo for you. We’re now serving well over a billion requests / month, and recently published information covering everything from response times to our +£100k / month split test.
  • 121. Let’s recap: 1. Even in a world of 200+ “classical” ranking factors, humans were bad at understanding the algorithm. 2. Machine learning will make this worse, and is accelerating under Sundar. 3. By applying our own machine learning, we can model the algorithm and find the gaps in our understanding. 4. We can apply what we learn by split-testing on our own sites: a. It is very likely that if you are not split-testing, you are recommending changes that have no effect. b. And (obviously worse) you are very likely recommending changes that damage your visibility.
  • 123. Image credits: Sundar Pichai, Go, Jeff Dean, Train, Wake up, Statue of Liberty, Sleeping cat, Complexity, Holy Grail, Wilderness, Pipeline, Houses, Head in hands, Rope bridge, Spider, Cheating, Celebration, Split rock, Science, Jolly Roger, Thumbs up, Spam