In his talk at 5 Hours of Technical SEO, organized by SEMRush, Bartosz Góralewicz spoke to Nik Ranger, Cindy Krum, and Will Critchlow about the most common issues preventing large websites from getting indexed by Google.
2. Helping Fortune 500s rank better and get more traffic
Bartosz Góralewicz
@bart_goralewicz
www.onely.com
We're deeply specialized:
Technical SEO
JavaScript SEO
Rendering SEO (!)
Indexing Issues (!)
Web Performance
Link to this deck -> on the last slide
5. … truth be told… it gets boring sometimes…
More proxies
Proxies
Scraping sitemaps
Google banning our proxies
Websites banning our proxies because we scrape sitemaps
More bans, more captchas
6. ... Locked down in the house, my mind started to play tricks on me…
7. I felt like Google started to challenge me
8. Am I a robot?
9. Then I finally understood why SEOs do that...
… I felt a strong urge to do something that we all hate so much.
Sorry :(
12. I called Tomek, our head of R&D
We have 1 million URLs in our database.
AMAZING - Let’s see what correlates. I’ll talk about this at 5 Hours of Technical SEO.
#AmazingContent
18. Percentage of URLs NOT indexed
[Chart: labels not recoverable; values shown: 30%, 76%, 15%, 81%, 14%, 14%, 38%, 71%, 98%]
20. 4 kinds of indexing problems*
URL indexing problems
Mobile-first related indexing problems
JavaScript related indexing problems
Layout based indexing problems
Each kind of indexing problem has a different origin and requires a different solution.
*sorted by the level of complexity, ascending
22. How indexing works
Discovery -> Queue -> Crawl -> Rendering -> Index selection -> Indexing -> Ranking
Partial indexing issue = URL not indexed AFTER it was crawled*
*please don’t start a Twitter war after this slide
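The definition of a partial indexing issue can be sketched in a few lines of Python. This is a minimal illustration, assuming the furthest stage each URL reached is already known; the `Stage` enum, the `is_partial_indexing_issue` helper, and the example URLs are hypothetical, and the strictly linear ordering is a simplification of the slide rather than a description of Google's internals.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Pipeline stages named on the slide, in order (a simplification)."""
    DISCOVERY = 1
    QUEUE = 2
    CRAWL = 3
    RENDERING = 4
    INDEX_SELECTION = 5
    INDEXING = 6
    RANKING = 7


def is_partial_indexing_issue(furthest_stage: Stage) -> bool:
    """True when the URL was crawled but never reached the index."""
    return Stage.CRAWL <= furthest_stage < Stage.INDEXING


# Hypothetical example data: URL -> furthest stage reached.
urls = {
    "https://example.com/a": Stage.QUEUE,            # never crawled
    "https://example.com/b": Stage.INDEX_SELECTION,  # crawled, not indexed
    "https://example.com/c": Stage.RANKING,          # indexed
}
stuck = [url for url, stage in urls.items() if is_partial_indexing_issue(stage)]
```

With this data, only the second URL counts as a partial indexing issue: it was crawled and rendered but dropped before indexing.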
27. Index selection for dummies
Limit: 100 people
Rendering
Links
Efficient crawling
Content
Indexing strategy
Source: Patent Method and apparatus for managing a backlog of pending URL crawls (patent US8676783B1)
30. Let’s start easy with a little warm up
32. URL indexing - example
one.ly/alba-shoes
34. Problem with the site: command
False negatives
35. Site: command - new challenges
Site:URL - watch out for false negatives*
*fortunately, there are a few ways to avoid those and get 100% accuracy
63. How to spot JavaScript indexing problems?
JavaScript indexing problems = partial indexing
The URL is INDEXED.
JavaScript dependent content - NOT INDEXED.
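One way to find out which content on a page is JavaScript dependent is to check whether a phrase already exists in the raw HTML, before any script runs. The sketch below is not the speaker's method; it is a minimal illustration using only Python's standard library, and the `text_in_raw_html` helper and the sample markup are hypothetical. Phrases missing from the raw HTML are candidates to verify in Google with an exact-match search.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, ignoring anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def text_in_raw_html(raw_html: str, phrase: str) -> bool:
    """True if the phrase appears in the text of the unrendered HTML."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    # Normalize whitespace so line breaks in the markup do not hide a match.
    text = " ".join(" ".join(parser.chunks).split())
    return phrase.lower() in text.lower()


# Hypothetical page: the heading is in the HTML, the delivery note is injected by JS.
raw = (
    "<html><body><h1>Alba shoes</h1><div id='app'></div>"
    "<script>render('Free delivery on all orders')</script></body></html>"
)
in_html = text_in_raw_html(raw, "Alba shoes")
js_only = not text_in_raw_html(raw, "Free delivery on all orders")
```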
64. To understand JS-related indexing problems, we need to look under Google’s hood a bit.
WRS*
*Web Rendering Service
66. Google limits CPU consumption
source: Google Webmaster Conference Product Summit, Mountain View, CA
http://services.google.com/fh/files/events/wmconf_product_summit_slides_publish.pdf
69. Browser vs BOR
source: Patent Batch-optimized render and fetch architecture (patent US20180276220A1)
70. How Batch-Optimized Rendering works step by step
source: Patent Batch-optimized render and fetch architecture (patent US20180276220A1)
71. How Batch-optimized rendering works
Step 1.
BOR skips all resources which are not essential to generate a preview of your page
Examples:
Tracking scripts (Google Analytics, Hotjar etc.)
Ads
Images*
source: Patent Batch-optimized render and fetch architecture (patent US20180276220A1)
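Step 1 can be pictured as a filter over the resources a page requests. The sketch below is only an illustration of that idea, not the logic from the patent: the host lists, the extension list, and the `is_skippable` and `essential_resources` helpers are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical lists, for illustration only.
TRACKING_HOSTS = {"www.google-analytics.com", "static.hotjar.com"}
AD_HOSTS = {"ads.example.com"}
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg")


def is_skippable(resource_url: str) -> bool:
    """True for tracking scripts, ads, and images (the slide's examples)."""
    parsed = urlparse(resource_url)
    host = parsed.netloc.lower()
    path = parsed.path.lower()
    return (
        host in TRACKING_HOSTS
        or host in AD_HOSTS
        or path.endswith(IMAGE_EXTENSIONS)
    )


def essential_resources(resource_urls):
    """Keep only the resources this toy filter would not skip."""
    return [url for url in resource_urls if not is_skippable(url)]


requested = [
    "https://example.com/app.js",
    "https://www.google-analytics.com/analytics.js",
    "https://ads.example.com/banner.js",
    "https://example.com/hero.jpg",
    "https://example.com/main.css",
]
kept = essential_resources(requested)
```

Here `kept` contains only `app.js` and `main.css`, the two resources the page needs to produce its layout.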
74. How Batch-optimized rendering works
Step 2.
Set the value of a Virtual Clock
source: Patent Batch-optimized render and fetch architecture (patent US20180276220A1)
75. How Batch-optimized rendering works
Step 3.
1. Virtual Clock’s time runs out*
2. Website’s layout is generated
*simplification
source: Patent Batch-optimized render and fetch architecture (patent US20180276220A1)
76. Using this data to rank better
Virtual Clock
Layout
source: Patent Batch-optimized render and fetch architecture (patent US20180276220A1)
78. Rendering pauses while waiting for scripts, CSS files etc.
Cost of our website’s rendering
A script/CSS heavy website needs more “virtual time” on the virtual clock
Virtual Clock
Source: Patent Batch-optimized render and fetch architecture (patent US20180276220A1)
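A toy model helps show why a script-heavy page needs more virtual time. The sketch below is a simplification under stated assumptions, not the algorithm from the patent: the `Task` fields, the cost numbers, and the `render` function are hypothetical. Waiting for a resource costs real time but does not move the virtual clock; only executing a task does.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    fetch_seconds: float  # real time spent waiting for the resource
    virtual_cost: float   # virtual time charged for processing it


def render(tasks, virtual_budget):
    """Process tasks in order until the virtual clock's budget is spent."""
    clock = 0.0
    done = []
    for task in tasks:
        # fetch_seconds is deliberately ignored: waiting on the network
        # pauses rendering but does not advance the virtual clock.
        if clock + task.virtual_cost > virtual_budget:
            break  # time runs out; the layout is built from what is done
        clock += task.virtual_cost
        done.append(task.name)
    return done, clock


# Hypothetical page: two cheap resources and one expensive script.
page = [
    Task("main.css", fetch_seconds=2.0, virtual_cost=1.0),
    Task("app.js", fetch_seconds=5.0, virtual_cost=3.0),
    Task("reviews.js", fetch_seconds=0.1, virtual_cost=4.0),
]
processed, used = render(page, virtual_budget=5.0)
```

With a budget of 5.0, `reviews.js` never runs even though it downloads fastest, so whatever it would have added is missing from the layout in this model.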
79. BOR - a place where real time doesn’t matter.
Source: Patent Batch-optimized render and fetch architecture (patent US20180276220A1)
84. Virtual clock’s time runs out
the LAYOUT is generated
Source: Patent Batch-optimized render and fetch architecture (patent US20180276220A1)
89. A lot of focus on… layout.
Source: BOR patents (2012 -2018)
90. Content location matters
“text appearing above-the-fold (e.g., visible without scrolling) may be considered more important than text below-the-line.”
source: Patent Batch-optimized render and fetch architecture (patent US20180276220A1)
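The quoted idea, that position in the rendered layout affects how much a piece of text counts, can be sketched as a simple weighting. The fold position, the weights, and the `TextBlock` and `weight_blocks` names below are hypothetical; the patent does not publish concrete values.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    text: str
    top_px: int  # vertical offset of the block in the rendered layout


def weight_blocks(blocks, fold_px=800, above_weight=1.0, below_weight=0.5):
    """Assign a higher weight to blocks that start above the fold."""
    return [
        (block.text, above_weight if block.top_px < fold_px else below_weight)
        for block in blocks
    ]


# Hypothetical layout: a headline near the top, a related-items box far below.
layout = [
    TextBlock("Headline", top_px=120),
    TextBlock("Related items", top_px=2400),
]
weights = weight_blocks(layout)
```

In this sketch the headline keeps full weight while the block far below the fold is discounted.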
91. Patent on Scheduling resource crawls (filed in 2011)
“The importance of the section is based on (...) prominence of the section within the rendered layout.”
Source: Patent Scheduling resource crawls (US20130144858A1)
94. Some sections may get more “Link Juice”* from Google
“(…) link positioned under the ‘More Top Stories’ heading on the cnn.com has a high probability of being selected.”
*Wink, Wink John Mu ;)
source: Google patent Ranking documents based on user behavior and/or feature data (US10152520B1)
95. Google seems to struggle with indexing sections such as “related items” and “you may also be interested in”.