Dawn Anderson @dawnieando
Indexed Web contains at least 4.73 billion pages (13/11/2015)
TOO MUCH CONTENT
[Chart: total number of websites, 2000 to 2014 (y-axis up to 1,000,000,000)]
SINCE 2013 THE WEB IS THOUGHT TO HAVE INCREASED IN SIZE BY 1/3
Capacity limits on Google’s crawling system
How have search engines responded?
• By prioritising URLs for crawling
• By assigning crawl period intervals to URLs
• By creating work ‘schedules’ for Googlebots
TOO MUCH CONTENT
9 types of Googlebot
THE KEY PERSONAS
SUPPORTING ROLES
Indexer / Ranking Engine
The URL Scheduler
History Logs
Link Logs
Anchor Logs
LOOKING AT ‘PAST DATA’
‘Ranks nothing at all’
Takes a list of URLs to crawl from the URL Scheduler
Job varies based on ‘bot’ type
Runs errands and makes deliveries for the URL server, indexer / ranking engine and logs
Makes notes of outbound linked pages and additional links for future crawling
Takes note of ‘hints’ from the URL Scheduler when crawling
Tells tales of URL accessibility status and server response codes, notes relationships between links, and collects content checksums (a compact fingerprint computed from the page content) for comparison with past visits by the history and link logs
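Content checksums let a crawler tell whether a page has changed without storing or diffing the whole page. Google does not publish how its logs compute them, so this Python sketch simply assumes a SHA-256 hash over whitespace-normalised HTML; `content_checksum`, `has_changed` and the `history_log` dictionary are hypothetical names used for illustration.

```python
import hashlib
from typing import Optional


def content_checksum(html: str) -> str:
    """Return a SHA-256 hex digest of the page after collapsing whitespace.

    Collapsing whitespace means purely cosmetic reformatting does not count
    as a change (an illustrative choice, not a documented Google behaviour).
    """
    normalised = " ".join(html.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def has_changed(previous_checksum: Optional[str], html: str) -> bool:
    """True if there is no stored checksum or the new fetch hashes differently."""
    return previous_checksum is None or previous_checksum != content_checksum(html)


# Hypothetical history log: URL -> checksum recorded on the previous visit.
history_log = {"https://example.com/": content_checksum("<p>Hello   world</p>")}

stored = history_log["https://example.com/"]
print(has_changed(stored, "<p>Hello world</p>"))  # False: only whitespace differs
print(has_changed(stored, "<p>Hello there</p>"))  # True: the text changed
```

The same fingerprint can also flag parameter-driven URLs that serve identical output: two URLs with equal checksums are returning the same content.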
GOOGLEBOT’S JOBS
ROLES – MAJOR PLAYERS – A ‘BOSS’: THE URL SCHEDULER
Think of it as Google’s line manager or ‘air traffic controller’ for Googlebots in the web crawling system
Schedules Googlebot visits to URLs
Decides which URLs to ‘feed’ to Googlebot
Uses data from the history logs about past visits
Assigns how regularly Googlebot visits each URL
Drops ‘hints’ to Googlebot on types of content NOT to crawl, and excludes some URLs from schedules
Analyses past ‘change’ periods and predicts future ‘change’ periods for URLs, for the purpose of scheduling Googlebot visits
Checks ‘page importance’ when scheduling visits
Assigns URLs to ‘layers / tiers’ for crawling schedules
Checks URLs for ‘importance’, ‘boost factor’ candidacy and ‘probability of modification’
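The scheduler’s job of turning ‘importance’, ‘boost factor’ and ‘probability of modification’ into visit intervals can be pictured with a toy model. This is not Google’s algorithm: the 0 to 1 scores, the 0.6 / 0.4 weighting, the score thresholds and the four tiers below are all invented for illustration.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass


@dataclass
class UrlRecord:
    url: str
    importance: float    # hypothetical 0..1 'page importance' score
    change_prob: float   # hypothetical 0..1 'probability of modification'
    boost: bool = False  # hypothetical 'boost factor' candidacy


# Hypothetical crawl layers / tiers, each with a revisit interval in hours.
TIERS = [("real-time", 1), ("daily", 24), ("weekly", 24 * 7), ("base", 24 * 30)]


def assign_tier(rec: UrlRecord) -> tuple[str, int]:
    """Place a URL in a tier using an invented weighted score."""
    score = 0.6 * rec.importance + 0.4 * rec.change_prob
    if rec.boost:
        score += 0.2
    if score >= 0.8:
        return TIERS[0]
    if score >= 0.5:
        return TIERS[1]
    if score >= 0.2:
        return TIERS[2]
    return TIERS[3]


def build_schedule(records: list[UrlRecord], now_hour: int = 0) -> list[tuple[int, str, str]]:
    """Return (next_visit_hour, tier, url) tuples, earliest visit first."""
    heap: list[tuple[int, str, str]] = []
    for rec in records:
        tier, interval = assign_tier(rec)
        heapq.heappush(heap, (now_hour + interval, tier, rec.url))
    return [heapq.heappop(heap) for _ in range(len(heap))]


urls = [
    UrlRecord("/news", importance=0.9, change_prob=0.9),
    UrlRecord("/about", importance=0.4, change_prob=0.05),
    UrlRecord("/old-promo", importance=0.05, change_prob=0.0),
]
for visit_hour, tier, url in build_schedule(urls):
    print(visit_hour, tier, url)
# 1 real-time /news
# 168 weekly /about
# 720 base /old-promo
```

A min-heap keyed on the next visit time is one simple way to let the soonest-due URL come off the queue first; the real system’s data structures are not public.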
GOOGLEBOT’S BEEN PUT ON A URL-CONTROLLED DIET
The URL Scheduler controls the meal planner
Carefully controls the list of URLs Googlebot visits
‘Budgets’ are allocated
CRAWL BUDGET – WHAT IS IT?
A crawl budget is an allocation of ‘crawl visit frequency’ apportioned to URLs on a site
Roughly proportionate to Page Importance (link equity) and speed
Pages with a lot of healthy links get crawled more (possibly including internal links)
Apportioned by the URL Scheduler to Googlebots
But there are other factors affecting the frequency of Googlebot visits aside from importance and speed
The vast majority of URLs on the web don’t get much budget allocated to them
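To illustrate ‘roughly proportionate to importance and speed’, here is a hedged sketch that splits an assumed fixed number of daily visits across a site’s URLs in proportion to importance times speed. The scores, the speed factor and the 100-visit budget are made up; real crawl budgets are not exposed as numbers like this.

```python
from __future__ import annotations


def apportion_budget(total_visits: int, pages: dict[str, tuple[float, float]]) -> dict[str, int]:
    """Split a site's daily crawl visits across URLs.

    pages maps URL -> (importance, speed_factor); both are hypothetical scores.
    Each URL's share is proportional to importance * speed_factor, and the
    largest-remainder method makes the integer shares add up to total_visits.
    """
    weights = {url: imp * speed for url, (imp, speed) in pages.items()}
    total_weight = sum(weights.values())
    if total_weight == 0:
        return {url: 0 for url in pages}
    exact = {url: total_visits * w / total_weight for url, w in weights.items()}
    shares = {url: int(x) for url, x in exact.items()}
    leftover = total_visits - sum(shares.values())
    # Hand the remaining visits to the URLs with the largest fractional parts.
    for url in sorted(exact, key=lambda u: exact[u] - shares[u], reverse=True)[:leftover]:
        shares[url] += 1
    return shares


site = {
    "/": (1.0, 1.0),               # homepage: important, fast
    "/category/shoes": (0.6, 1.0),  # category page: fairly important, fast
    "/tag/blue": (0.1, 0.5),        # thin tag page on a slow template
}
print(apportion_budget(100, site))
```

Even in this toy model the thin, slow tag page ends up with only a few visits, which mirrors the slide’s point that most URLs get little budget.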
POSITIVE FACTORS AFFECTING GOOGLEBOT VISIT FREQUENCY
Current capacity of the web crawling system is high
Your URL is ‘important’
Your URL changes a lot, with critical material content change
Probability and predictability of critical material content change is high for your URL
Your website speed is fast and Googlebot gets the time to visit your URL
Your URL has been ‘upgraded’ to a daily or real-time crawl layer
NEGATIVE FACTORS AFFECTING GOOGLEBOT VISIT FREQUENCY
Current capacity of the web crawling system is low
Your URL has been detected as a ‘spam’ URL
Your URL is in an ‘inactive’ base layer segment
Your URLs are tripping ‘hints’ built into the system to detect dynamic content with non-critical change
Probability and predictability of critical material content change is low for your URL
Your website speed is slow and Googlebot doesn’t get the time to visit your URL
Your URL has been ‘downgraded’ to an ‘inactive’ base layer segment
Your URL has recently returned an ‘unreachable’ server response code
FIND GOOGLEBOT
AUTOMATE SERVER LOG RETRIEVAL VIA CRON JOB
grep Googlebot access_log > googlebot_access.txt
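Grepping for the string "Googlebot" also catches any client that merely claims that user agent. Google’s documentation recommends verifying a crawler with a reverse DNS lookup on the IP (expecting a googlebot.com or google.com host name) followed by a forward lookup that must resolve back to the same IP. A minimal Python sketch of that check; the resolvers are passed in as parameters (defaulting to `socket.gethostbyaddr` and `socket.gethostbyname`, which is IPv4-only) so it can be tested offline, and the function name is hypothetical.

```python
import socket
from typing import Callable

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(
    ip: str,
    reverse: Callable[[str], str] = lambda ip: socket.gethostbyaddr(ip)[0],
    forward: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-resolve to confirm."""
    try:
        host = reverse(ip)
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        return forward(host) == ip
    except OSError:
        return False


# Offline usage with fake resolvers standing in for DNS.
fake_ptr = {"66.249.66.1": "crawl-66-249-66-1.googlebot.com"}
fake_a = {"crawl-66-249-66-1.googlebot.com": "66.249.66.1"}
print(is_verified_googlebot("66.249.66.1", fake_ptr.__getitem__, fake_a.__getitem__))  # True
```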
LOOK THROUGH ‘SPIDER EYES’ VIA LOG ANALYSIS – ANALYSE GOOGLEBOT
PREPARE TO BE HORRIFIED
Incorrect URL header response codes (e.g. 302s)
301 redirect chains
Old files or XML sitemaps left on the server from years ago
Infinite / endless loops (circular dependency)
On parameter-driven sites, crawled URLs which produce the same output
URLs generated by spammers
Dead image files being visited
Old CSS files still being crawled and loading legacy images
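Most of these ‘horrors’ show up as patterns in the status codes and paths Googlebot requests. A minimal sketch, assuming the log uses the Apache/Nginx ‘combined’ format (the regex and sample lines are illustrative), that counts Googlebot’s status codes and lists which paths returned redirects or errors. It trusts the user-agent string, so pair it with IP verification if spoofing matters.

```python
import re
from collections import Counter

# "combined" format: ip - - [time] "METHOD path HTTP/x" status size "referer" "agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def summarise_googlebot(lines):
    """Return (Counter of status codes, {status: Counter of paths}) for non-200 hits."""
    statuses = Counter()
    problems = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m or "Googlebot" not in m.group("agent"):
            continue  # unparseable line or not claiming to be Googlebot
        status = m.group("status")
        statuses[status] += 1
        if status != "200":
            problems.setdefault(status, Counter())[m.group("path")] += 1
    return statuses, problems


UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    f'66.249.66.1 - - [13/Nov/2015:10:00:00 +0000] "GET /shoes HTTP/1.1" 200 512 "-" "{UA}"',
    f'66.249.66.1 - - [13/Nov/2015:10:00:05 +0000] "GET /old-sitemap.xml HTTP/1.1" 404 0 "-" "{UA}"',
    f'66.249.66.1 - - [13/Nov/2015:10:00:09 +0000] "GET /promo HTTP/1.1" 302 0 "-" "{UA}"',
    '203.0.113.9 - - [13/Nov/2015:10:01:00 +0000] "GET /shoes HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
counts, problems = summarise_googlebot(sample)
print(counts)
print(problems)
```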
SEARCH ENGINE VIEW EMULATOR
http://www.ovrdrv.com/search_view
Lynx browser – 4 options to view a page: through search engine eyes, through human eyes, as page source, or as page analysis
LOOK THROUGH ‘SPIDER EYES’
• GSC Crawl Stats
• Google Search Console (all tools)
• Deepcrawl
• Screaming Frog
• Server Log Analysis
• SEMrush (auditing tools)
• Webconfs (header responses / similarity checker)
• PowerMapper (bird’s-eye view of site)
• Search Engine View Emulator
FIX GOOGLEBOT’S JOURNEY
SPEED UP YOUR SITE TO ‘FEED’ GOOGLEBOT MORE
TECHNICAL ‘FIXES’
Speed up your site
Implement compression, minification, caching
Fix incorrect header response codes
Fix nonsensical ‘infinite loops’ generated by database-driven parameters or ‘looping’ relative URLs
Use absolute rather than relative internal links
Ensure no parts of content are blocked from crawlers (e.g. in carousels, concertinas and tabbed content)
Ensure no CSS or JavaScript files are blocked from crawlers
Unpick 301 redirect chains
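Unpicking a redirect chain means pointing every old URL straight at its final destination, and spotting loops along the way. A small sketch, assuming you have already exported a map of source URL to redirect target (for example from a crawl or your server config); the function name and data are hypothetical.

```python
from __future__ import annotations


def resolve_redirects(redirects: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Collapse chains so each source maps directly to its final URL.

    Returns (flattened map, list of sources whose chain runs into a loop).
    """
    flattened: dict[str, str] = {}
    loops: list[str] = []
    for source in redirects:
        seen = [source]
        target = redirects[source]
        while target in redirects:  # the target itself redirects again
            if target in seen:      # circular dependency, e.g. A -> B -> A
                loops.append(source)
                break
            seen.append(target)
            target = redirects[target]
        else:
            flattened[source] = target
    return flattened, loops


redirects = {"/a": "/b", "/b": "/c", "/c": "/final", "/x": "/y", "/y": "/x"}
flat, loops = resolve_redirects(redirects)
print(flat)   # {'/a': '/final', '/b': '/final', '/c': '/final'}
print(loops)  # ['/x', '/y']
```

Each entry in the flattened map is a candidate single 301; anything in the loop list needs fixing by hand.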
SPEED TOOLS
• YSlow
• Pingdom
• Google PageSpeed tests
• Minification – JS Compress and CSS Minifier
• Image compression – Compressjpeg.com, tinypng.com
URL IMPORTANCE TOOLS
• GSC Internal Links report (URL importance)
• Link Research Tools (strongest sub-pages reports)
• GSC Internal Links (add site categories and sections as additional profiles)
• PowerMapper
STOP YOURSELF ‘VOTING’ FOR THE WRONG INTERNAL LINKS IN YOUR SITE
‘IT CANNOT BE EMPHASISED ENOUGH HOW IMPORTANT IT IS TO EMPHASISE IMPORTANCE’
Most Important Page 1
Most Important Page 2
Most Important Page 3
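One way to see which pages your internal links are ‘voting’ for is to run the classic, published PageRank power iteration over your own internal link graph. This sketch uses the textbook formula with the usual 0.85 damping factor; it is not Google’s current importance calculation, and the link graph below is hypothetical.

```python
from __future__ import annotations


def internal_pagerank(links: dict[str, list[str]], damping: float = 0.85,
                      iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over a site's internal link graph.

    links maps each page to the pages it links to. Pages with no outlinks
    spread their score evenly over all pages (a standard dangling-node fix).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


site_links = {
    "/": ["/shoes", "/bags", "/privacy"],
    "/shoes": ["/", "/bags"],
    "/bags": ["/", "/shoes"],
    "/privacy": ["/"],
}
for page, score in sorted(internal_pagerank(site_links).items(), key=lambda kv: -kv[1]):
    print(f"{page:10s} {score:.3f}")
```

In this toy graph the homepage link to the privacy page takes a third of the homepage’s vote; removing or demoting such links shifts score towards the pages you want treated as most important.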
ONLINE DEMO OF XML GENERATOR
https://www.xml-sitemaps.com/generator-demo/
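A sitemap can also be generated in a few lines of code. This sketch builds a minimal file following the sitemaps.org protocol (`urlset`, `url`, `loc` and the optional `lastmod`) with Python’s standard library; the function name, URLs and dates are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries: List[Tuple[str, Optional[str]]]) -> str:
    """Return sitemap XML for (absolute URL, optional W3C date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes & and <
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap([
    ("https://www.example.com/", "2015-11-13"),
    ("https://www.example.com/category/shoes", None),
]))
```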
15 THINGS YOU CAN DO
1. Use XML sitemaps
2. Add site sections (e.g. categories) as profiles in Google Search Console for more granularity
3. Keep 301 redirections to a minimum
4. Use regular expressions in .htaccess files to implement rules and reduce crawl lag
5. Look out for redirect chains
6. Look out for infinite loops (spider traps)
7. Check URL parameters in Google Search Console
8. Check if URLs return the exact same content and choose one as the preferred URL
9. Block or canonicalise duplicate content
10. Use absolute rather than relative URLs
11. Improve site speed
12. Use front-facing HTML sitemaps for important pages
13. Use noindex on pages which add no search value but may be useful for visitors to traverse your site
14. Use ‘If-Modified-Since’ headers (304 Not Modified responses) to keep Googlebot out of low-importance pages
15. Build server log analysis into your regular SEO activities
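Item 14 refers to conditional requests: when a crawler sends an `If-Modified-Since` header, a server can answer `304 Not Modified` with no body if the page has not changed since that date, sparing a full download. A minimal sketch of that decision using only the standard library; `conditional_response` is a hypothetical helper, and `last_modified` is assumed to be a timezone-aware UTC datetime supplied by your CMS or file system.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def conditional_response(if_modified_since: Optional[str],
                         last_modified: datetime) -> Tuple[int, Dict[str, str]]:
    """Return (status code, headers) for a GET honouring If-Modified-Since."""
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):  # malformed date: ignore the header
            since = None
        if since is not None and since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)  # treat "-0000" dates as UTC
        # HTTP dates have one-second resolution, so compare whole seconds.
        if since is not None and last_modified.replace(microsecond=0) <= since:
            return 304, headers
    return 200, headers


page_changed = datetime(2015, 11, 13, 10, 0, tzinfo=timezone.utc)
print(conditional_response("Fri, 13 Nov 2015 12:00:00 GMT", page_changed)[0])  # 304
print(conditional_response("Thu, 12 Nov 2015 12:00:00 GMT", page_changed)[0])  # 200
print(conditional_response(None, page_changed)[0])                              # 200
```

Any malformed or missing header falls back to a normal 200 response, which is the safe default.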
REMEMBER
“WHEN GOOGLEBOT PLAYS ‘SUPERMARKET SWEEP’ YOU WANT TO FILL THE SHOPPING TROLLEY WITH LUXURY ITEMS”

 
BDSM⚡Call Girls in Sector 144 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 144 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 144 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 144 Noida Escorts >༒8448380779 Escort ServiceDelhi Call girls
 
Enjoy Night⚡Call Girls Dlf City Phase 4 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 4 Gurgaon >༒8448380779 Escort ServiceEnjoy Night⚡Call Girls Dlf City Phase 4 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 4 Gurgaon >༒8448380779 Escort ServiceDelhi Call girls
 
Labour Day Celebrating Workers and Their Contributions.pptx
Labour Day Celebrating Workers and Their Contributions.pptxLabour Day Celebrating Workers and Their Contributions.pptx
Labour Day Celebrating Workers and Their Contributions.pptxelizabethella096
 
Defining Marketing for the 21st Century,kotler
Defining Marketing for the 21st Century,kotlerDefining Marketing for the 21st Century,kotler
Defining Marketing for the 21st Century,kotlerAmirNasiruog
 
CALL ON ➥8923113531 🔝Call Girls Hazratganj Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Hazratganj Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Hazratganj Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Hazratganj Lucknow best sexual service Onlineanilsa9823
 
Branding strategies of new company .pptx
Branding strategies of new company .pptxBranding strategies of new company .pptx
Branding strategies of new company .pptxVikasTiwari846641
 
The Science of Landing Page Messaging.pdf
The Science of Landing Page Messaging.pdfThe Science of Landing Page Messaging.pdf
The Science of Landing Page Messaging.pdfVWO
 

Recently uploaded (20)

The+State+of+Careers+In+Retention+Marketing-2.pdf
The+State+of+Careers+In+Retention+Marketing-2.pdfThe+State+of+Careers+In+Retention+Marketing-2.pdf
The+State+of+Careers+In+Retention+Marketing-2.pdf
 
Call Us ➥9654467111▻Call Girls In Delhi NCR
Call Us ➥9654467111▻Call Girls In Delhi NCRCall Us ➥9654467111▻Call Girls In Delhi NCR
Call Us ➥9654467111▻Call Girls In Delhi NCR
 
BDSM⚡Call Girls in Sector 39 Noida Escorts Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 39 Noida Escorts Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 39 Noida Escorts Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 39 Noida Escorts Escorts >༒8448380779 Escort Service
 
Google 3rd-Party Cookie Deprecation [Update] + 5 Best Strategies
Google 3rd-Party Cookie Deprecation [Update] + 5 Best StrategiesGoogle 3rd-Party Cookie Deprecation [Update] + 5 Best Strategies
Google 3rd-Party Cookie Deprecation [Update] + 5 Best Strategies
 
Unraveling the Mystery of the Hinterkaifeck Murders.pptx
Unraveling the Mystery of the Hinterkaifeck Murders.pptxUnraveling the Mystery of the Hinterkaifeck Murders.pptx
Unraveling the Mystery of the Hinterkaifeck Murders.pptx
 
Cash payment girl 9257726604 Hand ✋ to Hand over girl
Cash payment girl 9257726604 Hand ✋ to Hand over girlCash payment girl 9257726604 Hand ✋ to Hand over girl
Cash payment girl 9257726604 Hand ✋ to Hand over girl
 
personal branding kit for music business
personal branding kit for music businesspersonal branding kit for music business
personal branding kit for music business
 
Netflix Ads The Game Changer in Video Ads – Who Needs YouTube.pptx (Chester Y...
Netflix Ads The Game Changer in Video Ads – Who Needs YouTube.pptx (Chester Y...Netflix Ads The Game Changer in Video Ads – Who Needs YouTube.pptx (Chester Y...
Netflix Ads The Game Changer in Video Ads – Who Needs YouTube.pptx (Chester Y...
 
Foundation First - Why Your Website and Content Matters - David Pisarek
Foundation First - Why Your Website and Content Matters - David PisarekFoundation First - Why Your Website and Content Matters - David Pisarek
Foundation First - Why Your Website and Content Matters - David Pisarek
 
The Future of Brands on LinkedIn - Alison Kaltman
The Future of Brands on LinkedIn - Alison KaltmanThe Future of Brands on LinkedIn - Alison Kaltman
The Future of Brands on LinkedIn - Alison Kaltman
 
BDSM⚡Call Girls in Sector 144 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 144 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 144 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 144 Noida Escorts >༒8448380779 Escort Service
 
Enjoy Night⚡Call Girls Dlf City Phase 4 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 4 Gurgaon >༒8448380779 Escort ServiceEnjoy Night⚡Call Girls Dlf City Phase 4 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 4 Gurgaon >༒8448380779 Escort Service
 
Top 5 Breakthrough AI Innovations Elevating Content Creation and Personalizat...
Top 5 Breakthrough AI Innovations Elevating Content Creation and Personalizat...Top 5 Breakthrough AI Innovations Elevating Content Creation and Personalizat...
Top 5 Breakthrough AI Innovations Elevating Content Creation and Personalizat...
 
Labour Day Celebrating Workers and Their Contributions.pptx
Labour Day Celebrating Workers and Their Contributions.pptxLabour Day Celebrating Workers and Their Contributions.pptx
Labour Day Celebrating Workers and Their Contributions.pptx
 
Defining Marketing for the 21st Century,kotler
Defining Marketing for the 21st Century,kotlerDefining Marketing for the 21st Century,kotler
Defining Marketing for the 21st Century,kotler
 
CALL ON ➥8923113531 🔝Call Girls Hazratganj Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Hazratganj Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Hazratganj Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Hazratganj Lucknow best sexual service Online
 
Branding strategies of new company .pptx
Branding strategies of new company .pptxBranding strategies of new company .pptx
Branding strategies of new company .pptx
 
Brand Strategy Master Class - Juntae DeLane
Brand Strategy Master Class - Juntae DeLaneBrand Strategy Master Class - Juntae DeLane
Brand Strategy Master Class - Juntae DeLane
 
The Science of Landing Page Messaging.pdf
The Science of Landing Page Messaging.pdfThe Science of Landing Page Messaging.pdf
The Science of Landing Page Messaging.pdf
 
How to Create a Social Media Plan Like a Pro - Jordan Scheltgen
How to Create a Social Media Plan Like a Pro - Jordan ScheltgenHow to Create a Social Media Plan Like a Pro - Jordan Scheltgen
How to Create a Social Media Plan Like a Pro - Jordan Scheltgen
 

How Search Engines Manage Too Much Web Content

  • 1. Dawn Anderson @dawnieando
  • 2. TOO MUCH CONTENT: The indexed web contains at least 4.73 billion pages (13/11/2015). [Chart: total number of websites, 2000 to 2014.] Since 2013 the web is thought to have increased in size by a third.
  • 3. TOO MUCH CONTENT: There are capacity limits on Google’s crawling system. How have search engines responded? By prioritising URLs for crawling, by assigning crawl period intervals to URLs, and by creating work ‘schedules’ for Googlebots.
  • 4. THE KEY PERSONAS: 9 types of Googlebot. SUPPORTING ROLES: the Indexer / Ranking Engine, the URL Scheduler, and the History Logs, Link Logs and Anchor Logs, which look at ‘past data’.
  • 5. GOOGLEBOT’S JOBS: Googlebot ‘ranks nothing at all’. It takes a list of URLs to crawl from the URL Scheduler, and its job varies with the type of bot. It runs errands and makes deliveries for the URL server, the indexer / ranking engine and the logs. It makes notes of outbound linked pages and additional links for future crawling, and takes note of ‘hints’ from the URL Scheduler while crawling. It reports URL accessibility status and server response codes, notes relationships between links, and collects content checksums (compact fingerprints computed from the page content), which the history and link logs compare with past visits.
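To make the checksum idea concrete: identical content always produces an identical checksum, so comparing a stored checksum with a fresh one is a cheap way to spot change without diffing whole pages. Below is a minimal sketch that uses the POSIX cksum utility on two locally saved copies of a page. The function names and file names are hypothetical, and Google has not said which fingerprinting method it uses.

```shell
# Print just the CRC value from cksum (reading stdin, it prints "checksum size").
page_checksum() {
  cksum < "$1" | awk '{ print $1 }'
}

# Hypothetical helper: compare an earlier saved copy of a page with a newer one.
# Prints "unchanged" when the checksums match and "changed" otherwise.
content_changed() {
  if [ "$(page_checksum "$1")" = "$(page_checksum "$2")" ]; then
    echo "unchanged"
  else
    echo "changed"
  fi
}
```

For example: content_changed home-2015-11-01.html home-2015-11-13.html. Any byte-level difference, such as a rotating advert or a timestamp, flips the checksum. This shows why telling critical change apart from non-critical change (slide 10) needs more than a raw checksum.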
  • 6. ROLES – MAJOR PLAYERS – A ‘BOSS’: THE URL SCHEDULER. Think of it as Google’s line manager or ‘air traffic controller’ for Googlebots in the web crawling system. It:
      • schedules Googlebot visits to URLs;
      • decides which URLs to ‘feed’ to Googlebot;
      • uses data from the history logs about past visits;
      • assigns how regularly Googlebot visits each URL;
      • drops ‘hints’ to guide Googlebot on types of content NOT to crawl, and excludes some URLs from schedules;
      • analyses past ‘change’ periods and predicts future ‘change’ periods for URLs, for the purposes of scheduling Googlebot visits;
      • checks ‘page importance’ when scheduling visits;
      • assigns URLs to ‘layers / tiers’ for crawling schedules.
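The slide describes the scheduler in terms of importance, predicted change and tiers. The toy model below illustrates that idea. It scores each URL as importance multiplied by probability of change and buckets the result into daily, weekly or base tiers. The input format, the scoring rule and the thresholds are all assumptions made for this sketch. They are not Google's formula.

```shell
# Toy model of a URL scheduler sorting URLs into crawl tiers.
# Input: tab-separated lines of URL, importance (0-1), probability of change (0-1).
# The score (importance x change probability) and the tier thresholds are
# invented for illustration only; they are not Google's formula.
assign_crawl_tiers() {
  awk -F '\t' -v OFS='\t' '
    {
      score = $2 * $3
      if (score >= 0.5)      tier = "daily"
      else if (score >= 0.1) tier = "weekly"
      else                   tier = "base"
      print $1, score, tier
    }' "$@" | LC_ALL=C sort -t "$(printf '\t')" -k2,2nr
}
```

Running printf '/\t0.9\t0.7\n/blog/\t0.6\t0.5\n' | assign_crawl_tiers puts / in the daily tier and /blog/ in the weekly tier. A product is just one way to express "important and likely to change". With this rule, a highly important page that never changes still sinks to a lower tier.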
  • 7. GOOGLEBOT’S BEEN PUT ON A URL-CONTROLLED DIET: The URL Scheduler controls the meal planner. It checks URLs for ‘importance’, ‘boost factor’ candidacy and ‘probability of modification’, carefully controls the list of URLs Googlebot visits, and allocates ‘budgets’.
  • 8. CRAWL BUDGET – WHAT IS IT? An allocation of ‘crawl visit frequency’ that the URL Scheduler apportions to the URLs on a site and hands to Googlebots. It is roughly proportionate to page importance (link equity) and site speed: pages with a lot of healthy links get crawled more (could this include internal links?). Factors other than importance and speed also affect how often Googlebot visits. The vast majority of URLs on the web don’t get much budget allocated to them.
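The phrase "roughly proportionate to page importance" can be shown with simple arithmetic. If a site receives B fetches and URL i has weight w_i, its share is B × w_i / Σw. The sketch below does that split. The total budget and the weights are made-up inputs, not values Google exposes.

```shell
# Toy illustration of a crawl budget "roughly proportionate to page importance".
# Input: tab-separated URL and weight (e.g. a hypothetical link-equity score).
# The first argument is the total number of fetches to share out (an assumption).
split_crawl_budget() {
  total_fetches=$1
  shift
  awk -F '\t' -v budget="$total_fetches" '
    { url[NR] = $1; weight[NR] = $2; sum += $2 }
    END {
      if (sum == 0) exit 1                 # nothing to apportion
      for (i = 1; i <= NR; i++)
        printf "%s\t%.1f\n", url[i], budget * weight[i] / sum
    }' "$@"
}
```

With 100 fetches and weights of 5, 3 and 2, the three URLs receive 50.0, 30.0 and 20.0 fetches respectively.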
  • 9. POSITIVE FACTORS AFFECTING GOOGLEBOT VISIT FREQUENCY:
      • The current capacity of the web crawling system is high.
      • Your URL is ‘important’.
      • Your URL changes a lot, with critical material content change.
      • The probability and predictability of critical material content change is high for your URL.
      • Your website is fast, so Googlebot has time to visit your URL.
      • Your URL has been ‘upgraded’ to a daily or real-time crawl layer.
  • 10. NEGATIVE FACTORS AFFECTING GOOGLEBOT VISIT FREQUENCY:
      • The current capacity of the web crawling system is low.
      • Your URL has been detected as a ‘spam’ URL.
      • Your URL is in an ‘inactive’ base layer segment.
      • Your URLs are ‘tripping hints’ built into the system to detect dynamic content whose changes are non-critical.
      • The probability and predictability of critical material content change is low for your URL.
      • Your website is slow, so Googlebot doesn’t have time to visit your URL.
      • Your URL has been ‘downgraded’ to an ‘inactive’ base layer segment.
      • Your URL has recently returned an ‘unreachable’ server response code.
  • 11. FIND GOOGLEBOT: Automate server log retrieval via a cron job, e.g. grep Googlebot access_log >googlebot_access.txt
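Below is a slightly more robust version of the slide's one-liner, suitable for a cron job. It matches case-insensitively, writes to a date-stamped file so each day's extract is kept, and treats "no matches" as success. The function name, log path and output directory are assumptions for this sketch.

```shell
# Hypothetical helper: copy every log line that mentions Googlebot (normally
# via its user-agent string) into a date-stamped file and print that path.
extract_googlebot_hits() {
  log_file=$1
  out_dir=$2
  mkdir -p "$out_dir" || return
  out_file="$out_dir/googlebot_access_$(date +%Y-%m-%d).txt"
  grep -i 'googlebot' "$log_file" > "$out_file"
  # grep exits 1 when nothing matched, which is not an error here;
  # 2 means a real problem such as an unreadable log file.
  [ $? -le 1 ] && echo "$out_file"
}

# Example crontab entry (paths are assumptions): run every day at 02:15.
# 15 2 * * * . /usr/local/bin/googlebot_logs.sh && extract_googlebot_hits /var/log/apache2/access.log /var/log/googlebot
```

Any client can claim to be Googlebot in its user-agent string. Google's documentation therefore recommends confirming such hits with a reverse DNS lookup on the requesting IP (the hostname should end in googlebot.com or google.com), followed by a forward lookup that resolves back to the same IP.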
  • 12. LOOK THROUGH ‘SPIDER EYES’ VIA LOG ANALYSIS – ANALYSE GOOGLEBOT. PREPARE TO BE HORRIFIED:
      • incorrect URL header response codes (e.g. 302s);
      • 301 redirect chains;
      • old files or XML sitemaps left on the server from years ago;
      • infinite / endless loops (circular dependency);
      • on parameter-driven sites, crawled URLs which produce the same output;
      • URLs generated by spammers;
      • dead image files being visited;
      • old CSS files still being crawled and loading legacy images.
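Once Googlebot's requests are in their own file, the first two problems on this list (wrong response codes and unexpected redirects) can be surfaced with awk. This sketch assumes the Apache/Nginx "combined" log format. In that format, splitting on spaces, field 7 is the requested path and field 9 is the status code. The function names are hypothetical.

```shell
# Assumes the "combined" log format: field 7 = requested path, field 9 = status.

# Count Googlebot requests per HTTP status code, most frequent first.
status_code_summary() {
  awk '{ count[$9]++ } END { for (code in count) print code, count[code] }' "$@" |
    LC_ALL=C sort -k2,2nr
}

# List the distinct paths that answered with a given status code,
# e.g. 302 temporary redirects that may need to become 301s or direct links.
paths_with_status() {
  wanted_status=$1
  shift
  awk -v s="$wanted_status" '$9 == s { print $7 }' "$@" | LC_ALL=C sort -u
}
```

For example, run status_code_summary googlebot_access.txt and then paths_with_status 302 googlebot_access.txt. Old sitemaps and dead image files show up in the same way, as paths answering with 404.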
  • 13. SEARCH ENGINE VIEW EMULATOR: a Lynx browser with 4 options to view a page through search engine eyes, human eyes, page source or page analysis (http://www.ovrdrv.com/search_view).
  • 14. LOOK THROUGH ‘SPIDER EYES’:
      • GSC Crawl Stats
      • Google Search Console (all tools)
      • Deepcrawl
      • Screaming Frog
      • Server log analysis
      • SEMRush (auditing tools)
      • Webconfs (header responses / similarity checker)
      • Powermapper (bird’s-eye view of site)
      • Search Engine View Emulator
  • 15. FIX GOOGLEBOT’S JOURNEY: SPEED UP YOUR SITE TO ‘FEED’ GOOGLEBOT MORE. Technical ‘fixes’:
      • Speed up your site: implement compression, minification and caching.
      • Fix incorrect header response codes.
      • Fix nonsensical ‘infinite loops’ generated by database-driven parameters or ‘looping’ relative URLs.
      • Use absolute rather than relative internal links.
      • Ensure no parts of the content are blocked from crawlers (e.g. in carousels, concertinas and tabbed content).
      • Ensure no CSS or JavaScript files are blocked from crawlers.
      • Unpick 301 redirect chains.
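Unpicking a 301 chain means pointing every old URL straight at its final destination. The sketch below assumes your redirect rules can be exported as "source target" pairs, one per line; that format is hypothetical. It follows each chain to its end, counts the hops Googlebot would have to make, and flags circular redirects (the ‘infinite loops’ in this list).

```shell
# Hypothetical input: one redirect per line, "source target".
# Prints "source final_target hops", or "source x LOOP" for circular chains,
# so each rule can be rewritten to reach its destination in a single hop.
flatten_redirects() {
  awk '
    { target[$1] = $2; order[NR] = $1 }
    END {
      for (i = 1; i <= NR; i++) {
        src = order[i]; dest = target[src]; hops = 1
        split("", seen)                    # reset the loop detector
        seen[src] = 1
        while ((dest in target) && !(dest in seen)) {
          seen[dest] = 1
          dest = target[dest]
          hops++
        }
        if (dest in seen) print src, dest, "LOOP"
        else print src, dest, hops
      }
    }' "$@"
}
```

Any row with a hop count above 1 is a chain worth collapsing. Rows marked LOOP need fixing before anything else.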
  • 16. SPEED TOOLS:
      • YSlow
      • Pingdom
      • Google Page Speed Tests
      • Minification: JS Compress and CSS Minifier
      • Image compression: Compressjpeg.com, tinypng.com
  • 17. URL IMPORTANCE TOOLS:
      • GSC Internal Links report (URL importance)
      • Link Research Tools (Strongest Sub Pages reports)
      • GSC Internal Links (add site categories and sections as additional profiles)
      • Powermapper
  • 18. STOP YOURSELF ‘VOTING’ FOR THE WRONG INTERNAL LINKS IN YOUR SITE: “It cannot be emphasised enough how important it is to emphasise importance.” [Diagram: Most Important Page 1, Most Important Page 2, Most Important Page 3.]
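One way to check which pages your internal links ‘vote’ for is to count how many distinct pages link to each URL. The sketch below assumes an edge list with one "linking_page linked_page" pair per line, such as an inlinks export from a site crawler (the format is an assumption). The count is only a crude proxy for importance. It ignores link position, nofollow and external links.

```shell
# Hypothetical input: one internal link per line, "linking_page linked_page".
# Prints "count url" for each linked URL, counting each linking page once,
# most-linked first (ties broken alphabetically).
internal_link_counts() {
  awk '!seen[$1 FS $2]++ { inlinks[$2]++ }
       END { for (url in inlinks) print inlinks[url], url }' "$@" |
    LC_ALL=C sort -k1,1nr -k2,2
}
```

If the top of this list is dominated by utility pages (login, filters, tag archives) rather than the pages you consider most important, your internal links may be voting for the wrong pages.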
  • 19. ONLINE DEMO OF XML GENERATOR: https://www.xml-sitemaps.com/generator-demo/
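An online generator is the easiest route. For a site that can already list its important URLs, though, a sitemap in the sitemaps.org format takes only a few lines to produce. This minimal sketch assumes one absolute URL per input line and entity-escapes the five characters the protocol requires. The function name is hypothetical, and the sketch omits optional tags such as lastmod.

```shell
# Minimal sketch: build a sitemaps.org-format XML sitemap from a file (or stdin)
# containing one absolute URL per line. Hypothetical helper; no <lastmod> etc.
write_sitemap() {
  printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>'
  printf '%s\n' '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  # Escape "&" first so the entities added afterwards are not double-escaped.
  sed -e 's/&/\&amp;/g' \
      -e 's/</\&lt;/g' \
      -e 's/>/\&gt;/g' \
      -e 's/"/\&quot;/g' \
      -e "s/'/\&apos;/g" "$@" |
    while IFS= read -r url || [ -n "$url" ]; do
      [ -n "$url" ] && printf '  <url><loc>%s</loc></url>\n' "$url"
    done
  printf '%s\n' '</urlset>'
}
```

For example: write_sitemap important_urls.txt > sitemap.xml. The protocol limits a single sitemap file to 50,000 URLs and 50 MB uncompressed, so larger sites need several files and a sitemap index.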
  • 20. 15 THINGS YOU CAN DO:
      1. Use XML sitemaps.
      2. Add site sections (e.g. categories) as profiles in Google Search Console for more granularity.
      3. Keep 301 redirections to a minimum.
      4. Use regular expressions in .htaccess files to implement rules and reduce crawl lag.
      5. Look out for redirect chains.
      6. Look out for infinite loops (spider traps).
      7. Check URL parameters in Google Search Console.
      8. Check whether URLs return exactly the same content, and choose one as the preferred URL.
      9. Block or canonicalise duplicate content.
      10. Use absolute rather than relative URLs.
      11. Improve site speed.
      12. Use front-facing HTML sitemaps for important pages.
      13. Use noindex on pages which add no value in search but may be useful for visitors to traverse your site.
      14. Use ‘If-Modified-Since’ (conditional request) headers to keep Googlebot out of low-importance pages.
      15. Build server log analysis into your regular SEO activities.
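Items 8 and 9 (and the parameter-driven duplicates on slide 12) concern URLs that differ only in parameters but return the same page. The sketch below normalises crawled URLs by dropping parameters assumed not to change content. It then reports normalised addresses that several crawled URLs collapse onto, which are candidates for a canonical URL. The parameter list is purely hypothetical and site-specific. Parameter order is not normalised.

```shell
# Hypothetical list of parameters that (on this imaginary site) never change content.
IGNORED_PARAMS="sessionid utm_source utm_medium sort"

# Read one URL per line; print each normalised URL that more than one
# crawled URL collapses to, followed by how many URLs map onto it.
find_parameter_duplicates() {
  awk -v ignored="$IGNORED_PARAMS" '
    BEGIN { n = split(ignored, list, " "); for (i = 1; i <= n; i++) skip[list[i]] = 1 }
    {
      url = $0
      q = index(url, "?")
      if (q > 0) {
        base = substr(url, 1, q - 1)
        np = split(substr(url, q + 1), pairs, "&")
        kept = ""
        for (i = 1; i <= np; i++) {
          name = pairs[i]
          sub(/=.*/, "", name)             # parameter name without its value
          if (pairs[i] != "" && !(name in skip))
            kept = (kept == "") ? pairs[i] : kept "&" pairs[i]
        }
        url = (kept == "") ? base : base "?" kept
      }
      count[url]++
    }
    END { for (u in count) if (count[u] > 1) print u, count[u] }' "$@" |
    LC_ALL=C sort
}
```

Feeding it the paths from a Googlebot log extract shows where crawl budget is being spent on several addresses for the same page.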
  • 21. REMEMBER: “When Googlebot plays ‘Supermarket Sweep’, you want to fill the shopping trolley with luxury items.” Dawn Anderson @dawnieando