Checking Google Index status at scale with Node.js
Jose Luis Hernando
@jlhernando #BrightonSEO
Senior Technical SEO Consultant
Today’s agenda
1. Why it’s important to know your website’s indexing status
2. The challenge of extracting this data
3. Getting the data with Node.js – Live Demo!
4. Using this data for your SEO strategy
Why is it important?
Reason #1
Not in the Index => Not in the SERPs
Icons from Google, Flaticon & Sitecheckerpro
Why is it important?
Reason #2
Google evaluates site quality based on indexed pages
Sources:
Google Only Can Judge Site Quality Based On Pages They Index – Barry Schwartz (Search Engine Roundtable)
English Google Webmaster Central office-hours hangout – Google Webmasters YouTube Channel
Low Quality Pages
Uncontrolled Faceted Navigation URLs
Unsupervised User Generated Content
Indexable Non-Canonical URLs
High Quality Pages
Category Pages
Editorial Pages
Canonical Product Pages
Why is it important?
Reason #3
Inefficient use of Google’s resources
https://website.com/category-one/
HTML CSS JS
/category-one/?color=red
/category-one/?color=blue
/category-one/?color=red&blue
…
∞
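To make the ∞ above concrete, here is a minimal sketch that counts how many filtered URLs a single category page can spawn. The facet names and value counts are hypothetical, and it assumes each facet is either absent or set to exactly one value; multi-select filters such as ?color=red&blue would push the number higher still.

```javascript
// Hypothetical facets for /category-one/; names and values are illustrative only.
const facets = {
  color: ['red', 'blue', 'green'],
  size: ['s', 'm', 'l', 'xl'],
  brand: ['a', 'b'],
};

// Each facet is either unset or set to one of its values, so the number of
// distinct filter combinations is the product of (values + 1), minus the
// unfiltered page itself.
function countFacetUrls(facetMap) {
  const combos = Object.values(facetMap)
    .reduce((total, values) => total * (values.length + 1), 1);
  return combos - 1;
}

console.log(countFacetUrls(facets)); // 59
```

Three small facets already produce 59 crawlable variants of one page, each pulling the same HTML, CSS and JS.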
How much of your site is Googlebot crawling?
Crawl Ratio: percentage of pages crawled by Google in 30 days.
Active Ratio: percentage of pages that have generated at least one organic visit in 30 days.
Site size: Avg. Crawl Ratio (%) / Avg. Active Ratio (%)
• 1-10k: 71.7% / 45.3%
• 10k-100k: 54.3% / 30.2%
• 100k-1M: 41.7% / 15.1%
• 1M+: 34.4% / 10.1%
Source: How Does Google Crawl the Web? – Annabelle Bouard & Dimitri Brunel (Botify)
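The two ratios defined above can be computed for your own site once each URL is joined with log and analytics data. The sketch below assumes a hypothetical record shape (url, googlebotHits30d, organicVisits30d); those field names and the sample values are illustrative and do not come from the Botify study.

```javascript
// Hypothetical per-URL records: Googlebot hits and organic visits over 30 days.
const pages = [
  { url: '/a', googlebotHits30d: 12, organicVisits30d: 40 },
  { url: '/b', googlebotHits30d: 3, organicVisits30d: 0 },
  { url: '/c', googlebotHits30d: 0, organicVisits30d: 0 },
  { url: '/d', googlebotHits30d: 0, organicVisits30d: 0 },
];

// Percentage of items that satisfy the predicate, rounded to one decimal.
function ratio(items, predicate) {
  if (items.length === 0) return 0;
  const matching = items.filter(predicate).length;
  return Math.round((matching / items.length) * 1000) / 10;
}

const crawlRatio = ratio(pages, (p) => p.googlebotHits30d > 0);
const activeRatio = ratio(pages, (p) => p.organicVisits30d > 0);

console.log({ crawlRatio, activeRatio }); // { crawlRatio: 50, activeRatio: 25 }
```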
The challenge: extracting this data
• Googlebot’s crawling behaviour doesn’t determine indexing status
• You rely on partial and sometimes inaccurate data points:
  • site: & inurl: operators
  • GSC indexing reports:
    • URL Inspection Tool (< 200 URLs / day)
    • Coverage Reports (< 1,000 rows / report)
Proxy metrics != Accurate data
If you can’t find it, build it
{Live demo}
bit.ly/google-index-checker-script
Quick FYI
Using the following method goes against Google’s Terms of Service, as it sends automated search queries to Google Search.
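For orientation, the kind of lookup the demo automates boils down to one search query per URL. The sketch below only builds that query string with the site: operator mentioned earlier and sends nothing. It is an assumption about the general approach, not the code behind bit.ly/google-index-checker-script, and the Terms of Service caveat above applies to any automated use.

```javascript
// Build the Google Search URL for a `site:` query on a single page.
// Assumption: one query per URL; no request is made here.
function buildSiteQueryUrl(pageUrl) {
  const { host, pathname, search } = new URL(pageUrl); // throws on invalid URLs
  const searchUrl = new URL('https://www.google.com/search');
  // searchParams.set takes care of percent-encoding the query value.
  searchUrl.searchParams.set('q', `site:${host}${pathname}${search}`);
  return searchUrl.toString();
}

console.log(buildSiteQueryUrl('https://website.com/category-one/?color=red'));
```

Letting URL and searchParams do the encoding avoids hand-rolled escaping of characters such as ?, & and = inside the inspected URL.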
Our script outperforms every other method available
How can you use Google index data?
• Identify holes in your architecture: check for pages from your site that should be indexed but are not.
• Identify inefficient use of crawl budget: find pages that should not be indexed but are indexed.
• Error prioritisation: detect pages that used to exist and now return an error (4xx) but are still indexed.
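The three checks above can be expressed as a small classifier once each URL has an index status, an HTTP status code and a flag for whether it is meant to be indexed. The record shape (url, statusCode, shouldBeIndexed, isIndexed) and the bucket names are hypothetical.

```javascript
// Assign a URL record to one of the three problem buckets, or 'ok'.
function classify({ statusCode, shouldBeIndexed, isIndexed }) {
  if (statusCode >= 400 && statusCode < 500 && isIndexed) return 'error-still-indexed';
  if (shouldBeIndexed && !isIndexed) return 'missing-from-index';
  if (!shouldBeIndexed && isIndexed) return 'indexed-by-mistake';
  return 'ok';
}

// Hypothetical sample records.
const records = [
  { url: '/category-one/', statusCode: 200, shouldBeIndexed: true, isIndexed: false },
  { url: '/category-one/?color=red', statusCode: 200, shouldBeIndexed: false, isIndexed: true },
  { url: '/old-product/', statusCode: 404, shouldBeIndexed: false, isIndexed: true },
  { url: '/editorial/', statusCode: 200, shouldBeIndexed: true, isIndexed: true },
];

for (const record of records) {
  console.log(record.url, '=>', classify(record));
}
```

The 4xx check runs first so that an indexed error page is reported as an error rather than as a generic indexing mistake.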
Use case #1
Sitemap Health Check
How many URLs from your XML sitemap are indexed?
Sitemaps = 111,772 URLs
Google Index Status of URLs from Sitemap, by status code:
• 200 Status Code – 81,688 URLs: 80% indexed (74,223 indexed, 7,465 not indexed)
• 404 Status Code – 29,969 URLs: 21% indexed (6,268 indexed, 23,701 not indexed)
• 301 Status Code – 365 URLs: 4% indexed (16 indexed, 349 not indexed)
Inspired by Data Secrets of the Index Coverage Report – AJ Kohn
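The per-status breakdown above is a group-by over the sitemap URLs. A minimal sketch, assuming each row already carries the status code from a crawl and the index status from the checker; the rows below are made up and far smaller than the 111,772-URL sitemap.

```javascript
// Hypothetical rows: one per sitemap URL.
const rows = [
  { url: '/a', statusCode: 200, isIndexed: true },
  { url: '/b', statusCode: 200, isIndexed: true },
  { url: '/c', statusCode: 200, isIndexed: false },
  { url: '/d', statusCode: 404, isIndexed: true },
  { url: '/e', statusCode: 404, isIndexed: false },
  { url: '/f', statusCode: 301, isIndexed: false },
];

// Count indexed and not-indexed URLs per status code and add the indexed share.
function summariseByStatus(items) {
  const summary = {};
  for (const { statusCode, isIndexed } of items) {
    if (!summary[statusCode]) summary[statusCode] = { indexed: 0, notIndexed: 0 };
    const bucket = summary[statusCode];
    if (isIndexed) bucket.indexed += 1;
    else bucket.notIndexed += 1;
  }
  for (const bucket of Object.values(summary)) {
    const total = bucket.indexed + bucket.notIndexed;
    bucket.indexedPct = Math.round((bucket.indexed / total) * 100);
  }
  return summary;
}

console.log(summariseByStatus(rows));
```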
Sitemap Health Check
Next Steps
1) Identify whether these URLs are important to your site’s bottom line
2) Check whether a pool of these URLs has issues in GSC’s Index Coverage Report
3) Choose a tactic to improve the visibility of these URLs
4) Isolate the relevant URLs and modify the existing sitemap, or create a new-sitemap.xml to monitor progress
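Step 4 can be scripted. A minimal sketch that turns a list of isolated URLs into the body of a new-sitemap.xml, using the standard sitemaps.org urlset format; the helper names and sample URLs are hypothetical, and writing the string to disk is left out.

```javascript
// Escape the characters that are not allowed raw inside XML text nodes.
function escapeXml(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Build a sitemap document containing one <url><loc> entry per URL.
function buildSitemap(urls) {
  const entries = urls
    .map((url) => `  <url><loc>${escapeXml(url)}</loc></url>`)
    .join('\n');
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    entries,
    '</urlset>',
  ].join('\n');
}

console.log(buildSitemap([
  'https://website.com/category-one/',
  'https://website.com/search?q=shoes&page=2',
]));
```

A single sitemap file is limited to 50,000 URLs, so a larger pool needs to be split across several files.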
Use case #2
Log File Analysis+
How many URLs with Googlebot hits are indexed?
• ~160k Googlebot hits to non-canonical URLs (/Uppercase/ vs /lowercase/)
• Identified whether the non-canonical URLs were indexed
• Identified whether the referenced canonical URLs were indexed
Indexed non-canonical URLs requested by Googlebot: 35.8% indexed, 64.2% not indexed
Undisclosed client
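A first pass over the logs might look like the sketch below: keep only lines whose user agent mentions Googlebot, pull out the requested path, and flag paths containing uppercase characters. It assumes the common combined log format and does not verify Googlebot via reverse DNS, which a real analysis should do; the sample lines are invented.

```javascript
// Matches the request and user-agent parts of a combined-log-format line.
const LOG_PATTERN = /"(?:GET|HEAD) (\S+) HTTP\/[\d.]+" \d{3} \S+ "[^"]*" "([^"]*)"/;

// Return the paths requested by Googlebot that contain uppercase letters.
function uppercaseGooglebotPaths(lines) {
  const paths = [];
  for (const line of lines) {
    const match = LOG_PATTERN.exec(line);
    if (!match) continue;
    const [, path, userAgent] = match;
    if (!/Googlebot/i.test(userAgent)) continue;
    // Only the path is checked; query-string case is ignored here.
    if (/[A-Z]/.test(path.split('?')[0])) paths.push(path);
  }
  return paths;
}

const sampleLines = [
  '66.249.66.1 - - [01/Oct/2020:10:00:00 +0000] "GET /Category-One/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
  '66.249.66.1 - - [01/Oct/2020:10:00:05 +0000] "GET /category-one/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
  '203.0.113.9 - - [01/Oct/2020:10:00:09 +0000] "GET /Category-Two/ HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
];

console.log(uppercaseGooglebotPaths(sampleLines)); // [ '/Category-One/' ]
```

The resulting list is what would then be passed through the index checker.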
Log File Analysis+
Next Steps
1) Check whether the canonical tag is correctly placed
2) Identify whether the root cause is internal linking, external linking or something else
3) Consider redirecting non-canonical URLs to canonical URLs
4) Create a new-sitemap.xml with the problematic URLs to encourage Googlebot to revisit them and to monitor progress
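If redirecting is the chosen route (step 3), the mapping from each non-canonical URL to its canonical target can be generated rather than hand-written. A minimal sketch, assuming the only difference is letter case in the path, as in the /Uppercase/ vs /lowercase/ example; query strings are left untouched and the function names are hypothetical.

```javascript
// Return the lowercase-path version of a URL, or null if it is already canonical.
function lowercaseTarget(rawUrl) {
  const url = new URL(rawUrl);
  const lowered = url.pathname.toLowerCase();
  if (lowered === url.pathname) return null;
  url.pathname = lowered;
  return url.toString();
}

// Build [from, to] pairs suitable for a 301 redirect map.
function buildRedirectMap(urls) {
  return urls
    .map((from) => [from, lowercaseTarget(from)])
    .filter(([, to]) => to !== null);
}

console.log(buildRedirectMap([
  'https://website.com/Category-One/?Color=Red',
  'https://website.com/category-one/',
]));
```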
Other use cases
Inform your SEO strategy
• Check real-time indexing (news sites, offer sites, job boards)
• Check uncontrolled faceted navigation (crawl budget optimisation)
• Check inactive product/category URLs (site architecture improvements)
• Check old 4xx URLs that are live now & haven't been deindexed yet (recover organic opportunities)
Further reading
https://bit.ly/google-index-checks
https://bit.ly/gsc-index-coverage
The Google Index Checker script has opened a door
to get useful, actionable data at scale for your sites
Use it, and act on it.
Thank you.
builtvisible.com
Jose Luis Hernando
Senior Technical SEO Consultant
@jlhernando
Sources & additional reading
How Does Google Crawl the Web? – Annabelle Bouard & Dimitri Brunel (Botify)
English Google Webmaster Central office-hours hangout – Google Webmasters YouTube Channel
Google Only Can Judge Site Quality Based On Pages They Index – Barry Schwartz (Search Engine Roundtable)
Data Secrets of the Index Coverage Report – AJ Kohn (Blind Five Year Old)
How Google Search Works – Google Documentation
How Search organises information – Google Documentation
Our new search index: Caffeine – Carrie Grimes
When indexing goes wrong: how Google Search recovered from indexing issues & lessons learned since – Vincent Courson, Google Search Outreach
How Search Engines Work: Crawling, Indexing & Ranking – Moz
(Please) Stop Using Unsafe Characters in URLs – Jeff Starr

 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 

Checking Google Index Status at Scale using Node.js - Jose Hernando - BrightonSEO Oct 2020

  • 1. Checking Google Index status at scale with Node.js. Jose Luis Hernando, Senior Technical SEO Consultant. @jlhernando #BrightonSEO
  • 2. Today’s agenda: 1. Why it’s important to know your website’s indexing status 2. The challenge to extract this data 3. Getting the data with Node.js – Live Demo! 4. Using this data for your SEO strategy
  • 3. Why is it important? Reason #1: Not in the Index => Not in the SERPs. Icons from Google, Flaticon & Sitecheckerpro
  • 4. Why is it important? Reason #2: Google evaluates site quality based on indexed pages. High Quality Pages: Category Pages, Editorial Pages, Canonical Product Pages + Low Quality Pages: Uncontrolled Faceted Navigation URLs, Unsupervised User Generated Content, Indexable Non-Canonical URLs. Sources: Google Only Can Judge Site Quality Based On Pages They Index – Barry Schwartz (Search Engine Roundtable); English Google Webmaster Central office-hours hangout – Google Webmasters YouTube Channel
  • 5. Why is it important? Reason #3: Inefficient use of Google’s resources. https://website.com/category-one/ (HTML, CSS, JS): /category-one/?color=red, /category-one/?color=blue, /category-one/?color=red&blue, … ∞
  • 6. How much of your site is Googlebot crawling? Avg. Crawl Ratio (%) by site size: 1-10k: 71.7%; 10k-100k: 54.3%; 100k-1M: 41.7%; 1M+: 34.4%. Avg. Active Ratio (%) by site size: 1-10k: 45.3%; 10k-100k: 30.2%; 100k-1M: 15.1%; 1M+: 10.1%. Crawl Ratio: percentage of pages crawled by Google in 30 days. Active Ratio: percentage of pages that have generated at least one organic visit in 30 days. Source: How Does Google Crawl the Web? – (Annabelle Bouard & Dimitri Brunel – Botify)
  • 7. The challenge to extract this data • Googlebot’s crawling behaviour doesn’t determine indexing status
  • 8. The challenge: extracting this data • Googlebot’s crawling behaviour doesn’t determine indexing status • You rely on partial and sometimes inaccurate data points: • site: & inurl: operators • GSC Indexing reports: • URL Inspection Tool (< 200 URLs / day) • Coverage Reports (< 1,000 rows / report)
  • 9. Proxy metrics != Accurate data
  • 10. If you can’t find it, build it
  • 11. {Live demo} bit.ly/google-index-checker-script
  • 12. Quick FYI: Using the following method goes against Google’s Terms of Service, as it automatically requests search queries from Google Search
  • 13. Our script outperforms every other method available
  • 14. How can you use Google index data? Identify inefficient use of crawl budget. Error prioritisation. Identify holes in your architecture. Check for pages from your site that should be indexed but are not. Find pages that should not be indexed but are indexed. Detect pages that used to exist and now return an error (4xx) but are still indexed.
  • 15. Use case #1: Sitemap Health Check. How many URLs from your XML sitemap are indexed? Sitemaps = 111,772 URLs. • 200 Status Code – 81,688 (80% Indexed). Google Index Status of 2xx URLs from Sitemap: Indexed 74,223; Not Indexed 7,465. Inspired by Data Secrets of the Index Coverage Report – AJ Kohn
  • 16. Use case #1: Sitemap Health Check. How many URLs from your XML sitemap are indexed? Sitemaps = 111,772 URLs. • 200 Status Code – 81,688 (80% Indexed) • 404 Status Code – 29,969 (21% Indexed). Google Index Status of 4xx URLs from Sitemap: Indexed 6,268; Not Indexed 23,701. Inspired by Data Secrets of the Index Coverage Report – AJ Kohn
  • 17. Use case #1: Sitemap Health Check. How many URLs from your XML sitemap are indexed? Sitemaps = 111,772 URLs. • 200 Status Code – 81,688 (80% Indexed) • 404 Status Code – 29,969 (21% Indexed) • 301 Status Code – 365 (4% Indexed). Google Index Status of 3xx URLs from Sitemap: Indexed 16; Not Indexed 349. Inspired by Data Secrets of the Index Coverage Report – AJ Kohn
  • 18. Sitemap Health Check: Next Steps. 1) Identify if these URLs are important to your site’s bottom line 2) Check if a pool of these URLs have issues on GSC’s Index Coverage Report 3) Choose a tactic to improve the visibility of these URLs 4) Isolate the relevant URLs and modify the existing sitemap or create a new-sitemap.xml to monitor progress
  • 19. Use case #2: Log File Analysis Plus+. How many URLs with Googlebot hits are indexed? • ~160k Googlebot hits to non-canonical URLs (/Uppercase/ vs /lowercase/) • Identified if non-canonical URLs were indexed • Identified if the referenced canonical URLs were indexed. Indexed Non-Canonical URLs Requested by Googlebot: Indexed 35.8%; Not Indexed 64.2%. Undisclosed Client
  • 20. Log File Analysis+: Next Steps. 1) Identify if the canonical tag is correctly placed 2) Identify if the root cause is internal linking, external linking or other 3) Consider redirecting non-canonical URLs to canonical URLs 4) Create a new-sitemap.xml with problematic URLs to encourage Googlebot to revisit those URLs and for monitoring purposes
  • 21. Other use cases: inform your SEO strategy • Check real-time indexing (News sites, Offer sites, Job Boards) • Check uncontrolled faceted navigation (Crawl budget optimisation) • Check inactive product/category URLs (Site architecture improvements) • Check old 4xx that are live now & haven't been deindexed yet (Recover organic opportunities)
  • 22. Further reading: https://bit.ly/google-index-checks
  • 23. Further reading: https://bit.ly/gsc-index-coverage
  • 24. The Google Index Checker script has opened a door to get useful, actionable data at scale for your sites. Use it, and act on it.
  • 25. Thank you. builtvisible.com Jose Luis Hernando, Senior Technical SEO Consultant. @jlhernando
  • 26. Sources & additional reading: How does Google crawl the web – Annabelle Bouard & Dimitri Brunel (Botify); English Google Webmaster Central office-hours hangout – Google Webmasters YouTube Channel; Google Only Can Judge Site Quality Based On Pages They Index – Barry Schwartz (Search Engine Roundtable); Data Secrets of the Index Coverage Report – Blind Five Year Old (AJ Kohn); How Google Search Works – Google Documentation; How Search organises information – Google Documentation; Our new search index: Caffeine – Carrie Grimes; When indexing goes wrong: how Google Search recovered from indexing issues & lessons learned since – Vincent Courson, Google Search Outreach; How Search Engines Work: Crawling, Indexing & Ranking – Moz; (Please) Stop Using Unsafe Characters in URLs – Jeff Starr
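
The Sitemap Health Check slides group sitemap URLs by HTTP status code and report the share of each group that is indexed. A minimal sketch of that tally in Node.js follows. It assumes the results have already been parsed into rows with hypothetical `url`, `status` and `indexed` fields; these names and the sample rows are illustrative and are not taken from the Google Index Checker's actual output.

```javascript
// Tally indexed vs. total URLs per status class (2xx, 3xx, 4xx, ...).
// Row shape { url, status, indexed } is an assumption made for this sketch.
function summariseByStatus(rows) {
  const summary = {};
  for (const { status, indexed } of rows) {
    const bucket = `${Math.floor(status / 100)}xx`;
    if (!summary[bucket]) summary[bucket] = { total: 0, indexed: 0 };
    summary[bucket].total += 1;
    if (indexed) summary[bucket].indexed += 1;
  }
  // Add the indexed share as a percentage rounded to one decimal place.
  for (const group of Object.values(summary)) {
    group.indexedPct = Math.round((group.indexed / group.total) * 1000) / 10;
  }
  return summary;
}

// Illustrative rows only; real input would come from the results CSV.
const exampleRows = [
  { url: 'https://example.com/a', status: 200, indexed: true },
  { url: 'https://example.com/b', status: 200, indexed: false },
  { url: 'https://example.com/old', status: 404, indexed: true },
  { url: 'https://example.com/moved', status: 301, indexed: false },
];

console.log(summariseByStatus(exampleRows));
```

Grouping by status class rather than exact code keeps the output aligned with the "2xx / 3xx / 4xx URLs from Sitemap" charts; a per-code breakdown would only need a different bucket key.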

Editor's Notes

  1. Technical SEO Consultant at Builtvisible. Builtvisible is a digital marketing agency focusing exclusively on organic performance. We are specialists in technical SEO, content strategy, digital PR and analytics, and we deal primarily with medium and large-scale sites targeting both national and global audiences online.
  2. If you’re not in Google’s index, you will not appear in Google’s SERPs. To appear in search results, Google has to discover, crawl, render and index your website’s pages. Only once you’re in the index are you eligible to appear in SERPs and able to acquire users through organic search. If you don’t know which pages are indexed, you don’t know which pages can acquire users organically.
  3. High quality pages are pages that you’ve probably spent lots of time customising to serve users. These pages will be evaluated in the same way as low-quality pages that are indexable: uncontrolled faceted navigation URLs, unsupervised user-generated content and non-canonicals.
  4. If you have an e-commerce site with uncontrolled faceted navigation, Googlebot has to download each of those pages (and its resources), crawling and rendering the URLs to evaluate whether they contain valuable information for future user queries. Since this is not controlled, it can go on ad infinitum, wasting Google’s resources on URLs that are very likely not as valuable as others in your site architecture.
  5. Key step in the indexing pipeline: crawling. In order for Google to index your site, it needs to crawl your site. But how much of your site is Googlebot crawling? According to a study from Botify covering 270 sites with different architecture sizes, certainly not all of it. In this graph there are two important concepts: Crawl Ratio (the percentage of pages crawled by Google in 30 days) and Active Ratio (the percentage of pages that have generated at least one organic visit in 30 days). If you are dealing with a site that has fewer than 10k URLs, Google is crawling on average 71% of your site and only 45% of that gets organic clicks. As the size of a website increases, the rate at which Googlebot crawls it declines more and more, to the point where, if your site has more than 1M URLs, Googlebot crawls on average only 34% of your site and only 10% of those URLs get clicks from organic search.
  6. Challenges: 1) Even if you are lucky enough to have access to your logs on a regular basis, Googlebot’s crawling behaviour doesn’t determine indexing status. You cannot guarantee that URLs that have not received clicks from Google Search are actually part of Google’s index.
  7. 2) If you don’t have access to server logs, you have even less data, and hence you rely on the information that Google provides through: a) site: & inurl: operators: a rough estimate for site-wide numbers and often inaccurate information for individual URLs; b) Google Search Console reports: the URL Inspection Tool (great, but you hit the quota limit after 200 URLs, so it is a bit pointless to automate) and the Coverage/Sitemap Coverage reports (great, but GSC only allows 1,000 rows of data per report).
  8. Download our Google Index Checker script from GitHub (developed by our Senior Developer Alvaro Fernandez). Download/update Node.js. The script relies on ScraperAPI to get information from Google Search: it is super easy to use and you can sign up for free to get the API key. Concurrent requests are limited to 5, the ScraperAPI Free Plan maximum, but Al has built a function to automatically adapt the concurrency to the tier plan limit. Unlimited number of URLs. Perfect for clean URLs, but it can also process parameterised URLs, case-sensitive URLs, internationally encoded characters and reserved/unreserved symbols. Recycling feature. Nice overview of the index status check when it finishes.
  9. Download your XML sitemap(s) using your preferred crawler (SF, DC, OC, SB), get your list of URLs, create a urls.csv file and add it to the Google Index Checker. Once it has finished, you will get a CSV file with your results and you can find out how much of your sitemap is indexed. In this example I’ve taken argos.co.uk because it is a large e-commerce site, with a mix of normal URLs and URLs with unsafe characters.
  10. We found ~160k non-canonical category pages with a significant number of Googlebot requests. The problem was that the non-canonical URLs contained an uppercase character which wasn’t supposed to be there. Firstly, we wanted to identify whether these pages were indexed. Secondly, we wanted to know whether the non-canonical URLs were being indexed instead of the canonicals. In the end we found that approximately 36% of the non-canonical URLs were indexed instead of their canonicals.
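
Note 8 mentions two mechanics: handling URLs with parameters or reserved characters, and capping concurrent requests at the plan limit (5 on the free tier). The sketch below illustrates both ideas in plain Node.js. It is not the Google Index Checker's actual code: `buildSiteQuery` and `runWithConcurrency` are hypothetical helper names, and the `site:` query format is an assumption about how such a check could be phrased. No request is sent here; the `worker` argument stands in for whatever function performs the lookup.

```javascript
// Build a Google search URL for a `site:` check (assumed query format).
// encodeURIComponent escapes parameters, non-ASCII and reserved characters
// so the target URL survives as a single query value.
function buildSiteQuery(url) {
  return `https://www.google.com/search?q=${encodeURIComponent(`site:${url}`)}`;
}

// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in the same order as `items`.
async function runWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each lane pulls the next unprocessed index until the list is exhausted.
  async function lane() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, lane));
  return results;
}
```

A call such as `runWithConcurrency(urls, 5, checkUrl)` would keep five lookups running at a time, where `checkUrl` is a placeholder for a function that fetches `buildSiteQuery(url)` through the scraping service and decides whether the URL appears in the results. Keep in mind the slide's own caveat that automating Google Search queries goes against Google’s Terms of Service.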