Through our research, we discovered that Google doesn’t index every page on the web. For instance:
- Only 35% of Walmart’s product pages are indexed in Google.
- Only 50% of Barnes & Noble’s product pages are indexed.
The rest of the products simply aren’t accessible to Google users, which means these businesses may be losing a lot of money.
Chances are your business is affected too!
This is the deck Tomek Rudzki presented at SEODay 2020 in Denmark on January 29, 2020.
Let me ask you a question.
Who is the king?
Links?
Or content?
The answer is nobody.
Nobody is the king if your content cannot be found in Google.
We have been doing research, and it turned out that, on average, 15% of product pages are not indexed in Google!
Now, imagine that, statistically, Google users cannot find roughly one out of seven products. That’s a huge issue.
What does it mean when a page is not in Google? Let me explain in simple terms.
Google users cannot find a page in Google =
A business gets less money
Competitors take over. I’ve heard the competition in Denmark is stiff :)
Let me show you some examples of big brands that struggle with indexing their content. It will simply blow your mind.
- Verizon Wireless - 45% of their product pages cannot be found in Google
- Walmart - 45%
- Barnes & Noble - 57%
- YOUR website = ?
I am sure you are curious about some examples from Denmark, aren’t you?
Here you go!
- DR.dk - 86%
- Pricerunner.dk - 70%
- DBA.dk - 18%
In some cases, just one out of five pages can be found in Google. That's totally weird! It seems it's not only American websites that have issues, but Danish ones too.
I will tell you a secret :) I’m from Poland, and there are many Polish websites that have the same issues.
I think I’ve found the reason for this.
The reason is that big brands don’t care about SEO!
I performed a quick Twitter poll.
It seems we all know at least one big company that doesn’t even know SEO exists :)
Over 65% of respondents know at least one big company that doesn’t work with SEOs.
Let me begin with the Yoox case:
This is a very interesting one. Let us check.
We checked SEMrush statistics, and for apparel brands, on average 40% of traffic comes from Google organic search.
But Yoox gets just 13%!
I started wondering why that is.
We checked it, and it turned out that in the case of Yoox, just 25% of their product pages can be found in Google (!)
That is worrisome. They simply cannot be found in Google!
Yoox is losing lots of money because of that.
So what is wrong with Yoox? We checked this.
Google cannot properly crawl their paginated category pages.
But there’s more. Google commonly skips links to related products on product pages. To summarize, Yoox has gigantic issues with internal linking.
Now let me tell you the story of Medium.
Medium was founded by Evan Williams, the former CEO of Twitter. Before Medium, he also founded Blogger. He believed people should have more than just a couple of characters to express themselves. Medium is now one of the most popular content platforms, used by many of the biggest companies out there. However, they recently suffered a huge 50% drop in SEO visibility.
So Medium is full of great content. Some writers make a living publishing their stories there.
What are potential issues with Medium?
Firstly, Medium cloaks Googlebot. Users are shown a prompt to subscribe, with zero links to articles or categories. The version served to Googlebot is totally different: it’s full of links.
There is another issue, thin content.
I am not speaking about the quality of their articles. Most of them are great, commonly written by journalists, CEOs, and thought leaders.
The tricky thing is that even if they accepted only top-notch content, they would still struggle with thin content. How is that possible? Let’s see!
They index a tremendous amount of thin content.
User profiles, lists of users they follow, lists of their followers, lists of their highlights.
So in total, around 7.5 million URLs should not be indexed (and counting), but are.
Did I mention each comment is indexable and has a different URL?
That's a huge problem.
Most of you use Apple devices, right?
You use them because they're reliable and productive, right?
But now imagine Apple changed their strategy.
Imagine that Apple floods the market with tons of low-quality products. Products that didn't even pass quality checks. Some with broken displays, some with no battery, some with no operating system. Sounds strange, doesn't it? But that's exactly what happened to Medium.
They flood the market (Google) with tons of low quality URLs.
Why is it bad?
It’s a well-known fact that Google judges quality on three different levels:
- Landing page
- Section (directory)
- The whole website
As I showed you, they index a tremendous amount of thin content.
That's simply not what Google likes ;)
Another issue:
We found that 16 percent of Medium articles aren’t indexed in Google.
When you publish an article on Medium, there is a 16% risk that your article won’t be found by Google users.
I did some math.
When you publish 4 articles, there is only about a 50% chance that Google will index all of them.
That's a coin flip!
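The arithmetic behind that coin flip is easy to check yourself. A quick sketch, assuming each article independently carries the 16% risk:

```python
p_indexed = 1 - 0.16         # 16% risk that a single article is not indexed
p_all_four = p_indexed ** 4  # all four articles indexed, assuming independence
print(round(p_all_four, 2))  # 0.5, i.e. a coin flip
```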
Do you remember my question at the very beginning, asking you who is the king?
The Medium case shows us that even a perfect combination of great content and tons of external links from other sources is not enough. Even the best content is worth nothing if Google isn’t able to discover it and rank it high.
Basically, it’s not only Medium’s problem.
It’s a huge problem for many big brands.
Do you know Giphy?
One of the most popular websites with memes?
Giphy lost 90% of their SEO Visibility. Why is that?
There were multiple, technical SEO reasons for that.
I explained it in one of my articles.
There is another extremely interesting case – Disqus.
Can you guys believe that for months they had been serving Googlebot a blank page?
We informed them about the issue, and it took them over 3 months to fix it.
Let's go back to the main topic. As I said, there is a huge issue: Google doesn't index every product page of many big brands.
I bet you're curious about why this is happening.
There are 3 main reasons:
- Google cannot discover it (no internal links or low crawl budget) Spoiler alert: sitemap is not enough. Having valuable URLs in a sitemap is crucial, but it doesn’t guarantee your content will be indexed.
- Google decided not to index it because it's a duplicate or thin content.
- Indexing delay - that’s a huge issue for newspapers, classifieds, etc.
We will be talking more about that in the future.
Commonly, there are tons of pages within websites that aren’t accessible for Googlebot.
Do you want to check if Google can access all the valuable URLs on your website?
What can you do to accomplish that?
Commonly, sitemaps contain links to all the valuable resources on a website.
However, it frequently happens that many of these URLs aren't reachable for Googlebot through crawling.
What I recommend is checking which URLs are not discoverable by an SEO crawler.
The most popular tools, such as Screaming Frog and Sitebulb, offer such a feature out of the box.
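The underlying idea is simply a set difference: sitemap URLs minus crawler-discovered URLs. Here is a minimal sketch with made-up example.com URLs; in practice you'd load your real sitemap and your crawler's URL export:

```python
import xml.etree.ElementTree as ET

# Minimal sitemap snippet (hypothetical URLs, for illustration only).
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/product/1</loc></url>
  <url><loc>https://example.com/product/2</loc></url>
  <url><loc>https://example.com/product/3</loc></url>
</urlset>"""

# URLs an SEO crawler actually reached by following internal links.
crawled = {"https://example.com/product/1"}

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
sitemap_urls = {loc.text for loc in ET.fromstring(SITEMAP_XML).iter(NS + "loc")}

# In the sitemap, but unreachable through internal links:
orphans = sitemap_urls - crawled
print(sorted(orphans))
```

Every URL that shows up as an "orphan" is one Googlebot can only find via the sitemap, with no guarantee it will be crawled or indexed.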
Most common issue here: infinite scrolling.
Commonly, Google can see just the first 20 products per category.
That makes it really difficult for Googlebot to discover all the valuable product pages.
During my research, I spotted multiple websites that don’t allow Googlebot to see the second page of pagination.
Here are some examples: NewEgg, Nike, H&M, Walgreens, AT&T, Udemy
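To illustrate the problem, here is a toy check for whether a category page exposes a plain `<a href>` link to page 2 - the kind of link a crawler can follow, unlike a JavaScript-only "load more" button. The HTML snippets and the `page=2` parameter name are hypothetical:

```python
import re

# Hypothetical category page: more products load via a JavaScript button only.
CATEGORY_HTML = """
<div class="products"><!-- first 20 products --></div>
<button onclick="loadMore()">Load more</button>
"""

def has_crawlable_pagination(html: str) -> bool:
    """A crawler needs a plain <a href> link to page 2; a JS-only button won't do."""
    return bool(re.search(r'<a[^>]+href="[^"]*page=2', html))

print(has_crawlable_pagination(CATEGORY_HTML))                    # False
print(has_crawlable_pagination('<a href="/shoes?page=2">2</a>'))  # True
```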
Another reason why Google doesn't index all content is duplicate or thin content.
It may be caused by JavaScript SEO issues.
If Google struggles with rendering your content, it may think that there is no content at all.
Another obvious possibility is that your content is really thin. In such a case, investigate your content :)
Duplicates
This topic is beyond the scope of this presentation, but if each of your products is available under 4 different addresses, you’ve got a problem.
You should fix that. Read Google’s article on Duplicate content. https://support.google.com/webmasters/answer/66359?hl=en
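As a rough illustration of how such duplicates arise, this sketch normalizes a few hypothetical URL variants of one product (trailing slashes, tracking parameters, mixed-case hosts) and shows that they all collapse to a single canonical address. The parameter list is an assumption; every site needs its own rules:

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

# Parameters that create duplicate URLs without changing the content
# (an assumption for this example; audit your own site's parameters).
TRACKING = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}

def canonicalize(url: str) -> str:
    """Collapse common duplicate-URL variants into one canonical form."""
    p = urlparse(url)
    query = urlencode([(k, v) for k, v in parse_qsl(p.query) if k not in TRACKING])
    path = p.path.rstrip("/") or "/"
    return urlunparse((p.scheme, p.netloc.lower(), path, "", query, ""))

variants = [
    "https://Example.com/product/42/",
    "https://example.com/product/42?utm_source=mail",
    "https://example.com/product/42?sessionid=abc",
    "https://example.com/product/42",
]
print(len({canonicalize(u) for u in variants}))  # 1: all four are the same page
```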
Now it's time to discuss indexing delay.
It may happen that Google indexes your content within weeks, or even months (!).
Do you run a newspaper? A classifieds site?
You should make sure Google can index it quickly!
It takes time.
Google has over 170 trillion pages to visit.
The web is so huge.
In the case of The Guardian, a whopping share of their content is indexed within 1 day.
By contrast, a little over 58% of Eventbrite pages are not indexed within 2 weeks.
The outcome is clear. If you plan to create a new event, make sure you publish it at least one month before :)
You may ask why it takes so long to index a page in Google.
A very probable reason is that Google is visiting too many low-quality pages on your website.
I know that by saying this I will lose most of the audience.
But the tool is so powerful.
Using log file analysis, you can see exactly which pages of your website are visited by Google and which aren't.
It's time-consuming, but it can tell you exactly:
- Which sections of your website are visited by Google and which aren't.
For example, if Google frequently visits your blog pages but doesn't visit your product pages, you've got a problem.
Using log file analysis, you can also check if Google spends too much time on low-quality pages.
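A minimal sketch of that kind of log file analysis might look like this, counting Googlebot hits per site section. The log lines are made up, and in production you'd also verify Googlebot by reverse DNS rather than trusting the user-agent string:

```python
import re
from collections import Counter

# Made-up lines in Apache combined log format.
LOG_LINES = [
    '66.249.66.1 - - [29/Jan/2020:10:00:00 +0000] "GET /blog/post-1 HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [29/Jan/2020:10:00:05 +0000] "GET /blog/post-2 HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [29/Jan/2020:10:00:10 +0000] "GET /product/42 HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

hits = Counter()
for line in LOG_LINES:
    if "Googlebot" not in line:
        continue  # keep only Googlebot requests
    path = re.search(r'"GET (\S+)', line).group(1)
    section = "/" + path.split("/")[1]  # first path segment = site section
    hits[section] += 1

print(hits)  # Googlebot visits /blog but never /product: a red flag
```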
The solution to these issues is crawl budget optimization.
If this is the first time you've heard this term, think of it as "Googlebot journey optimization" - a term I invented for this conference :)
There is another challenge for 2020.
It's JavaScript SEO.
JavaScript is extremely popular, but commonly, websites have JavaScript SEO issues.
If you are curious where JavaScript is typically used: pagination, internal linking, reviews, and comments are commonly generated by JavaScript.
Also, sometimes main content is generated by JavaScript.
The problem arises when Google struggles with rendering and indexing your JavaScript content.
In such a case, Google may not see your content, comments, navigation, or links to related products.
Based on my research, 80% of the most popular ecommerce stores in the US use JavaScript to generate crucial content. (By crucial content I mean the main content or links to similar products.)
That’s the trend.
What if Google cannot see links to related products?
Many products cannot be discovered in Google -> no money.
Many products don’t get to high ranking -> no money.
What does it mean for you?
If Google doesn’t index crucial content (the main content), a page may not rank high.
How to check what content is generated by JavaScript?
Simply go to https://www.onely.com/tools/wwjd/ and type your website's URL into the tool.
Then, look at the screenshots that the tool generates and compare the two versions of your page - the one with JavaScript enabled & disabled.
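If you prefer a quick programmatic check, the same idea boils down to testing whether a key phrase is already present in the raw HTML, before any JavaScript runs. A toy sketch (the HTML and the phrase are hypothetical):

```python
# Hypothetical raw HTML, as fetched before any JavaScript executes;
# the real page would inject its content client-side into <div id="app">.
RAW_HTML = '<html><body><div id="app"></div><script src="/bundle.js"></script></body></html>'

def phrase_in_raw_html(html: str, phrase: str) -> bool:
    """True if the phrase is present without executing any JavaScript."""
    return phrase in html

print(phrase_in_raw_html(RAW_HTML, "Related products"))  # False: JS-dependent content
```

If a phrase from your main content or related-products block is missing from the raw HTML, that content depends entirely on rendering - exactly the risky situation described above.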
Use Google Search Console frequently. With this tool, you will see exactly why a page is not indexed.
Also, check Google Search Console's Coverage reports to see why certain groups of pages are not indexed.