Over the past week, Google has introduced a new measure, intended to limit the way external crawler agents interact with and use data from its index. We’re going to talk you through the change, step by step, explaining its impact and looking at what it might mean for your own SEO strategy.
So, what has changed exactly?
Google no longer allows HTML-only agents to crawl content from its index, leaving JavaScript-based crawling as the most obvious alternative.
What do you mean by ‘crawler’?
In SEO, a crawler (also referred to as a spider or bot) is an automated, script-driven program used to fetch and process information about web pages. Google has its own crawler agent (Googlebot), which you’ll be familiar with if you use Google Search Console.
When you request indexing for a page in GSC, you’re essentially adding it to a queue, where it’ll wait its turn to be ‘crawled’ by Googlebot. From there, Googlebot decides whether or not to display the page in search results, based on how much value it offers users. It determines that value by processing the page’s content and the signals around it – its topic, metadata, headings, and links.
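To make that a little more concrete, here’s a minimal sketch of what an HTML crawler does with a page: fetch the raw HTML, then pull out the title, metadata, headings, and links. This is purely illustrative (it’s nothing like Googlebot’s actual pipeline), and it assumes the Python requests and beautifulsoup4 libraries plus a placeholder URL:

```python
# A toy HTML crawler: fetch one page over plain HTTP and pull out the
# on-page signals mentioned above (title, metadata, headings, links).
# Requires: pip install requests beautifulsoup4
import requests
from bs4 import BeautifulSoup


def crawl_page(url: str) -> dict:
    """Fetch a page and extract a handful of basic on-page signals."""
    response = requests.get(
        url,
        headers={"User-Agent": "example-crawler/0.1"},  # identify your crawler
        timeout=10,
    )
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    description = soup.find("meta", attrs={"name": "description"})
    return {
        "title": soup.title.get_text(strip=True) if soup.title else None,
        "meta_description": description.get("content") if description else None,
        "headings": [h.get_text(strip=True) for h in soup.find_all(["h1", "h2", "h3"])],
        "links": [a["href"] for a in soup.find_all("a", href=True)],
    }


if __name__ == "__main__":
    # Placeholder URL - swap in a page you own.
    print(crawl_page("https://example.com/"))
```

The key point is that everything above works on the HTML exactly as the server sends it; no JavaScript ever runs.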
What other kinds of systems use crawlers?
The SEO tools that many in-house marketers and marketing agencies use for search data are also powered by crawlers. They use their own crawlers, often HTML-based, to process and display data about pages, presented as various metrics depending on the tool you’re using.
Why do many of us opt to crawl using HTML over JS?
HTML-based systems are favoured over JS for their ability to process vast quantities of data efficiently, and they power everything from SEO tools to AI platforms such as ChatGPT, Perplexity, and Claude (we’re coming back to this part later). Crawling with JS is costlier and more demanding in terms of the resources required – so it seems strange that Google has left us in this situation!
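To see why JS crawling costs more, compare the two approaches side by side. The sketch below is an illustration under our own assumptions, not how any particular tool works: the plain HTML fetch is a single HTTP request, while the JS-rendered fetch (using Playwright and headless Chromium as one example setup) has to launch a full browser and execute the page’s scripts before it can read the DOM.

```python
# Side-by-side sketch of plain-HTML crawling vs JS-rendered crawling.
# Requires: pip install requests playwright   (then: playwright install chromium)
import requests
from playwright.sync_api import sync_playwright

URL = "https://example.com/"  # placeholder URL


def fetch_html(url: str) -> str:
    """Cheap: a single HTTP request that returns the HTML exactly as served."""
    return requests.get(url, timeout=10).text


def fetch_rendered(url: str) -> str:
    """Expensive: launches headless Chromium, runs the page's JavaScript,
    then returns the DOM as it looks after rendering."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
    return html


if __name__ == "__main__":
    print(len(fetch_html(URL)), "bytes of raw HTML")
    print(len(fetch_rendered(URL)), "bytes after JS rendering")
```

Multiply that browser overhead across the millions of pages a rank-tracking tool checks every day and the cost gap between the two approaches becomes obvious.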
Why do these changes have an impact on SEO tools?
As we’ve explained, many popular SEO tools rely on HTML scraping agents to pull live information about search engine indexes, displaying this data in a variety of ways – from page rankings to individual keyword positions. Google has openly said that third-party tools are the intended target of this change: precisely the platforms that rely on HTML crawlers. SEMrush was one of the tools hit by the crawling clampdown; conversely, Sistrix was said to be unaffected.
With this context, it just looks like Google is trying to curb how much SEOs know about its index. But we’re about to throw a curveball.
Google has an AI axe to grind
If we piece together a few key points, it’d look like this:
- HTML crawling is extremely accessible and cheap.
- JS crawling is expensive and resource-heavy.
- Google uses content from its index to power its own AI feature, ‘AI Overviews’.
- Rival AI platforms (such as ChatGPT, Claude, and Perplexity) rely on crawling Google’s index.
- Google has, for several years, enforced high standards for publishers.
Do you see where we’re going with this? With those pointers in mind, it looks a lot like Google is getting ruthless as a means of protecting its *own* data. AI Overviews draws on existing top-ranking content from Google’s index and surfaces it in the newly introduced SERP feature. Over the past decade, Google has rolled out stringent content updates designed to raise the quality of its results; now those same results are feeding the company’s own effort to compete in the AI landscape. Just a bit of food for thought.
What does this mean for SEO?
SEO has always adapted to meet these kinds of shifts – just look at the likes of YouTube SEO, TikTok SEO, Amazon SEO… these niches all show how SEO can be tweaked and modified to target a specific medium, and the same is already happening with AI SEO. While Google maintains its dominance in the search engine market, a declining quality of results is leading everyday users to explore alternatives, contributing to increased adoption of AI-powered search engines. Optimising for AI algorithms will become an equally weighted focus for the brands and businesses agile enough to move with the times, while Google risks slipping closer to becoming a ‘search silo’.
In practical terms, the update might mean costlier subscriptions for many go-to tools, as the main players will have to invest in refining their JS crawling capabilities. At the moment, JS crawling comes with significant inaccuracies, which isn’t helpful for the teams and agencies that rely on data precision to inform their marketing decisions.
Stay up to date with the latest search developments
Whether you work with an agency already, or you’re a business owner trying to understand what these changes will mean for your own strategy, we’re here to help. Keep an eye out for ‘The SEOdiac’, our monthly newsletter featuring a roundup of stories and developments from the world of search.