Gah, the writing on this is so painful to read, it feels like this was most likely written by an LLM.
The writing style is so unclear that it's hard to figure out one of the key points: it mentions that Gemini doesn't use a distinct user-agent for its grounding, but it never says whether Gemini actually hit the endpoint during the test. It sort of implies it didn't, with "Silence from Google is not evidence of no fetch." Uh, if no requests come in live, that means no fetch: it's serving from a cache of your site.
It makes a difference whether it fetches a page live or uses a cached copy from a previous crawl; that tells you how up to date Gemini's answers about your website are going to be. But I guess the LLM writing this article just wanted to make things sound punchy and impressive, not actually communicate useful information.
Anyhow, LLM marketing spam from an LLM marketing spam company. Bleh.
I did use AI to organize my ideas, but I didn't think it was that bad. I'll revise it and make it easier to read.
Anyway, in my test I saw zero requests from any Google UA after multiple Gemini and AI mode prompts that should have triggered grounding, so the working interpretation is that Gemini served from its own index/cache rather than doing a live provider-side fetch. The original phrasing was fuzzier than it should have been.
So the state of AI in 2026: ChatGPT DDoS-lite, Claude the polite one that actually reads the rules, Perplexity maybe shows up, and Google was already in your house.
Added $http_accept and re-ran. None of them use text/markdown. Results:
ChatGPT-User/1.0 text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Claude-User/1.0 */*
Perplexity-User/1.0 (empty, no Accept header)
PerplexityBot/1.0 (empty, no Accept header)
ChatGPT sends a Chrome-style Accept string. Claude sends a wildcard. Perplexity sends nothing at all. Gemini didn't fetch in my test.
Also worth noting: Claude-User hit /robots.txt before the page.
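If anyone wants to reproduce this, here's a rough sketch of how I tallied it, assuming an nginx-style combined log extended so that `"$http_user_agent" "$http_accept"` are the last two quoted fields on each line (the sample lines below are made up, with RFC 5737 documentation IPs):

```python
import re

# Hypothetical log lines: combined log format extended with
# "$http_user_agent" "$http_accept" as the last two quoted fields.
SAMPLE_LOG = [
    '203.0.113.10 - - [01/Jan/2026:00:00:00 +0000] "GET / HTTP/1.1" 200 1234 "-" "ChatGPT-User/1.0" "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9"',
    '203.0.113.11 - - [01/Jan/2026:00:00:01 +0000] "GET / HTTP/1.1" 200 1234 "-" "Claude-User/1.0" "*/*"',
    '203.0.113.12 - - [01/Jan/2026:00:00:02 +0000] "GET / HTTP/1.1" 200 1234 "-" "Perplexity-User/1.0" "-"',
]

# Grab the last two quoted fields on the line: user agent, then Accept.
FIELDS = re.compile(r'"([^"]*)" "([^"]*)"$')

def ua_accept(line):
    m = FIELDS.search(line)
    return (m.group(1), m.group(2)) if m else (None, None)

for line in SAMPLE_LOG:
    ua, accept = ua_accept(line)
    print(f"{ua}: {accept}")
```

nginx logs a literal `-` when a header is absent, which is why Perplexity shows up as `-` rather than an empty string.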
I wish debates about “ai scraping my site” had more nuance.
There are multiple ways these tools access your site, and only one of them is "using it for training". The others are web fetches from chat sessions, "deep research" agents, etc., and those have different traffic patterns. They aren't crawlers; they're clumsy, ham-handed AI agents doing their humans' bidding.
Both can give a site the hug of death. Both can be badly coded. But there is very different intent behind the two, and I feel it's important to acknowledge the difference.
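And the nice thing is you can act on that distinction, since the vendors publish separate user-agent tokens for training crawlers versus user-triggered fetchers. A sketch of a robots.txt that blocks the former and allows the latter (token names per each vendor's published docs):

```text
# Training/index crawlers: blocked
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

# User-triggered fetchers: allowed
User-agent: ChatGPT-User
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Perplexity-User
Allow: /

# Google-Extended is a control token (not a separate crawler UA)
# that opts content out of Gemini training
User-agent: Google-Extended
Disallow: /
```

Whether a given agent actually honors this is another question, of course; in my test at least Claude-User fetched /robots.txt first.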
Don't worry.
The IPs listed in the output are from reserved ranges, as if they were intentionally obfuscated (though this was never explained to the reader).
It's the kind of obfuscation an AI would do (reaching for esoteric bogon ranges).
https://ipinfo.io/ips/203.0.113.0/24
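For reference, 203.0.113.0/24 is TEST-NET-3, one of the three blocks RFC 5737 reserves purely for documentation and examples; traffic from them should never show up in a real access log. A quick sanity check:

```python
import ipaddress

# RFC 5737 reserves three blocks strictly for documentation/examples.
DOC_NETS = [
    ipaddress.ip_network("192.0.2.0/24"),     # TEST-NET-1
    ipaddress.ip_network("198.51.100.0/24"),  # TEST-NET-2
    ipaddress.ip_network("203.0.113.0/24"),   # TEST-NET-3
]

def is_documentation_ip(addr: str) -> bool:
    """True if addr falls in an RFC 5737 documentation range."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in DOC_NETS)

print(is_documentation_ip("203.0.113.7"))  # True: TEST-NET-3
print(is_documentation_ip("8.8.8.8"))      # False: globally routable
```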
The content is interesting, but it's delivered in an article that smells like slop.