Chapter 7 — Bot-Friendly Infrastructure
Definition
Bot-friendly infrastructure is the deliberate configuration of robots.txt, CDN bot management, and firewall rules so AI retrieval crawlers (OAI-SearchBot, ChatGPT-User, Claude-SearchBot, PerplexityBot, Googlebot) can read your store, while training crawlers (GPTBot, ClaudeBot, CCBot) are an independent, deliberate choice. Most Shopify stores get this wrong by default — Shopify’s robots.txt and Cloudflare’s bot rules are configured for protection, not visibility. Without a deliberate override, you are likely invisible to one or more AI search engines right now.
Why it matters
This is the silent failure mode.
Every other GEO investment (schema, content, off-site authority) assumes AI crawlers can reach your store. If they can’t, none of it compounds. Per research cited by Mersel and ziptie.dev, approximately 27% of B2B SaaS and ecommerce sites unknowingly block major LLM crawlers at the CDN layer [5]. Cloudflare’s default “AI Scraper” controls and Shopify’s silent robots.txt updates have produced a category of stores that are doing everything else right and getting zero AI citations because the bots literally can’t read the page.
The asymmetry is the trap. Blocking GPTBot opts you out of OpenAI training. Blocking OAI-SearchBot removes you from ChatGPT search results entirely [1][4]. Two different crawlers, two different consequences. Most owners hear “block GPTBot to opt out of AI” and don’t realize OAI-SearchBot is independent, and that it is the bot that decides whether ChatGPT cites your store. Per OpenAI’s own documentation: “Sites that are opted out of OAI-SearchBot will not be shown in ChatGPT search answers” [1].
A second confusion point: Google-Extended does not control AI Overviews or AI Mode [2][4]. AI Overviews use the standard Googlebot. Blocking Google-Extended only opts out of Gemini training. Owners who block Google-Extended thinking it removes them from AI Mode have done nothing of consequence; the actual control is Googlebot, which they presumably want allowed.
The fix is small. The cost of skipping it is total.
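The independence of these crawlers can be checked mechanically. The sketch below uses Python's standard `urllib.robotparser` against a hypothetical robots.txt; the rules and the example.com URL are illustrative assumptions, not Shopify's defaults. Python's parser only approximates how each vendor's crawler reads the file, so treat the result as a sanity check rather than proof.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opts out of two training crawlers, allows the rest.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

# Illustrative product URL (assumption).
SAMPLE_URL = "https://example.com/products/widget"


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given user-agent may fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Each crawler is evaluated against its own group, never its vendor's.
for bot in ("GPTBot", "OAI-SearchBot", "Google-Extended", "Googlebot"):
    print(f"{bot:16} allowed={allowed(ROBOTS_TXT, bot, SAMPLE_URL)}")
```

With this file, the training opt-outs leave OAI-SearchBot and Googlebot untouched, which is the intended outcome for a store that wants AI visibility without contributing training data.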
The bot taxonomy
Each vendor now ships split user-agents for training, retrieval, and on-demand fetch. Treat the columns as independent decisions per crawler.
| Vendor | Retrieval (ALLOW for AI visibility) | Training (independent decision) | User-initiated fetch |
|---|---|---|---|
| OpenAI | OAI-SearchBot [1] | GPTBot [1] | ChatGPT-User [1] |
| Anthropic | Claude-SearchBot [3] | ClaudeBot (formerly anthropic-ai) [3] | Claude-User [3] |
| Perplexity | PerplexityBot [4] | - | Perplexity-User (may bypass robots.txt per Cloudflare investigation) [4] |
| Google | Googlebot (also powers AI Mode + AI Overviews) [2] | Google-Extended (Gemini training only; does NOT control AI Overviews) [2] | - |
| Microsoft Copilot | Bingbot | - | - |
| Apple, ByteDance, Common Crawl | - | Applebot-Extended, Bytespider, CCBot | - |
For Shopify stores chasing AI visibility, the minimum allow-list is: OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User, PerplexityBot, Googlebot. Training crawlers are a separate decision per vendor — independent of the visibility decision.
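One way to keep the retrieval and training decisions separate is to record them per crawler and render the directives from that record. The following is a minimal sketch: `render_rules` and the `DECISIONS` mapping are hypothetical names, the training choices shown are placeholders rather than recommendations, and how the output is merged into robots.txt.liquid is not covered here.

```python
# Retrieval crawlers from the minimum allow-list above.
RETRIEVAL_BOTS = [
    "OAI-SearchBot", "ChatGPT-User", "Claude-SearchBot",
    "Claude-User", "PerplexityBot", "Googlebot",
]

# Hypothetical decision record: True = allow, False = block.
# Training entries are example choices only; decide each one yourself.
DECISIONS = {bot: True for bot in RETRIEVAL_BOTS}
DECISIONS.update({"GPTBot": False, "ClaudeBot": False, "CCBot": False})


def render_rules(decisions: dict[str, bool]) -> str:
    """Render one robots.txt group per crawler from an allow/block record."""
    groups = []
    for bot, allow in decisions.items():
        directive = "Allow: /" if allow else "Disallow: /"
        groups.append(f"User-agent: {bot}\n{directive}")
    return "\n\n".join(groups) + "\n"


print(render_rules(DECISIONS))
```

One caveat on the design: under the Robots Exclusion Protocol a crawler follows only the most specific group that matches it, so a named group containing just `Allow: /` means that crawler no longer inherits path rules from the `*` group. If the wildcard group contains disallows you want every bot to respect, they would need to be repeated inside each named group.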
Sidebar — llms.txt is not the answer
You will be told by SEO tools and “AI SEO” agencies that you need to add an llms.txt file to your Shopify store. Skip it.
- Google’s John Mueller publicly stated: “FWIW no AI system currently uses llms.txt” [6]. He compared it directly to the obsolete meta keywords tag.
- Gary Illyes confirmed at Google Search Central Live that Google does not support llms.txt and has no plans to [6].
- No major AI provider (Google, OpenAI, Anthropic, Meta, Mistral) has adopted it [6].
- SE Ranking analyzed 39,000 domains and found no statistically significant correlation between llms.txt presence and AI citation frequency [6].
The work to invest in is robots.txt configured correctly, structured data, and content quality. Adding an llms.txt file is busywork that no AI engine reads. If your SEO tool flags missing llms.txt as an issue, ignore the warning.
The system
| Cadence | Task | Difficulty | Note |
|---|---|---|---|
| Setup | Audit current robots.txt at yourstore.com/robots.txt | 🟢 | Most Shopify stores haven’t checked since launch |
| Setup | Edit robots.txt.liquid to explicitly allow AI retrieval crawlers | 🟡 | Override Shopify defaults — they may silently block AI bots |
| Setup | Audit Cloudflare/CDN bot management for AI crawler blocking | 🔴 | ~27% of ecommerce sites block AI bots at CDN layer unknowingly [5] |
| Setup | Audit firewall/WAF rules for AI bot user-agents | 🔴 | App-level firewalls (Wordfence equivalents, security apps) often block by default |
| Setup | Document the allow/block decision per crawler (separate retrieval from training) | 🟢 | Decisions are binary per crawler — never per vendor |
| Validation | Test robots.txt with Google’s robots.txt tester | 🟢 | Catches syntax errors most owners never see |
| Validation | Check server logs for OAI-SearchBot, Claude-SearchBot, PerplexityBot, Googlebot hits | 🟡 | Zero hits = blocked somewhere upstream of the file |
| Validation | curl test from each retrieval crawler’s user-agent string | 🟡 | Confirms full request can complete end-to-end |
| Validation | Cross-reference AI referral traffic in GA4 with robots.txt config | 🟡 | Traffic source should match the allow-list |
| Maintenance | Quarterly review of new AI bot user-agents (list expands) | 🟢 | Q1 2026: Claude-SearchBot and Claude-User were added by Anthropic [3] |
| Maintenance | Watch Shopify default robots.txt updates (silent changes) | 🟡 | Override-and-monitor is the only safe pattern |
| Maintenance | Annual CDN bot management policy review | 🟡 | Cloudflare ships new defaults regularly |
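The user-agent fetch test in the validation rows can be scripted instead of run by hand with curl. This is a rough sketch using only the standard library. The user-agent values are simplified placeholder tokens (an assumption, not the vendors' full published strings), and the status-code interpretation is a heuristic. Bot management may also verify crawlers by IP address, so a request sent from your own machine mainly surfaces blocks keyed on the user-agent header.

```python
import urllib.error
import urllib.request

# Placeholder tokens (assumption); swap in each vendor's documented string.
USER_AGENTS = {
    "OAI-SearchBot": "OAI-SearchBot",
    "Claude-SearchBot": "Claude-SearchBot",
    "PerplexityBot": "PerplexityBot",
    "Googlebot": "Googlebot",
}


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request that presents the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def probe(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Fetch url and return the HTTP status, including 4xx/5xx codes."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent),
                                    timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def interpret(status: int) -> str:
    """Heuristic reading of a status code (assumption, not a vendor rule)."""
    if 200 <= status < 300:
        return "reachable"
    if status in (401, 403, 429, 503):
        return "possibly blocked or challenged"
    return "inspect manually"


def report(url: str) -> None:
    """Print one line per crawler token for the given URL."""
    for bot, agent in USER_AGENTS.items():
        status = probe(url, agent)
        print(f"{bot:18} {status} {interpret(status)}")
```

Calling `report("https://yourstore.com/")` with your own domain prints one line per crawler. A 200 for a browser user-agent combined with a 403 for a crawler token points at a rule upstream of robots.txt.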
Common gaps (8 out of 10 audits)
- Shopify default robots.txt unmodified. The store has never overridden robots.txt.liquid. Whatever Shopify ships becomes the policy. Shopify has updated the default file silently more than once since 2024 [5].
- CDN-level AI bot blocking enabled by default. Cloudflare’s “AI Scraper” toggle blocks named AI crawlers regardless of robots.txt. ~27% of ecommerce sites have this enabled without realizing it [5]. The store passes its robots.txt test and still receives zero crawls.
- Blocking GPTBot, leaving OAI-SearchBot allowed (or vice versa). Owner intends “opt out of AI” but configures the wrong bot. Blocking GPTBot alone leaves the store fully crawled by ChatGPT search. Blocking OAI-SearchBot alone removes the store from ChatGPT citations while still feeding training data [1].
- Confusing Google-Extended with AI Overviews / AI Mode control. Owner blocks Google-Extended thinking it removes the store from Google’s AI answers. It doesn’t; those use Googlebot [2]. The block has zero visibility effect; the work was theatrical.
- No server log review, ever. The owner has never verified which AI bots are actually hitting the store. Without log review, configuration changes are deployed blind.
- Outdated bot list missing 2026 additions. Anthropic’s three-bot split (ClaudeBot, Claude-SearchBot, Claude-User) launched in 2026. Stores configured in 2024-2025 reference only anthropic-ai, missing the new retrieval crawler entirely [3].
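The log review mentioned above does not need special tooling. As a minimal sketch, assuming you can export access logs as plain text lines (for example from a CDN in front of the store), the function below counts hits per crawler token. The sample lines, IP addresses, and shortened user-agent strings are hypothetical.

```python
# Crawler tokens to look for; extend as the list of AI bots grows.
BOTS = (
    "OAI-SearchBot", "ChatGPT-User", "Claude-SearchBot", "Claude-User",
    "PerplexityBot", "Googlebot", "GPTBot", "ClaudeBot", "CCBot",
)

# Hypothetical access-log lines with simplified user-agent strings.
SAMPLE_LOG = [
    '203.0.113.7 - - [01/Mar/2026:10:00:00 +0000] "GET /products/widget HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
    '198.51.100.4 - - [01/Mar/2026:10:01:00 +0000] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '198.51.100.4 - - [01/Mar/2026:10:02:00 +0000] "GET /collections/all HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '192.0.2.9 - - [01/Mar/2026:10:03:00 +0000] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"',
]


def count_bot_hits(lines, bots=BOTS):
    """Count log lines mentioning each crawler token (case-insensitive)."""
    hits = {bot: 0 for bot in bots}
    for line in lines:
        lowered = line.lower()
        for bot in bots:
            if bot.lower() in lowered:
                hits[bot] += 1
    return hits


for bot, n in count_bot_hits(SAMPLE_LOG).items():
    print(f"{bot:18} {n}")
```

A retrieval crawler that shows zero hits over a long window is the signal to check the CDN and firewall layers. Substring matching is deliberately loose here, so variants such as Googlebot-Image are counted under Googlebot, and user-agent strings can be spoofed, so counts are indicative only.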
Paid layer connection
Bot infrastructure determines whether your store is eligible for AI search citation. ChatGPT Ads runs adjacent to the organic citation flow — but a store invisible to OAI-SearchBot is still ineligible to appear as a brand in the surrounding context the ad runs against. The infrastructure work in this chapter is a prerequisite for both organic visibility and paid placement performance.
Deeper dive
Standalone posts will go further on:
- Shopify robots.txt.liquid template — full override file with the correct 2026 allow-list
References
1. OpenAI. Bots documentation. platform.openai.com/docs/bots. Documents the OAI-SearchBot, GPTBot, and ChatGPT-User user-agents and their independent robots.txt behavior.
2. Google Search Central. AI features and your site. developers.google.com/search/docs/appearance/ai-features. Confirms Google-Extended controls Gemini training only, while Googlebot powers AI Overviews and AI Mode.
3. Anthropic. ClaudeBot, Claude-SearchBot, and Claude-User crawler documentation (2026). Details the three-bot split mirroring OpenAI’s structure.
4. Perplexity. PerplexityBot and Perplexity-User documentation. docs.perplexity.ai/.
5. Mersel / ziptie.dev research (2026). Approximately 27% of B2B SaaS and ecommerce sites unknowingly block major LLM crawlers at CDN layer.
6. Mueller on Bluesky (June 17, 2025, @johnmu.com): “FWIW no AI system currently uses llms.txt.” Reported via Search Engine Roundtable. Mueller separately compared llms.txt to the keywords meta tag in a Google Search Central Help community discussion (April 2025), reported via Search Engine Journal. Illyes at Google Search Central Live (July 2025, in-person event without video coverage): “Google doesn’t support LLMs.txt and isn’t planning to”, reported via Kenichi Suzuki and aggregated by The SEO Community. Empirical: SE Ranking analysis of 39,000 domains found no statistically significant correlation between llms.txt presence and AI citation frequency.