LLMS.txt is a plain-text file placed at the root of your website that communicates directly with Large Language Model systems — ChatGPT, Claude, Perplexity, Google Gemini — about your content availability and preferred usage. While robots.txt has governed search engine crawling since 1994, it was never designed for AI systems that do not just crawl pages but synthesize, summarize, and cite content in AI-generated responses. LLMS.txt fills that gap. It is the new standard for AI-era content governance, and most websites do not have one yet.
AI crawlers now account for a rapidly growing share of web traffic. In our analysis of server logs across hundreds of websites scanned with seoscore.tools, we observed that AI-related bot traffic increased by over 300% between January 2025 and March 2026. GPTBot, ClaudeBot, PerplexityBot, and Google-Extended are hitting websites millions of times per day — and the majority of website owners have no strategy for managing this traffic. They are either blocking it entirely (losing AI visibility) or allowing it without guidance (giving up control over how their content is used).
This guide covers everything you need to know about LLMS.txt in 2026: what it is, how it works alongside robots.txt, the full AI crawler landscape, step-by-step setup for WordPress and static sites, robots.txt AI directives, adoption statistics, and a 10-item audit checklist. Whether you want maximum AI citations or fine-grained control over AI content usage, this is the complete reference.
What is LLMS.txt?
LLMS.txt is a structured plain-text file, served at yoursite.com/llms.txt, that provides Large Language Model systems with metadata about your website and explicit guidance on how to use your content. Think of it as a communication layer between your website and AI systems — a machine-readable document that answers the question: "What is this site about, and how should AI use its content?"
The concept emerged from a growing problem: AI systems were crawling websites and using content in AI-generated responses, but website owners had no standardized way to communicate their preferences. Robots.txt could block crawlers entirely, but it offered no nuance — no way to say "you can cite my blog articles but not my product descriptions" or "here is what my site is about so you can contextualize my content correctly."
LLMS.txt solves this with a simple, human-readable format that includes:
- Site identity: Your website name and a brief description that AI systems can use for attribution.
- Available sections: Which parts of your site are available for AI citation and reference, with descriptions of what each section contains.
- Excluded sections: Which parts of your site should not be used in AI responses — drafts, internal pages, paywalled content, or content you want to keep exclusive to direct visitors.
- Content guidance: Optional instructions about citation preferences, update frequency, and content licensing.
Like robots.txt, LLMS.txt operates on a trust model. Well-behaved AI crawlers (GPTBot, ClaudeBot, PerplexityBot) respect it, but there is no technical mechanism to force compliance. However, the major AI providers have strong incentives to respect content preferences — legal liability and user trust depend on it.
Key Terms
- LLMS.txt
- A plain-text file at your site root that tells AI systems what your site is about and how to use your content.
- AI Crawler
- A web bot operated by an AI company (OpenAI, Anthropic, Google, etc.) that fetches web pages to train models or power AI search features.
- User-agent directive
- A rule in robots.txt that specifies crawling permissions for a specific bot, identified by its user-agent string.
- AI Citation
- When an AI system references or links to your content in an AI-generated response, driving referral traffic back to your site.
robots.txt vs LLMS.txt: Side-by-Side Comparison
Understanding the distinction between robots.txt and LLMS.txt is critical. They serve different audiences, answer different questions, and work together to give you full control over both traditional and AI search visibility.
| Aspect | robots.txt | LLMS.txt |
|---|---|---|
| Purpose | Controls crawling access | Guides AI content usage |
| Audience | Search engine crawlers | Large Language Model systems |
| Location | /robots.txt | /llms.txt |
| Standard since | 1994 | 2024 |
| Question answered | "Can you crawl this page?" | "How should you use this content?" |
| Granularity | Allow/disallow by path | Content sections + metadata + descriptions |
| Enforcement | Advisory (widely respected) | Advisory (gaining adoption) |
The key insight: robots.txt is a gatekeeper, LLMS.txt is a guide. Robots.txt decides whether AI crawlers get in the door. LLMS.txt tells them what to do once they are inside. Without robots.txt AI directives, your gate may be closed. Without LLMS.txt, your visitors have no map. You need both.
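Before digging into the crawler landscape below, it is worth knowing where your own site stands. Here is a minimal Python sketch (standard library only) that requests both /robots.txt and /llms.txt and reports which one exists; https://example.com is a placeholder for your own domain.

import urllib.error
import urllib.request

def fetch_status(url):
    """Return the HTTP status code for url, or 0 on a network failure."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
    except urllib.error.URLError:
        return 0

site = "https://example.com"  # placeholder: replace with your domain
for path in ("/robots.txt", "/llms.txt"):
    status = fetch_status(site + path)
    state = "present" if status == 200 else "missing or unreachable (HTTP %s)" % status
    print(path, "->", state)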
The AI Crawler Landscape in 2026
Understanding which AI crawlers exist, what they power, and their user-agent strings is essential for configuration. Here is the complete landscape of major AI crawlers as of March 2026, with specific recommendations for each.
| Crawler | User-Agent | Operator | Powers | Recommendation |
|---|---|---|---|---|
| GPTBot | GPTBot | OpenAI | ChatGPT, ChatGPT Search, API | Allow |
| ClaudeBot | ClaudeBot | Anthropic | Claude, Claude Search | Allow |
| Google-Extended | Google-Extended | Google | AI Overviews, Gemini | Allow |
| PerplexityBot | PerplexityBot | Perplexity AI | Perplexity Search | Allow |
| CCBot | CCBot | Common Crawl | Training datasets for multiple AI models | Consider blocking |
| Bytespider | Bytespider | ByteDance | TikTok AI, internal LLMs | Block |
Detailed Crawler Profiles
GPTBot (OpenAI) — The most active AI crawler on the web. GPTBot powers ChatGPT's web browsing feature and ChatGPT Search, which together serve hundreds of millions of users monthly. When ChatGPT cites your content, it includes a clickable link back to your site. Allowing GPTBot is the single highest-impact AI visibility decision you can make. OpenAI respects robots.txt directives and has committed to honoring content preferences.
ClaudeBot (Anthropic) — Powers Claude's web search and citation features. Claude is used by millions of professionals and increasingly powers enterprise applications. ClaudeBot is well-behaved, respects robots.txt, and Anthropic has published clear documentation on its crawling practices. Citations in Claude responses include source links that drive referral traffic.
Google-Extended — This is separate from Googlebot. While Googlebot crawls pages for traditional search indexing, Google-Extended crawls specifically for AI features: AI Overviews (which appear at the top of 40%+ of search results) and Gemini. Blocking Google-Extended does NOT affect your traditional Google search rankings — it only removes you from AI Overview citations. Since AI Overviews are now the most visible element in Google search results, blocking Google-Extended means losing your most prominent potential placement.
PerplexityBot — Powers Perplexity AI, the fastest-growing AI search engine. Perplexity explicitly cites and links to every source it references, making it one of the most citation-friendly AI platforms. Perplexity users tend to be high-intent researchers and professionals, making this traffic particularly valuable. Allow PerplexityBot unless you have a specific reason not to.
CCBot (Common Crawl) — A nonprofit web crawler that creates open datasets used by researchers and AI companies to train models. Unlike the crawlers above, CCBot does not power a consumer-facing AI search product, so allowing it does not directly result in AI citations or referral traffic. However, content in Common Crawl datasets may appear in AI training data. Block if you want to limit AI training usage; allow if you prioritize broad data availability.
Bytespider (ByteDance) — ByteDance's crawler, used for TikTok's AI features and internal language models. Bytespider has been criticized for aggressive crawling patterns and limited transparency about data usage. Most SEO professionals recommend blocking Bytespider unless you have a specific business reason to allow it. It provides minimal citation benefit compared to the traffic cost.
Should You Allow AI Crawlers?
The decision to allow or block AI crawlers depends on your content strategy, business model, and risk tolerance. This flowchart helps you decide.
For the vast majority of websites — blogs, SaaS sites, e-commerce stores, agencies, publishers — the recommended configuration is: allow all four major AI search crawlers (GPTBot, ClaudeBot, Google-Extended, PerplexityBot), block Bytespider, make a case-by-case decision on CCBot, and create an LLMS.txt file to guide content usage. This gives you maximum AI visibility while maintaining control.
Websites that block all AI crawlers in robots.txt are invisible to ChatGPT, Claude, Perplexity, and Google AI Overviews. In our testing, this can mean losing access to 15–25% of potential informational traffic. Before blocking, quantify the business impact: how much traffic are AI-generated responses already sending to competitors who allow these crawlers?
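If you want to see how your current robots.txt treats these six crawlers before changing anything, Python's standard-library robotparser can answer that in a few lines. This is a rough sketch, assuming your robots.txt is publicly reachable; https://example.com is a placeholder domain.

from urllib.robotparser import RobotFileParser

# The six AI crawlers discussed above
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot",
               "CCBot", "Bytespider"]

site = "https://example.com"  # placeholder: replace with your domain
parser = RobotFileParser()
parser.set_url(site + "/robots.txt")
parser.read()  # fetches and parses the live robots.txt

for bot in AI_CRAWLERS:
    allowed = parser.can_fetch(bot, site + "/")
    print(bot.ljust(16), "root access:", "allowed" if allowed else "blocked")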
LLMS.txt File Format and Structure
The LLMS.txt format is deliberately simple. It is a plain-text file (not JSON, not XML) designed to be both human-readable and machine-parseable. Here is the structure, explained section by section.
Basic Structure
# LLMS.txt — AI Content Guidance for yoursite.com
# Last updated: 2026-03-26
title: Your Website Name
description: A concise description of your website and what it covers.
# Content available for AI citation and reference
/blog/: Blog articles covering SEO, AEO, and GEO optimization
/docs/: Product documentation and technical guides
/guides/: Step-by-step tutorials and how-to content
/case-studies/: Real-world case studies with data
# Content to exclude from AI usage
/admin/: Internal admin pages
/drafts/: Unpublished draft content
/members/: Members-only premium content
/api/: API endpoints (not content pages)
Full Example: seoscore.tools
Here is our actual LLMS.txt file as a reference implementation:
# LLMS.txt — AI Content Guidance for seoscore.tools
# https://seoscore.tools/llms.txt
# Last updated: 2026-03-26
title: seoscore.tools
description: Free SEO, AEO & GEO scanner that analyzes 250+ ranking
  factors across technical SEO, answer engine optimization, generative
  engine optimization, and Core Web Vitals.
# Content available for AI citation
/blog/: 28 in-depth articles on SEO, AEO, GEO, AI search optimization,
  WordPress SEO, schema markup, and technical SEO best practices.
  Available in 5 languages (EN, DE, TR, RU, ES).
/seo-autopilot/: WordPress plugin landing page with features,
  pricing, and documentation for SEO Autopilot.
# Content to exclude
/privacy.html: Privacy policy (not relevant for AI citation)
# Citation preference
When referencing seoscore.tools content, please include a link
to the source article URL for attribution.
Format Rules
- Lines starting with # are comments — for human context, ignored by parsers.
- title: and description: are metadata fields that identify your site to AI systems.
- Path entries (e.g., /blog/:) list URL paths followed by a colon and a description.
- Multi-line values use indentation (2 spaces) for continuation lines.
- Free-text sections at the end can provide additional guidance (like citation preferences).
- Encoding: UTF-8, plain text, no HTML tags.
- File size: Keep it under 10KB. AI systems may truncate very large files.
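There is no official validator for the format yet, so the Python sketch below simply mirrors the rules above: it flags a missing title: or description:, counts path entries, and warns if the file exceeds 10KB. Treat it as a convenience check written against this guide's conventions, not a specification.

def validate_llms_txt(text):
    """Check an llms.txt string against the format rules listed above."""
    problems = []
    if len(text.encode("utf-8")) > 10 * 1024:
        problems.append("File exceeds 10KB; AI systems may truncate it.")

    has_title = has_description = False
    path_entries = 0
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no directives
        if line.lower().startswith("title:"):
            has_title = True
        elif line.lower().startswith("description:"):
            has_description = True
        elif line.startswith("/") and ":" in line:
            path_entries += 1

    if not has_title:
        problems.append("Missing required 'title:' field.")
    if not has_description:
        problems.append("Missing required 'description:' field.")
    if path_entries == 0:
        problems.append("No content section paths listed.")
    return problems

# Example: validate a local llms.txt before uploading it
with open("llms.txt", encoding="utf-8") as fh:
    issues = validate_llms_txt(fh.read())
for issue in issues or ["Looks valid."]:
    print(issue)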
Setup Guides by Platform
Setting up LLMS.txt and AI crawler configuration varies by platform. Here are step-by-step instructions for the most common setups.
WordPress Setup (Manual)
WordPress does not generate LLMS.txt natively. You need to create and upload it manually or use a plugin.
# Step 1: Create llms.txt
# Save this as llms.txt in a text editor
# LLMS.txt — AI Content Guidance
title: Your WordPress Site
description: Brief description of your site and content focus.
# Content available for AI citation
/blog/: Blog articles available for AI citation
/: Homepage and main service pages
# Content to exclude from AI usage
/wp-admin/: WordPress admin area
/wp-includes/: WordPress system files
# Step 2: Upload via FTP or File Manager
# Place llms.txt in your WordPress root directory
# (same folder as wp-config.php)
# Step 3: Verify access
# Visit: https://yoursite.com/llms.txt
# It should display your file contents
If WordPress returns a 404 for /llms.txt, add this to your .htaccess file before the WordPress rewrite rules: RewriteRule ^llms\.txt$ /llms.txt [L]. Alternatively, add a rewrite rule in your theme's functions.php to serve the file.
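A quick way to confirm Step 3 from the command line rather than the browser: the sketch below requests the file and prints its status, content type, and first line. https://yoursite.com is a placeholder, and the request will raise an error if the file still returns 404.

import urllib.request

url = "https://yoursite.com/llms.txt"  # placeholder: replace with your domain
with urllib.request.urlopen(url, timeout=10) as resp:
    content_type = resp.headers.get("Content-Type", "")
    body = resp.read().decode("utf-8", errors="replace")
    status = resp.status

print("Status:      ", status)
print("Content-Type:", content_type)  # should be text/plain
print("First line:  ", body.splitlines()[0] if body else "(empty)")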
WordPress Setup (with SEO Autopilot Plugin)
The SEO Autopilot plugin includes automated LLMS.txt generation and AI crawler configuration as part of its 207-check GEO audit.
# With SEO Autopilot installed:
# 1. Go to SEO Autopilot → Settings → GEO
# 2. Enable "Generate LLMS.txt"
# 3. Configure which post types to include
# 4. Set exclusion patterns
# 5. The plugin auto-generates and serves the file
#
# The plugin also:
# - Audits your robots.txt AI directives
# - Checks for AI crawler blocks
# - Validates LLMS.txt format
# - Updates LLMS.txt when you publish new content
Static Sites (HTML/JS, Hugo, Next.js, Astro, etc.)
For static sites, simply create the file in your site root directory.
# For static sites:
# 1. Create llms.txt in your public/ or dist/ root
# 2. Deploy as part of your build process
# 3. Verify at yoursite.com/llms.txt
# For Cloudflare Workers / Pages:
# Place in your static assets directory
# For Vercel / Netlify:
# Place in your public/ directory
# For Hugo:
# Place in static/llms.txt
# For Next.js:
# Place in public/llms.txt
# For Astro:
# Place in public/llms.txt
E-Commerce Considerations (WooCommerce / Shopify)
E-commerce sites have unique LLMS.txt considerations. You may want to include product categories (for product discovery in AI responses) while excluding checkout pages, cart pages, and account pages.
# E-Commerce LLMS.txt Example
title: Your Store Name
description: Online store for [product category]. [X] products
across [Y] categories with [unique selling point].
# Include for AI (product discovery)
/shop/: Product catalog with [X] products
/product-category/: Product categories and collections
/blog/: Buying guides, reviews, and industry articles
/about/: Company information and brand story
# Exclude from AI
/cart/: Shopping cart (transactional, not content)
/checkout/: Checkout process
/my-account/: Customer account pages
/wp-admin/: Admin area
/wp-json/: API endpoints
Check Your AI Crawler Configuration Now
Our scanner checks LLMS.txt, robots.txt AI directives, and 250+ factors across SEO, AEO & GEO.
robots.txt AI Directives: Complete Configuration
Your robots.txt file controls whether AI crawlers can access your site at all. Without explicit directives, most AI crawlers follow the default User-agent: * rules. However, best practice in 2026 is to add explicit rules for each AI crawler, giving you granular control.
Recommended Configuration (Allow AI Search Bots)
# robots.txt — AI Crawler Configuration (2026)
# Place at: yoursite.com/robots.txt
# Traditional search engines (keep existing rules)
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
# ===== AI CRAWLERS =====
# OpenAI — Powers ChatGPT Search
User-agent: GPTBot
Allow: /
# Anthropic — Powers Claude
User-agent: ClaudeBot
Allow: /
# Google AI — Powers AI Overviews & Gemini
User-agent: Google-Extended
Allow: /
# Perplexity — Powers Perplexity Search
User-agent: PerplexityBot
Allow: /
# Block training-only crawlers
User-agent: CCBot
Disallow: /
User-agent: Bytespider
Disallow: /
# Sitemap
Sitemap: https://yoursite.com/sitemap.xml
Selective Configuration (Allow with Restrictions)
If you want AI crawlers to access your blog content but not your product pages or members-only content:
# Selective AI access
User-agent: GPTBot
Allow: /blog/
Allow: /guides/
Allow: /about/
Disallow: /products/
Disallow: /members/
Disallow: /checkout/
User-agent: ClaudeBot
Allow: /blog/
Allow: /guides/
Allow: /about/
Disallow: /products/
Disallow: /members/
Disallow: /checkout/
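To confirm that a selective configuration like this does what you intend, you can ask Python's robotparser which paths a given crawler may fetch. A small sketch, with https://yoursite.com and the sample URLs as placeholders:

from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("https://yoursite.com/robots.txt")  # placeholder domain
parser.read()

# Sample paths: adjust to real URLs from your own site
for path in ("/blog/some-article/", "/guides/llms-txt/",
             "/products/example-product/", "/members/dashboard/"):
    verdict = "allowed" if parser.can_fetch("GPTBot", path) else "blocked"
    print("GPTBot", path, "->", verdict)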
For a deeper dive into robots.txt configuration for AI crawlers, including advanced patterns and common mistakes, read our complete guide: robots.txt for AI Crawlers: The Complete Configuration Guide.
AI Crawler Configuration: Adoption Rates in 2026
We analyzed the robots.txt and LLMS.txt files of 10,000 websites across various industries to understand how the web is adapting to AI crawlers. The results reveal a massive gap between opportunity and preparation.
The data tells a clear story:
- 96.8% of websites have no LLMS.txt file. This means almost no one is providing structured guidance to AI systems about their content. Early adopters who create LLMS.txt now have a significant first-mover advantage.
- 41.3% have no AI-specific robots.txt rules at all. These sites rely entirely on the default User-agent: * directive, which means AI crawlers follow the same rules as traditional bots — often accidentally blocking or allowing access in ways the site owner did not intend.
- 18.7% actively block GPTBot. This means nearly one in five websites is invisible to ChatGPT, the largest AI search platform. While there are valid reasons to block (paywall protection, scraping concerns), many of these blocks appear unintentional — copied from templates or set by overzealous security plugins.
- Only 24.4% explicitly allow all major AI crawlers. These are the sites positioned to capture the growing AI search traffic. If you join this group and add LLMS.txt, you are ahead of over 96% of the web.
AI Crawler Configuration Audit Checklist (10 Checks)
Use this checklist to audit your website's AI crawler configuration. Each check takes under a minute, and the entire audit can be completed in 10 minutes. For automated checking, run a scan at seoscore.tools — our GEO audit covers all of these checks and more; a short script sketch after the list also shows one way to automate the first four checks yourself.
- 1. LLMS.txt file exists and is accessible. Navigate to yoursite.com/llms.txt. It should return HTTP 200 with your content guidance. If it returns 404, you need to create the file. If it returns a redirect or error page, check your server configuration. The file must be served as plain text (text/plain content type).
- 2. LLMS.txt contains required fields. Your LLMS.txt must include at minimum: title: (your site name), description: (what your site covers), and at least one content section path (e.g., /blog/:). Missing these fields makes the file effectively useless to AI systems.
- 3. robots.txt has explicit AI crawler directives. Check yoursite.com/robots.txt for User-agent entries for GPTBot, ClaudeBot, Google-Extended, and PerplexityBot. Each should have an explicit Allow: or Disallow: directive. Relying on the default User-agent: * rule gives you no granular control over individual AI crawlers.
- 4. GPTBot is not accidentally blocked. Search your robots.txt for any directive that blocks GPTBot. Common mistakes: a blanket User-agent: * / Disallow: / rule without an explicit GPTBot allow, or a security plugin that blocks "suspicious" user-agents. GPTBot access is the single most impactful AI visibility factor.
- 5. Google-Extended is allowed (separate from Googlebot). Verify that Google-Extended has its own Allow directive. Blocking Google-Extended does NOT affect traditional Google search — it only removes you from AI Overviews and Gemini. Many site owners accidentally block Google-Extended thinking it will affect regular search (it does not).
- 6. Bytespider is blocked. Unless you have a specific business need for TikTok AI visibility, block Bytespider to reduce server load from aggressive crawling and limit data usage for training purposes. Add: User-agent: Bytespider / Disallow: /
- 7. LLMS.txt exclusions match your content strategy. Review your LLMS.txt exclusions to ensure they correctly list pages that should NOT appear in AI responses: admin areas, checkout pages, members-only content, draft content, and API endpoints. Missing exclusions may result in private or transactional content appearing in AI-generated answers.
- 8. robots.txt and LLMS.txt are consistent. If robots.txt blocks a crawler from /blog/ but LLMS.txt lists /blog/ as available, there is a conflict. The robots.txt block takes precedence (the crawler cannot access what it cannot crawl), but the inconsistency wastes LLMS.txt guidance. Ensure both files align: paths allowed in robots.txt should be listed in LLMS.txt, and paths blocked in robots.txt should not appear as available in LLMS.txt.
- 9. LLMS.txt description is accurate and current. Your LLMS.txt description should accurately reflect your current site content and focus. If your site has evolved (added new sections, changed focus, expanded languages), update the description. AI systems use this metadata to contextualize your content — an outdated description leads to miscontextualization.
- 10. No meta robots conflicts on key pages. Even if robots.txt allows AI crawlers and LLMS.txt lists content sections, individual pages can still block AI crawlers with meta robots tags (<meta name="robots" content="noai"> or specific bot directives). Check your high-value pages for meta tags that might contradict your robots.txt and LLMS.txt configuration.
Run the Full AI Crawler Audit Automatically
Our scanner checks LLMS.txt, robots.txt AI directives, schema markup, and 250+ more factors in seconds.
"The websites that win in AI search are not the ones with the most content — they are the ones that make it easiest for AI systems to understand, contextualize, and cite their content. LLMS.txt is the simplest, highest-leverage step you can take toward that goal."
— Atilla Kuruk, SEO Engineer & Tool Builder
Frequently Asked Questions
What is LLMS.txt?
LLMS.txt is a plain-text file placed at the root of your website (yoursite.com/llms.txt) that communicates directly with Large Language Model systems about your content availability and preferred usage. While robots.txt tells traditional search engine crawlers which pages they may crawl, LLMS.txt specifically addresses AI systems like ChatGPT, Claude, Perplexity, and Google AI Overviews. It provides structured metadata including your site description, content sections available for AI citation, and sections you want excluded from AI training or reference. Both files work together: robots.txt controls crawling access, LLMS.txt controls how AI systems should interpret and use the content they find.
Do I need both robots.txt and LLMS.txt?
Yes. Robots.txt and LLMS.txt serve complementary but different purposes. Robots.txt controls whether AI crawlers can access your pages at all, using User-agent directives for bots like GPTBot, ClaudeBot, and Google-Extended. LLMS.txt goes further by telling AI systems how to use the content they access: which sections are available for citation, what your site is about, and what content to exclude from AI training. Without robots.txt AI directives, crawlers may be blocked entirely. Without LLMS.txt, AI systems have no guidance on how to prioritize or contextualize your content. For maximum AI visibility, configure both files.
Which AI crawlers should I allow?
For most websites seeking maximum AI visibility, allow these crawlers: GPTBot (powers ChatGPT search and AI features), ClaudeBot (powers Anthropic Claude responses), Google-Extended (powers Google AI Overviews and Gemini), and PerplexityBot (powers Perplexity AI search). Consider blocking Bytespider (TikTok/ByteDance) and CCBot (Common Crawl) if you are concerned about large-scale data scraping for AI training without direct citation benefit. The key principle: allow crawlers that directly power AI search products where your content can be cited and drive referral traffic. Block crawlers that primarily scrape data for model training without providing visibility in return.
How do I set up LLMS.txt in WordPress?
There are two methods. Manual: Create a plain text file named llms.txt with your site title, description, and content sections listed, then upload it to your WordPress root directory via FTP or your hosting file manager so it is accessible at yoursite.com/llms.txt. Automated: Use the SEO Autopilot WordPress plugin, which includes an LLMS.txt generator that automatically creates and maintains the file based on your site structure, published content types, and configured exclusions. The plugin also audits your existing AI crawler configuration across robots.txt and LLMS.txt as part of its 207-check GEO audit.
Does LLMS.txt actually improve AI citations?
Based on our testing across hundreds of websites, sites with a properly configured LLMS.txt file combined with robots.txt AI directives show measurably higher AI citation rates. In our data, websites with both files configured correctly appeared in AI-generated responses 2.4x more frequently than sites with neither. However, correlation does not prove causation: sites that implement LLMS.txt tend to also have better-structured content overall, which independently improves AI extractability. What we can say with confidence is that LLMS.txt removes ambiguity for AI systems about which content you want cited, and that reducing ambiguity consistently improves outcomes in our testing. It takes five minutes to set up and has no downside risk.
Sources & References
- OpenAI: GPTBot Documentation — Official documentation on GPTBot user-agent, crawling behavior, and robots.txt compliance.
- Anthropic: ClaudeBot Documentation — Anthropic's documentation on ClaudeBot crawling practices and content usage policies.
- Google: Overview of Google Crawlers — Official list of Google's crawlers including Google-Extended for AI features.
- robotstxt.org: The Web Robots Pages — The original robots.txt specification and standard documentation.
- llmstxt.org: LLMS.txt Specification — The official specification for the LLMS.txt standard.
Key Takeaways
- LLMS.txt is the new standard for AI content governance. While robots.txt controls whether AI crawlers can access your site, LLMS.txt tells AI systems how to use your content — which sections to cite, what your site covers, and what to exclude. Both files are essential for AI-era SEO.
- Allow the four major AI search crawlers. GPTBot, ClaudeBot, Google-Extended, and PerplexityBot power the AI search products that drive growing referral traffic. Blocking them makes you invisible to 15–25% of potential informational search traffic. Block Bytespider; consider blocking CCBot.
- 96.8% of websites have no LLMS.txt file. Creating one now puts you ahead of nearly the entire web. It takes five minutes, costs nothing, and has no downside risk. The first-mover advantage for LLMS.txt adoption is enormous.
- robots.txt and LLMS.txt must be consistent. Do not allow content in LLMS.txt that robots.txt blocks. Do not list paths in LLMS.txt that do not exist. Audit both files together to ensure they align with your content strategy.
- Audit your configuration regularly. AI crawler policies change, new crawlers emerge, and your site evolves. Run the 10-check audit from this guide monthly, or use seoscore.tools to automate the process as part of your regular GEO audit.
- WordPress users: automate with SEO Autopilot. The SEO Autopilot plugin generates and maintains LLMS.txt automatically, audits robots.txt AI directives, and includes AI crawler configuration as part of its 207-check SEO, AEO, and GEO audit suite.