Free robots.txt Generator — AI Bot Presets, URL Tester & llms.txt
Generate a complete, production-ready robots.txt in seconds. This tool is built for the era of AI crawlers, which many older generators predate: toggle GPTBot, ClaudeBot, Google-Extended, and a dozen other AI bots individually, apply site-type templates, test any URL against your rules, and generate your llms.txt, all without leaving the page.
The robots.txt file is placed at the root of your website (https://example.com/robots.txt) and instructs crawlers which pages they can and cannot access. Getting it right matters more than ever now that AI training crawlers regularly index site content without the traffic benefits of a search engine.
How to Use the robots.txt Generator
Generating a valid robots.txt takes under a minute:
- Choose a site template: Click Blog, E-commerce, SaaS, Enterprise, or News to pre-fill the most common allow/disallow rules for your site type. This gives you an instant baseline to customize from.
- Set AI bot permissions: In the AI Agent Controls section, toggle each crawler individually. Checked = allow, unchecked = block (injects `Disallow: /` for that User-agent). Click "Block All" to block every AI crawler with one click.
- Add or remove paths: In the Configuration tab, add specific paths to allow or disallow for `User-agent: *`. For example, disallow `/admin/` and `/checkout/` while allowing everything else with `/`.
- Test a URL: Use the URL Tester on the right to paste any path (e.g. `/admin/config/private`) and select a User-agent to instantly see whether that URL would be allowed or blocked, and which specific rule applies.
- Copy or download: The Raw Output Preview at the bottom shows your live robots.txt with syntax highlighting. Copy to clipboard or download as `robots.txt`.
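The steps above amount to assembling a few groups of directives. The following is a minimal Python sketch of that assembly, not the generator's actual implementation: `RobotsConfig` and `build_robots_txt` are hypothetical names introduced for illustration, and the paths and bot names are sample choices.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RobotsConfig:
    """Hypothetical container for the choices made in the steps above."""
    disallow: List[str] = field(default_factory=list)        # paths blocked for User-agent: *
    allow: List[str] = field(default_factory=lambda: ["/"])  # paths allowed for User-agent: *
    blocked_bots: List[str] = field(default_factory=list)    # AI crawlers left unchecked
    sitemap: Optional[str] = None


def build_robots_txt(cfg: RobotsConfig) -> str:
    """Render the configuration as robots.txt text."""
    lines: List[str] = []
    # One group per blocked crawler, each with a blanket Disallow.
    for bot in cfg.blocked_bots:
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    # The catch-all group for every other crawler.
    lines.append("User-agent: *")
    lines += [f"Disallow: {path}" for path in cfg.disallow]
    lines += [f"Allow: {path}" for path in cfg.allow]
    if cfg.sitemap:
        lines += ["", f"Sitemap: {cfg.sitemap}"]
    return "\n".join(lines) + "\n"


cfg = RobotsConfig(
    disallow=["/admin/", "/checkout/"],
    blocked_bots=["GPTBot", "CCBot"],
    sitemap="https://example.com/sitemap.xml",
)
robots_txt = build_robots_txt(cfg)
print(robots_txt)
```

Each blocked crawler gets its own group because a crawler follows the most specific group that names it and ignores the `User-agent: *` group once one exists.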
robots.txt Examples
| Input (User-agent / Path) | Rule | Result |
|---|---|---|
| Googlebot / `/admin/` | `Disallow: /admin/` | ❌ Blocked |
| `*` / `/` | `Allow: /` | ✅ Allowed |
| GPTBot / `/blog/post-1` | `Disallow: /` (block all) | ❌ Blocked |
| `*` / `/?s=query` | `Disallow: /?s=` | ❌ Blocked |
| Bingbot / `/about` | No matching rule | ✅ Allowed by default |
Edge case — wildcard patterns:
Input: Disallow: /*?sort=
URL: /products?sort=price&order=asc
Result: ❌ Blocked (wildcard matches any prefix before ?sort=)
Edge case — empty Disallow:
Input: Disallow:
URL: /anything
Result: ✅ Allowed (empty Disallow means allow everything)
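The rows and edge cases above can be reproduced with a small matcher. This is a sketch under stated assumptions, not the URL Tester's actual code: `parse_groups`, `pattern_to_regex`, and `check` are hypothetical helpers, user-agent tokens are compared by exact case-insensitive match, and precedence follows the longest-match rule from RFC 9309 (with Allow winning a tie).

```python
import re


def parse_groups(robots_txt):
    """Map each lowercased User-agent token to its list of (directive, pattern) rules."""
    groups = {}
    agents = []
    seen_rule = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if seen_rule:  # a User-agent line after rules starts a new group
                agents, seen_rule = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif key in ("allow", "disallow"):
            seen_rule = True
            if value:  # an empty Disallow blocks nothing, so it is skipped
                for agent in agents:
                    groups[agent].append((key, value))
    return groups


def pattern_to_regex(pattern):
    """'*' matches any run of characters; a trailing '$' anchors the end of the URL."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def check(robots_txt, user_agent, path):
    """Return (allowed, winning_rule) for a path."""
    groups = parse_groups(robots_txt)
    # Use the crawler's own group if present, otherwise fall back to '*'.
    rules = groups.get(user_agent.lower(), groups.get("*", []))
    best = None
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive == "allow", directive, pattern)
            if best is None or candidate[:2] > best[:2]:
                best = candidate
    if best is None:
        return True, "no matching rule"
    return best[2] == "allow", f"{best[2].capitalize()}: {best[3]}"


cases = [
    ("User-agent: Googlebot\nDisallow: /admin/", "Googlebot", "/admin/"),
    ("User-agent: GPTBot\nDisallow: /", "GPTBot", "/blog/post-1"),
    ("User-agent: *\nDisallow: /?s=\nAllow: /", "*", "/?s=query"),
    ("User-agent: Googlebot\nDisallow: /admin/", "Bingbot", "/about"),
    ("User-agent: *\nDisallow: /*?sort=", "*", "/products?sort=price&order=asc"),
    ("User-agent: *\nDisallow:", "*", "/anything"),
]
for rules, agent, path in cases:
    print(agent, path, check(rules, agent, path))
```

Real crawlers differ in how they match user-agent tokens and percent-encoded characters, so a result from this sketch is a guide rather than a guarantee.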
What Is robots.txt — And What It Doesn't Do
The robots.txt file is a plain text file based on the Robots Exclusion Protocol (REP), first introduced in 1994. It communicates crawling preferences to web robots — it does not enforce them. Compliant crawlers respect the rules; non-compliant ones don't.
Three things robots.txt does not do: it doesn't prevent humans from accessing a page; it doesn't guarantee a page won't be indexed (a page blocked from crawling can still appear in search results if other pages link to it; to keep it out, use noindex and leave the page crawlable so the tag can be read); and it doesn't protect sensitive data (use authentication for that).
Blocking AI Crawlers — The 2025 Guide
The rise of AI language models created a new category of web crawlers that extract content for training data rather than search results. Unlike Googlebot, which drives traffic to your site, AI training crawlers extract content without direct benefits.
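Training crawlers and opt-out tokens (GPTBot, ClaudeBot, CCBot, Google-Extended, Bytespider, Applebot-Extended, and the older anthropic-ai token) cover content collected to train future language models. Google-Extended and Applebot-Extended are control tokens rather than separate crawlers: they tell Google and Apple not to use pages fetched by their regular crawlers for AI training. Search and browsing crawlers (OAI-SearchBot, PerplexityBot) power real-time AI search results and can drive referral traffic to your site.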
Many publishers block training crawlers to protect their content while allowing search and browsing crawlers to maintain visibility in AI-powered search. Major AI companies, including OpenAI, Anthropic, Google, and Perplexity, state that their crawlers honor robots.txt. Compliance is voluntary, so treat these rules as a stated preference rather than an enforced barrier.
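The split between blocking training crawlers and allowing search crawlers can be sketched in Python as follows. The two lists hold a subset of the tokens named above, and their grouping is an assumption for illustration; `ai_policy` is a hypothetical helper. The result is checked with the standard library's `urllib.robotparser`, which handles plain prefix rules like these.

```python
from urllib.robotparser import RobotFileParser

# Subset of the crawler tokens named above; the grouping is an assumption.
TRAINING_BOTS = ["GPTBot", "CCBot", "Google-Extended", "Bytespider", "Applebot-Extended"]
SEARCH_BOTS = ["OAI-SearchBot", "PerplexityBot"]


def ai_policy(block_training=True, block_search=False):
    """Emit one robots.txt group per AI crawler, blocked or allowed by category."""
    groups = []
    for bots, blocked in ((TRAINING_BOTS, block_training), (SEARCH_BOTS, block_search)):
        rule = "Disallow: /" if blocked else "Allow: /"
        for bot in bots:
            groups.append(f"User-agent: {bot}\n{rule}")
    return "\n\n".join(groups) + "\n"


policy = ai_policy()

# Verify the generated rules the way a compliant crawler would read them.
parser = RobotFileParser()
parser.parse(policy.splitlines())
url = "https://example.com/blog/post-1"
print(parser.can_fetch("GPTBot", url))         # False: training crawler is blocked
print(parser.can_fetch("OAI-SearchBot", url))  # True: search crawler is allowed
print(parser.can_fetch("Googlebot", url))      # True: no group names it
```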
Common Use Cases
- Blocking faceted navigation: E-commerce sites generate thousands of near-duplicate URLs from filters (`?color=red&size=M`). `Disallow: /*?` keeps crawlers from spending crawl budget on these parameter-based URLs. Note that it blocks every URL containing a query string, so use narrower patterns or add `Allow` rules if some parameterized pages should stay crawlable.
- Protecting admin panels: `Disallow: /admin/` and `Disallow: /wp-login.php` keep compliant crawlers away from admin pages, though authentication should be the real security layer.
- Sitemap declaration: Including `Sitemap: https://example.com/sitemap.xml` in robots.txt helps all crawlers discover your sitemap automatically, even if it hasn't been submitted to Search Console.
- AI content protection: Publishers of news, research, or creative content increasingly block training crawlers to prevent their work from being used in AI model training without compensation.
- Crawl budget optimization: Large sites (10K+ pages) benefit from blocking low-value content (login pages, cart pages, search result pages) so crawlers spend budget on important content.
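To get a feel for how much crawling such rules save, the use cases above can be combined into one sample file and run against a list of URLs. This is a rough sketch: the file contents and URL list are invented for illustration, `disallow_patterns` and `is_blocked` are hypothetical helpers that consider only Disallow lines (no Allow precedence, no per-crawler groups), and the sitemap is read back with `RobotFileParser.site_maps()` (Python 3.8+).

```python
import re
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /*?
Disallow: /admin/
Disallow: /wp-login.php
Disallow: /cart/

Sitemap: https://example.com/sitemap.xml
"""


def disallow_patterns(robots_txt):
    """Collect the non-empty Disallow patterns from the file."""
    patterns = []
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "disallow" and value.strip():
            patterns.append(value.strip())
    return patterns


def is_blocked(path, patterns):
    """True if any pattern matches the start of the path ('*' matches any characters)."""
    for pattern in patterns:
        regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
        if re.match(regex, path):
            return True
    return False


urls = [
    "/shoes",
    "/shoes?color=red&size=M",
    "/shoes?color=blue",
    "/cart/",
    "/admin/settings",
    "/wp-login.php",
    "/blog/sizing-guide",
]
patterns = disallow_patterns(ROBOTS_TXT)
blocked = [u for u in urls if is_blocked(u, patterns)]
crawlable = [u for u in urls if u not in blocked]
print(f"{len(blocked)} of {len(urls)} URLs blocked; crawlable: {crawlable}")

# The Sitemap line is independent of any User-agent group.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
sitemaps = parser.site_maps()
print(sitemaps)
```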
Common Mistakes with robots.txt
- Disallowing pages you want indexed: `Disallow: /blog/` blocks Googlebot from crawling your blog entirely, so its content can't be read or ranked. Only block what you genuinely don't want crawled.
- Confusing crawling with indexing: A blocked URL can still appear in search results if it has inbound links. To prevent indexing, use `<meta name="robots" content="noindex">` on the page, not robots.txt, and leave the page crawlable so the tag can be read.
- Trailing slash matters: Rules are prefix matches. `Disallow: /admin` blocks every path that starts with `/admin`, including `/admin/`, `/admin.html`, and `/administrator`. `Disallow: /admin/` blocks only URLs inside the `/admin/` directory and leaves `/admin` itself crawlable. Pick the form that matches what you mean to block.
- Blocking CSS and JavaScript: Google renders pages (including CSS/JS) to evaluate them. Blocking your CSS or JS files with robots.txt can harm rankings because Google can't fully render the page.
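Path rules are matched as prefixes, which is easy to confirm with Python's standard `urllib.robotparser`. The helper name `allowed` is introduced here for illustration; it builds a one-rule file for `User-agent: *` and asks whether a path may be fetched.

```python
from urllib.robotparser import RobotFileParser


def allowed(rule, path):
    """Check a path against a robots.txt containing a single rule for all crawlers."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", rule])
    return parser.can_fetch("*", path)


# Without the trailing slash, the rule is a prefix of unrelated paths too.
print(allowed("Disallow: /admin", "/admin"))          # False
print(allowed("Disallow: /admin", "/admin/users"))    # False
print(allowed("Disallow: /admin", "/administrator"))  # False

# With the trailing slash, only URLs inside the directory are blocked.
print(allowed("Disallow: /admin/", "/admin"))         # True
print(allowed("Disallow: /admin/", "/admin/users"))   # False
print(allowed("Disallow: /admin/", "/administrator")) # True
```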
Frequently Asked Questions
How do I block AI bots like GPTBot with robots.txt?
Add a separate User-agent block for each AI crawler with Disallow: /. For example: User-agent: GPTBot followed by Disallow: /. Repeat for each bot: OAI-SearchBot, Google-Extended, anthropic-ai, ClaudeBot, CCBot, PerplexityBot, Bytespider, Applebot-Extended, and Amazonbot. Use the "Block All" toggle in this generator to add all of these with one click.
Does blocking GPTBot stop ChatGPT from using my content?
Blocking GPTBot (OpenAI's training crawler) asks OpenAI not to collect your pages for future model training. Blocking OAI-SearchBot is a separate decision: that crawler powers ChatGPT's search features and can drive referral traffic. Major AI companies have stated they respect robots.txt, but compliance is voluntary, and content collected before you added the block is not removed.
What is the difference between robots.txt and meta robots?
robots.txt controls crawling: whether a bot accesses the URL at all. Meta robots tags (<meta name="robots" content="noindex">) control indexing: whether the content appears in search results. A URL can be disallowed in robots.txt but still appear indexed if search engines already know about it from links elsewhere. To keep a page out of search results, use noindex on the page and leave it crawlable, because a crawler blocked by robots.txt never sees the tag. For crawl budget management, use robots.txt.
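Because a noindex tag only works when the crawler can fetch the page, the two mechanisms can conflict. The sketch below flags that case using only the Python standard library; `MetaRobots` and `diagnose` are hypothetical names, the sample rules and HTML are invented, and the four status labels are this example's own convention.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class MetaRobots(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives |= {d.strip().lower() for d in content.split(",") if d.strip()}


def diagnose(robots_txt, path, html, user_agent="Googlebot"):
    """Classify a page as 'conflict', 'noindex', 'blocked', or 'ok'."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    crawlable = parser.can_fetch(user_agent, path)

    meta = MetaRobots()
    meta.feed(html)
    noindex = "noindex" in meta.directives

    if noindex and not crawlable:
        return "conflict"  # the tag exists but the crawler is not allowed to read it
    if noindex:
        return "noindex"   # crawlable, so the tag can take effect
    if not crawlable:
        return "blocked"   # not crawled, but the URL may still be indexed from links
    return "ok"


RULES = "User-agent: *\nDisallow: /private/"
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

print(diagnose(RULES, "/private/report", PAGE))  # conflict
print(diagnose(RULES, "/blog/draft", PAGE))      # noindex
```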
What is llms.txt?
llms.txt is a community-proposed convention (first published in 2024) for communicating with AI language models about your website's structure and content. While robots.txt tells crawlers what to avoid, llms.txt is a curated guide: it lists your most important pages, describes what the site is about, and signals how AI should reference your content. It's placed at example.com/llms.txt. It is not an official W3C or IETF standard, and support among AI systems varies, but a growing number of site owners publish one.
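As a rough illustration, an llms.txt file can be rendered from a title, a one-line summary, and a few groups of links. The layout used here (a Markdown H1 title, a blockquote summary, then H2 sections containing link lists) follows the community proposal as commonly described; treat the details as an assumption and check the proposal before publishing. `build_llms_txt` and the sample site data are hypothetical.

```python
def build_llms_txt(title, summary, sections):
    """Render llms.txt text: H1 title, blockquote summary, then H2 sections of links.

    `sections` maps a heading to a list of (name, url, note) tuples.
    """
    out = [f"# {title}", "", f"> {summary}"]
    for heading, links in sections.items():
        out += ["", f"## {heading}", ""]
        out += [f"- [{name}]({url}): {note}" for name, url, note in links]
    return "\n".join(out) + "\n"


llms_txt = build_llms_txt(
    title="Example Store",
    summary="Online shop for running shoes, with sizing guides and care instructions.",
    sections={
        "Guides": [
            ("Sizing guide", "https://example.com/guides/sizing", "How to pick the right size"),
            ("Shoe care", "https://example.com/guides/care", "Cleaning and storage tips"),
        ],
        "Policies": [
            ("Returns", "https://example.com/returns", "Return window and conditions"),
        ],
    },
)
print(llms_txt)
```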
Does robots.txt affect SEO?
Indirectly, yes. Blocking unnecessary pages (admin panels, duplicate content from URL parameters, faceted navigation, login pages, and search result pages) helps search engines focus their crawl budget on your important content. This can lead to faster indexing of new pages on large sites. Never disallow pages you want indexed. To keep a page out of search results, use noindex rather than a Disallow rule. Blocking CSS and JavaScript files can hurt rankings since Google needs them to fully render and evaluate your pages.