What is robots.txt?
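robots.txt is a plain-text file at the root of your website (yourdomain.com/robots.txt) that tells web crawlers which URLs they may and may not crawl. It is part of the Robots Exclusion Protocol, standardized as RFC 9309 in 2022 and honored by all major search engines. Compliance is voluntary, though: the file is a request, not access control, so it cannot stop a crawler that chooses to ignore it.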
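Crawlers find the file by origin. They take the scheme, host and port of the page they want, then request /robots.txt there, so a file on yoursite.com does not cover blog.yoursite.com. Here is a minimal Python sketch of that lookup. `robots_url_for` is a hypothetical helper name and is not part of any library.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL a crawler would check before fetching page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host (netloc, including any port). The page's
    # path, query string and fragment play no part in locating the file.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url_for("https://yoursite.com/blog/post?id=7"))
# https://yoursite.com/robots.txt
print(robots_url_for("https://blog.yoursite.com/posts/1"))
# https://blog.yoursite.com/robots.txt
```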
The rise of AI crawlers in 2025
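By 2025, crawlers run by AI companies, which collect training data or fetch pages for AI search, have become a category every site owner has to consider. Some of them are not new: OpenAI announced GPTBot in 2023, and Common Crawl's CCBot predates the current AI wave. The best-known include: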
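| Crawler | Company | Purpose |
|---|---|---|
| GPTBot | OpenAI | Collecting training data for OpenAI models (ChatGPT search uses a separate crawler, OAI-SearchBot) |
| ClaudeBot | Anthropic | Collecting training data for Anthropic models |
| PerplexityBot | Perplexity AI | Indexing pages for Perplexity's AI search results |
| Google-Extended | Google | Controls whether content Google crawls is used for Gemini training and grounding; it is a robots.txt token, not a separate crawler, and does not affect Google Search |
| CCBot | Common Crawl | Building an open web archive that is widely used as AI training data |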
Should you allow or block AI crawlers?
Allow AI crawlers if:
- You want your site to appear in AI search results (ChatGPT, Perplexity, Claude)
- You want your tools or content recommended by AI assistants
- You're a tool website, blog, or informational site
Block AI crawlers if:
- You have proprietary content you don't want used for AI training
- You run a paywalled site and don't want your content scraped for free
- You have legal concerns about your content being used in training data
robots.txt syntax
Allow all crawlers (default)
```
User-agent: *
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
```
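To see how a crawler reads this file, here is a sketch that uses Python's standard-library `urllib.robotparser`. It parses the rules and reports both the access decision and any Sitemap lines. The page URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = """\
User-agent: *
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
"""

parser = RobotFileParser()
# parse() takes an iterable of lines; read() would fetch the file over HTTP instead.
parser.parse(ALLOW_ALL.splitlines())

print(parser.can_fetch("GPTBot", "https://yoursite.com/any/page"))  # True
print(parser.site_maps())  # ['https://yoursite.com/sitemap.xml']
```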
Block specific paths
```
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /private/
```
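Allow: / does not cancel the Disallow lines. Under RFC 9309 and Google's parser, the matching rule with the longest path wins, and Allow wins a tie. Below is a simplified Python sketch of that precedence rule. `is_allowed` is a hypothetical helper, and it ignores the * and $ wildcards. Some parsers work differently: Python's own `urllib.robotparser`, for example, applies the first matching rule in file order, so the leading Allow: / would admit everything.

```python
def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Decide whether `path` may be crawled under one group's rules.

    Follows the RFC 9309 precedence: the matching rule with the longest
    path wins, and Allow beats Disallow on a tie. Simplified sketch:
    the * and $ wildcards are not supported.
    """
    best_length = -1
    allowed = True  # a path that no rule matches is allowed
    for directive, prefix in rules:
        if not prefix:
            continue  # an empty rule value matches nothing
        if path.startswith(prefix):
            length = len(prefix)
            if length > best_length or (length == best_length and directive == "allow"):
                best_length = length
                allowed = directive == "allow"
    return allowed


# The "Block specific paths" group, as (directive, path) pairs.
RULES = [
    ("allow", "/"),
    ("disallow", "/admin/"),
    ("disallow", "/api/"),
    ("disallow", "/private/"),
]

for path in ("/blog/post", "/admin/users", "/api"):
    print(path, is_allowed(RULES, path))
```

The loop prints True for /blog/post, False for /admin/users, and True for /api. That last result happens because Disallow: /api/ only matches paths that start with /api/, so the bare /api URL stays crawlable. Disallow: /api would cover both, but it would also block any path that merely starts with /api, such as /api-docs.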
Allow specific AI crawlers
```
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /
```
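A crawler obeys only the group that names it and ignores the User-agent: * group entirely. If your file also blocks paths for *, an AI crawler given its own Allow: / group can still crawl those paths unless you repeat the Disallow lines in its group. The sketch below shows this with Python's standard-library `urllib.robotparser`. The /private/ rule and the URL are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# A catch-all group that hides /private/, plus a GPTBot group allowing everything.
RULES = """\
User-agent: *
Disallow: /private/

User-agent: GPTBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

url = "https://yoursite.com/private/report"
# GPTBot matches its own group, so the * group's Disallow never applies to it.
print(parser.can_fetch("GPTBot", url))        # True
# A crawler with no group of its own falls back to the * group.
print(parser.can_fetch("SomeOtherBot", url))  # False
```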
Block AI crawlers from training data
```
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
User-agent: CCBot
Disallow: /
```
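Groups apply only to the crawlers they name. A crawler that matches none of them, in a file with no User-agent: * group, may crawl everything. Blocking GPTBot and CCBot this way therefore leaves search crawlers such as Googlebot and PerplexityBot untouched. Here is a quick check with Python's standard-library `urllib.robotparser`. The URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Two groups that shut GPTBot and CCBot out of the whole site.
RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

for agent in ("GPTBot", "CCBot", "PerplexityBot", "Googlebot"):
    allowed = parser.can_fetch(agent, "https://yoursite.com/blog/post")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
# GPTBot: blocked
# CCBot: blocked
# PerplexityBot: allowed
# Googlebot: allowed
```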
How to generate robots.txt for free
- Go to Robots.txt Generator
- Toggle which crawlers to allow or block
- Add paths you want to disallow
- Configure AI crawler settings
- Copy or download the generated robots.txt
- Place the file at your domain root: yoursite.com/robots.txt
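A generator like this essentially turns a few settings into User-agent groups. Here is a minimal Python sketch of that idea. `build_robots_txt` and its parameters are hypothetical. This is not the actual implementation of the Robots.txt Generator.

```python
from typing import Optional


def build_robots_txt(
    disallowed_paths: list[str],
    blocked_ai_crawlers: list[str],
    sitemap_url: Optional[str] = None,
) -> str:
    """Assemble robots.txt text from generator-style settings (hypothetical helper)."""
    # Catch-all group: applies to every crawler without a group of its own.
    default_group = ["User-agent: *"]
    default_group += [f"Disallow: {path}" for path in disallowed_paths] or ["Allow: /"]
    groups = ["\n".join(default_group)]

    # One group per blocked AI crawler, shutting it out of the whole site.
    for token in blocked_ai_crawlers:
        groups.append(f"User-agent: {token}\nDisallow: /")

    text = "\n\n".join(groups)  # blank line between groups for readability
    if sitemap_url:
        text += f"\n\nSitemap: {sitemap_url}"
    return text + "\n"


print(build_robots_txt(["/admin/", "/api/"], ["GPTBot", "CCBot"],
                       "https://yoursite.com/sitemap.xml"))
```

Each blocked crawler gets Disallow: / in its own group. That rule already covers the catch-all paths, so nothing needs to be repeated there.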
Verify your robots.txt
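After uploading, open yoursite.com/robots.txt in a browser to confirm that the file is served. Then check the robots.txt report in Google Search Console (Settings → robots.txt), which replaced the older robots.txt Tester. The report shows the version Google last fetched and any parsing errors. It reflects only Google's crawlers; other crawlers fetch and interpret the file on their own.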
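You can also check your rules locally against the crawlers you care about. The sketch below uses Python's standard-library `urllib.robotparser`. `find_mismatches` is a hypothetical helper, and the rules and URLs are placeholders. There are two caveats. This parser applies the first matching rule in file order and does not support the * and $ path wildcards, so for overlapping Allow/Disallow lines or wildcard rules its answers can differ from Google's. To test the live file instead, call `set_url("https://yoursite.com/robots.txt")` and then `read()` on a `RobotFileParser`.

```python
from urllib.robotparser import RobotFileParser


def find_mismatches(robots_txt: str, expectations: list[tuple[str, str, bool]]) -> list[str]:
    """Return a message for each (agent, url, should_allow) the rules get wrong."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    for agent, url, should_allow in expectations:
        allowed = parser.can_fetch(agent, url)
        if allowed != should_allow:
            want = "allowed" if should_allow else "blocked"
            got = "allowed" if allowed else "blocked"
            problems.append(f"{agent} {url}: expected {want}, got {got}")
    return problems


ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /
"""

checks = [
    ("Googlebot", "https://yoursite.com/blog/post", True),
    ("Googlebot", "https://yoursite.com/admin/login", False),
    ("GPTBot", "https://yoursite.com/blog/post", False),
]
print(find_mismatches(ROBOTS_TXT, checks) or "all checks passed")
```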