About robots.txt
The robots.txt file tells search engine crawlers which pages or files they may or may not request from your site. Note that it is advisory: well-behaved crawlers honor it, but it does not actually block access to the listed paths.
Best Practices
- Place robots.txt in your root directory (https://example.com/robots.txt)
- Use "Disallow:" to block crawlers from specific paths
- Use "Allow:" to override broader disallow rules
- Include your sitemap location if available
Common User-Agents
- * - Applies to all crawlers
- Googlebot - Google's web crawler
- Googlebot-Image - Google's image crawler
- Bingbot - Microsoft's Bing crawler
- Slurp - Yahoo's crawler
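Rules are written in groups, one per User-agent line (or set of User-agent lines); a crawler follows the group that most specifically matches its name and ignores the rest. A sketch with hypothetical paths:

```text
# Googlebot-Image matches this group and ignores the "*" group below
User-agent: Googlebot-Image
Disallow: /

# All other crawlers fall through to this group
User-agent: *
Disallow: /drafts/
```

With these rules, Googlebot-Image is blocked from the whole site while every other crawler is blocked only from /drafts/.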
Example Rules
User-agent: *
Disallow: /private/
Disallow: /temp/
Allow: /public/
Sitemap: https://example.com/sitemap.xml
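One way to sanity-check rules like the ones above is Python's standard-library robots.txt parser. A minimal sketch, feeding it the example rules directly rather than fetching them from a live site:

```python
import urllib.robotparser

# The example rules from this article, as an inline string
rules = """\
User-agent: *
Disallow: /private/
Disallow: /temp/
Allow: /public/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# can_fetch(user_agent, url) reports whether a crawler honoring
# these rules would be allowed to request the URL
print(rp.can_fetch("*", "https://example.com/private/page.html"))  # False
print(rp.can_fetch("*", "https://example.com/public/page.html"))   # True
```

For a deployed site, `rp.set_url("https://example.com/robots.txt")` followed by `rp.read()` fetches and parses the live file instead.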