The perfect robots.txt

A robots.txt file is only needed if you want to exclude search engines or other services from some directory on your server. Bitweaver doesn't create a robots.txt file by default; if you need one, you have to create it yourself. The file must live in the root directory of your site, e.g. http://www.example.com/robots.txt.

If you cannot create such a file, use robots meta tags instead. There are pages online that discuss the value and usage of robots.txt, or let you check a file for validity.
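
For example, to keep an individual page out of search indexes without a robots.txt, place a robots meta tag in that page's <head> section; this tells compliant crawlers not to index the page or follow its links:

<meta name="robots" content="noindex, nofollow" />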

Development

This is an example of a robots.txt used while testing or developing a site. Add records for services you still want to allow, such as browsershots.org, validators, link checkers, etc. Pages that list user agent strings tell you the possible values for the "User-agent:" field. In the following example, all user agents except one are excluded (an empty "Disallow:" field means nothing is disallowed for that agent), so no search engine will index your content before launch:

/robots.txt


User-agent: Browsershots
Disallow:

User-agent: *
Disallow: /


Live sites

When your site is live, you might want to exclude some services so that nobody can submit your site there. (This doesn't make much sense for browsershots.org specifically; it's just an example. Also, robots.txt is only honored by well-behaved robots, so anyone could write their own service and crawl your site anyway.) The important part is where you exclude all (*) user agents from certain Bitweaver-specific directories:

/robots.txt


User-agent: Browsershots
Disallow: /

User-agent: *
Disallow: /temp/templates_c/
Disallow: /fckeditor/
Disallow: /gatekeeper/
Disallow: /install/
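# plus whatever you renamed the install directory to afterwards: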
Disallow: /renamed_install_after_install_was_done_for_security/
Disallow: /kernel/
Disallow: /languages/
Disallow: /pdf/
Disallow: /quota/
Disallow: /sample/
Disallow: /shoutbox/
Disallow: /stars/
Disallow: /stats/
Disallow: /stencils/
Disallow: /stickies/
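
Once the file is in place, you can check how crawlers will interpret it. As one option, Python's standard urllib.robotparser module fetches and evaluates a robots.txt the way a compliant crawler would; here is a minimal sketch (the example.com URL and the sample page path are placeholders for your own site):

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")
rp.read()  # fetch and parse the live file

# Generic crawlers should be blocked from Bitweaver internals...
print(rp.can_fetch("*", "/temp/templates_c/"))  # expected: False
print(rp.can_fetch("*", "/install/"))           # expected: False

# ...Browsershots is disallowed from everything on the live site...
print(rp.can_fetch("Browsershots", "/"))        # expected: False

# ...and an ordinary page (sample path) stays crawlable for everyone else.
print(rp.can_fetch("*", "/wiki/HomePage"))      # expected: True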


See also: The perfect .htaccess file