What is a robots.txt File?

Sometimes we need to tell search engine robots that certain content should not be retrieved and stored. One of the most common ways to define what should be "excluded" is the "Robots Exclusion Protocol." Most search engines honor this protocol. It is also possible to address instructions to specific engines while allowing other engines to crawl the same content.

Should you have material which you feel should not appear in search engines (such as .cgi files or images), you can instruct spiders to stay clear of such files by deploying a "robots.txt" file, which must be located in your "root directory" and use the correct syntax. Robots are said to "exclude" the files listed in it.

Using this protocol on your website is very easy: it only calls for creating a single plain text file called "robots.txt" in the root directory of your website.

So, how do we define what files should not be crawled by search engines? We use the "Disallow" statement!

Create a plain text file in a text editor, e.g. Notepad or WordPad (saving as plain text, not rich text), and save it in your "home / root directory" with the filename "robots.txt".

The URL for your robots.txt file should be:
http://www.yoursite.com/robots.txt
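
Because the file always sits at the root of the host, its address can be derived from any page URL on the same site. A minimal sketch in Python using only the standard library; the helper name robots_url is hypothetical and chosen here purely for illustration:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt address for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.yoursite.com/directoryname/filename.html"))
```

Any page under the same host maps to the same robots.txt address, which is why a copy placed in a subdirectory is not picked up.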

This file will now become your index of files that may not be crawled by spiders. Let's say, for example, you have a file called "filename.html" on your website which you'd rather not see appear in search engines. You may instruct search engines to stay away from this file by adding the following lines to your "robots.txt" file.

User-agent: *
Disallow: /filename.html

Now, let's say you have two files which you wish to exclude, "filename1.html" and "filename2.jpg". You can use the following:

User-agent: *
Disallow: /filename1.html
Disallow: /filename2.jpg

Furthermore, you can block entire directories by appending a "trailing slash" to the folder name. The following lines tell ALL robots to exclude ALL files located in the "directoryname" folder, while simultaneously excluding the aforementioned files:

User-agent: *
Disallow: /filename.html
Disallow: /filename1.html
Disallow: /filename2.jpg
Disallow: /directoryname/
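
One way to sanity-check rules like these before uploading them is to feed them to Python's standard urllib.robotparser module. The sketch below is only an illustration: the robot name "ExampleBot", the page "/index.html" and the helper is_allowed are assumptions made for the example, and real crawlers may treat edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the article, one directive per line.
RULES = """\
User-agent: *
Disallow: /filename.html
Disallow: /filename1.html
Disallow: /filename2.jpg
Disallow: /directoryname/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def is_allowed(path, agent="ExampleBot"):
    """Ask the parser whether `agent` may fetch `path` on the example site."""
    return parser.can_fetch(agent, "http://www.yoursite.com" + path)


for path in ("/index.html", "/filename.html", "/directoryname/page.html"):
    print(path, "allowed" if is_allowed(path) else "excluded")
```

Each Disallow value is treated as a path prefix, which is why the single "/directoryname/" line covers every file inside that folder.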

Instructing Specific Engines
Should you wish to instruct only specific engines to exclude certain files, you can do so by specifying the "User Agent" of the robot in question. The "User Agent" value varies by spider / robot. Examples are "Googlebot", which is the User Agent used by Google, and "Slurp", which is the identifying User Agent of Inktomi. Here is an example which instructs ONLY Google to exclude all aforementioned files and directories, while instructing Inktomi to exclude two separate files named "slurp.html" and "imac.jpg":

User-agent: Googlebot
Disallow: /filename.html
Disallow: /filename1.html
Disallow: /filename2.jpg
Disallow: /directoryname/

User-agent: Slurp
Disallow: /slurp.html
Disallow: /imac.jpg
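
To see how per-engine groups play out, the same rules can be run through Python's standard urllib.robotparser. This is a hedged sketch rather than a statement of how any particular search engine behaves: "OtherBot" is a made-up robot name, and the helper allowed is introduced only for this example.

```python
from urllib.robotparser import RobotFileParser

# Two groups separated by a blank line, one per robot.
RULES = """\
User-agent: Googlebot
Disallow: /filename.html
Disallow: /filename1.html
Disallow: /filename2.jpg
Disallow: /directoryname/

User-agent: Slurp
Disallow: /slurp.html
Disallow: /imac.jpg
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(agent, path):
    """True if this parser lets `agent` fetch `path` on the example site."""
    return parser.can_fetch(agent, "http://www.yoursite.com" + path)


for agent in ("Googlebot", "Slurp", "OtherBot"):
    for path in ("/filename.html", "/slurp.html"):
        print(agent, path, "allowed" if allowed(agent, path) else "excluded")
```

Each robot is bound only by its own group, and a robot that matches no group (with no "User-agent: *" group present) is not restricted at all.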

Important note
There are several important issues concerning the use of the "Robots Exclusion Protocol". Firstly, the "robots.txt" filename is case sensitive and must not contain uppercase letters. File and directory names in "Disallow" lines are also case sensitive; User Agent names, by contrast, are matched case-insensitively under the current standard (RFC 9309), although older or non-compliant robots may behave differently. Secondly, this exclusion protocol is not honored by all robots and spiders. Lastly, your robots.txt file is visible to everybody, so no sensitive information should be specified in it. While legitimate robots generally do adhere to the Robots Exclusion Protocol, there is technically nothing to prevent any robot from looking at the files listed. Some hackers actually read this file to see whether it points to administration areas or database files, so do not list anything sensitive here unless it is also password protected through other means.
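
As one concrete data point on case handling, Python's standard urllib.robotparser compares paths case-sensitively but matches User Agent names without regard to case. The sketch below shows this for that parser only; other robots may differ, and the helper allowed is an assumption introduced for the example.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: Googlebot
Disallow: /filename.html
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(agent, path):
    """True if this parser lets `agent` fetch `path` on the example site."""
    return parser.can_fetch(agent, "http://www.yoursite.com" + path)


# Same path, differently cased robot names: the rule applies to all of them.
for agent in ("Googlebot", "googlebot", "GOOGLEBOT"):
    print(agent, "/filename.html", allowed(agent, "/filename.html"))

# Same robot, differently cased path: the rule no longer matches.
print("Googlebot", "/Filename.html", allowed("Googlebot", "/Filename.html"))
```

In practice this means a rule for "/filename.html" does not cover "/Filename.html"; if both exist on a case-sensitive server, each needs its own Disallow line.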

Robots.TXT Creators

http://www.yellowpipe.com/yis/tools/robots.txt/
http://tools.seobook.com/robots-txt/generator/
http://www.internetmarketingninjas.com/seo-tools/robots-txt-generator/
http://www.1pagedesign.com/robots.txt_generator/
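
For a small site, the file can also be generated with a few lines of script instead of an online tool. The following Python sketch is not how any of the listed generators work; build_robots_txt is a hypothetical helper that simply turns a mapping of User Agents to excluded paths into robots.txt text.

```python
def build_robots_txt(groups):
    """Build robots.txt text from a mapping of user agent -> list of paths.

    Each user agent gets its own group, and groups are separated by a
    blank line, as in the examples in this article.
    """
    blocks = []
    for agent, paths in groups.items():
        lines = ["User-agent: " + agent]
        lines += ["Disallow: " + path for path in paths]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


text = build_robots_txt({
    "Googlebot": ["/filename.html", "/directoryname/"],
    "Slurp": ["/slurp.html", "/imac.jpg"],
})
print(text)
```

The resulting text can then be saved as "robots.txt" in the root directory of the site.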
