Crawlers/Robots in Your Log Files

As you probably know, search engines use "crawlers" to read the content of your website. Because your website will not be listed in the major search engines until their crawlers have visited it, check that you have been crawled before worrying that your website is not ranking.

There are various ways to check for crawlers, but the most reliable is to use your access_log file. It may have a different name on your server, and you may need to ask your hosting provider where it is located. The access_log is generally written in real time, meaning that each hit made to your website is entered into the access_log immediately.

Once you find the access_log, there are several things you can do. If your website is busy, the access_log may be quite large and therefore difficult to work with using conventional methods. Even so, the easiest approach is to download the file to your computer with an FTP (File Transfer Protocol) client, such as Direct FTP, and open it with a text editor, such as Notepad.

Each line of the access_log represents a hit to your website. A hit is not the same as a page view or a visit. A hit is a single request for a single object on your server. If a webpage is simple HTML and contains five images, then six hits will occur in total: one for the page itself and one more for each image. Because of this, the file holds far more lines than page views, and scanning it manually can be hard on your eyes.
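The hit arithmetic above can be sketched in a few lines of Python. This is only an illustration: the request paths are made up, and treating every non-image request as a page view is a simplifying assumption, not a rule taken from any particular log analyzer.

```python
# Hypothetical requests produced by loading one simple HTML page
# that contains five images.
requests = [
    "/index.html",
    "/images/logo.gif",
    "/images/photo1.jpg",
    "/images/photo2.jpg",
    "/images/photo3.jpg",
    "/images/banner.png",
]

# Assumption: a request counts as an image if its path ends in one of these.
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")


def count_hits_and_page_views(paths):
    """Return (hits, page_views) for a list of requested paths."""
    hits = len(paths)  # every request is one hit
    page_views = sum(
        1 for path in paths if not path.lower().endswith(IMAGE_EXTENSIONS)
    )
    return hits, page_views


hits, page_views = count_hits_and_page_views(requests)
print(hits, "hits,", page_views, "page view")
```

For the single page described above, this reports six hits but only one page view.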

So, how do you find out when crawlers visit your website? You need to search for the patterns that these crawlers leave in the log. Here is a list of search engines and the names of their crawlers. Search for the name of a crawler to find hits from the respective engine:

Google: Googlebot
Yahoo!: Slurp
AltaVista: Scooter
MSN (Microsoft): Msnbot

You can look for other crawlers by finding out what their user agents are and searching for those, but these are the main ones.
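If the file is too large to search comfortably in a text editor, the same search can be scripted. The following is a minimal Python sketch that counts lines mentioning each crawler name from the list above. The sample log lines, the IP addresses, and the function names are hypothetical, and the match is a plain case-insensitive substring search, so a URL that happens to contain a crawler name would also be counted.

```python
from collections import Counter

# Crawler names from the list above, keyed by search engine.
CRAWLERS = {
    "Google": "Googlebot",
    "Yahoo!": "Slurp",
    "AltaVista": "Scooter",
    "MSN (Microsoft)": "Msnbot",
}


def count_crawler_hits(lines):
    """Count how many log lines mention each crawler (case-insensitive)."""
    counts = Counter()
    for line in lines:
        lowered = line.lower()
        for engine, name in CRAWLERS.items():
            if name.lower() in lowered:
                counts[engine] += 1
    return counts


def count_crawler_hits_in_file(path):
    """Read the log one line at a time so a large file is never fully loaded."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return count_crawler_hits(log)


# Made-up sample lines standing in for a real access_log.
sample_lines = [
    '192.0.2.10 - - [10/Oct/2005:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.11 - - [10/Oct/2005:14:02:11 -0700] "GET /about.html HTTP/1.0" 200 1480 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '192.0.2.12 - - [10/Oct/2005:14:05:47 -0700] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '192.0.2.13 - - [10/Oct/2005:14:09:03 -0700] "GET /robots.txt HTTP/1.0" 404 208 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
]

for engine, hits in count_crawler_hits(sample_lines).items():
    print(engine, hits)
```

To run it against a downloaded log, call `count_crawler_hits_in_file` with the path to your copy of the file, for example `count_crawler_hits_in_file("access_log")`.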

Each occurrence of one of the above names in your access_log means you have been visited by that crawler. You can read the rest of the line to learn more about the particular hit. You should be able to see the page requested, the HTTP response code (200 means the request succeeded), the time and date, and other bits of information where available.
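Reading those fields by eye gets tedious, so here is a hedged Python sketch that splits one line into its parts. It assumes your server writes the common Apache "combined" log format, which this article does not confirm for your host; if your lines look different, the pattern will need adjusting. The sample line, `LOG_PATTERN`, and `parse_line` are illustrative names, not part of any server software.

```python
import re

# Assumed layout (Apache combined log format):
# host ident user [time] "method path protocol" status size "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Made-up example of a crawler hit.
sample = (
    '192.0.2.10 - - [10/Oct/2005:13:55:36 -0700] '
    '"GET /index.html HTTP/1.1" 200 2326 "-" '
    '"Googlebot/2.1 (+http://www.google.com/bot.html)"'
)

entry = parse_line(sample)
if entry is not None:
    print("Page requested:", entry["path"])
    print("Response code: ", entry["status"])
    print("Date and time: ", entry["time"])
    print("User agent:    ", entry["agent"])
```

A response code other than 200 on a crawler hit, such as 404, is worth a closer look, since it suggests the crawler asked for a page it could not fetch.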

Once you know a page has been crawled, it may take 30 to 60 days for it to be listed in search engines, although many engines are much faster than this.
