Wednesday, November 23, 2005

Handling Robots

What exactly is a Robot?

A robot, in search engine optimization (SEO) terminology, is a search engine software program that visits a page on a website, follows the links it finds from that page, and indexes some or all of the pages on the site.

Why do we need robots?

Every day, search engines receive hundreds of new website submissions. It would be cumbersome and time-consuming for a human to review each website in full and judge whether it meets the search engine's standards before indexing it. This is where our friend the robot comes into the picture. Robots are software programs that crawl an entire website, checking the relevancy, consistency and significance of the site, and then index it into the search engine's database, greatly reducing the time spent per site. In this way a robot can quickly index many more sites per day. Understanding how robots behave is therefore a basic part of search engine optimization.

Controlling a Robot

Normally, a robot visits the home page of a site and follows the links on that page, scanning each linked page in turn. Sometimes we do not want a robot to index particular pages on our site. For instance, you might want a series of pages to be viewed in sequence, with only the first page indexed. To achieve this, we have a special kind of meta tag known as the robots meta tag. The robots meta tag is similar to other meta tags and is placed in the head of the document. This tag tells the robot which pages should be indexed, which should not be indexed, which links should be followed and which links should not be followed.

A typical robots meta tag is a single line of HTML placed in the head of the page.

The meta tag tells the robot to index the page it visits but not to follow the links on that page.
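Written out, that directive is a single meta tag in the page's head (a sketch; the `content` values are the standard `index`/`noindex` and `follow`/`nofollow` keywords):

```html
<head>
  <!-- Index this page, but do not follow its links -->
  <meta name="robots" content="index, nofollow">
</head>
```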

The other significant part of controlling robots is the robots.txt file. The robots.txt file is used primarily to control access to areas or portions of the website by excluding the robot from visiting them. Whenever a robot visits a site, it first checks the robots.txt file.

The robots.txt file,

  • is a plain text file created in Notepad or any text editor
  • should be placed in the top-level directory (root) of the website or server space
  • must be named in all lower case letters (robots.txt)

Through the robots.txt file we can,

  • Exclude all robots from visiting the server
  • Allow complete access to all robots
  • Exclude robots from accessing a portion of the server
  • Exclude a specific robot
  • Exclude certain types of files by specifying their file extensions
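As a sketch, a robots.txt file illustrating several of these cases might look like the following; in practice you would use only the records you need, and `BadBot` is a hypothetical robot name:

```
# 1. Exclude all robots from the entire server:
#    User-agent: *
#    Disallow: /

# 2. Allow all robots complete access (an empty Disallow):
#    User-agent: *
#    Disallow:

# 3. Exclude all robots from a portion of the server:
User-agent: *
Disallow: /private/
Disallow: /tmp/

# 4. Exclude one specific robot entirely:
User-agent: BadBot
Disallow: /
```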

Finally, we can conclude that the robots.txt file basically acts as a filter, giving you control over which parts of your site a search engine robot visits. Note that the file is advisory: well-behaved robots honor it, but it does not technically enforce anything.
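Well-behaved crawlers perform the robots.txt check programmatically. As a sketch, Python's standard-library `urllib.robotparser` can evaluate such rules; the rules body and the `MyBot` name below are illustrative:

```python
from urllib.robotparser import RobotFileParser

# A minimal sketch: parse a robots.txt body directly (no network fetch)
# and ask whether a crawler may visit a given path.
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("MyBot", "/private/page.html"))  # False
print(parser.can_fetch("MyBot", "/public/page.html"))   # True
```

A real crawler would instead call `set_url()` and `read()` to fetch the live robots.txt before requesting any other page.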

While talking of robots, I would like to mention the revisit tag, another important tag in the realm of search engine optimization. The revisit tag tells the search engine how long it should wait before visiting your site again. If you change your site's contents frequently, the revisit interval should be short, say a week; otherwise it can be longer. Your search engine rankings can dip if the search engine visits a second time and finds that the content has not been altered significantly. Though not all search engines honor the revisit tag, it is advisable to include it. If you are keen on top search engine rankings, use the revisit tag judiciously.
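Like the robots meta tag, the revisit tag goes in the head of the page. As a sketch, a seven-day revisit interval would be written:

```html
<meta name="revisit-after" content="7 days">
```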
