Friday, November 21, 2014

Understanding the Robots.txt File Impact on Website Optimization

It's funny how the success (or failure) of a website depends heavily on just one seemingly insignificant document – the robots.txt file. Seasoned marketers have long been using and optimizing it for their online marketing and search engine optimization efforts, but many newbie webmasters don't seem to fully realize its potential. Heck, some don't even know what the file is for!


The robots.txt File at a Glance
Back in 1994, when the World Wide Web was still young, a set of standards was needed to regulate how search engines crawled and indexed websites. These standards became known as the Robots Exclusion Protocol, or the robots.txt protocol.

Simply put, a robots.txt file provides instructions about how web and search engine crawlers (also known as "spiders") can act on your website. The file tells crawlers which parts of a site they may visit and which pages or directories they should stay out of. Keep in mind that compliance is voluntary: reputable crawlers honor the file, but it is not a security or access-control mechanism.

How to Create a robots.txt File
As the file name suggests, a robots.txt file is a simple text file, so there's no need to download any special sort of software to create one for your site. You can just open up Notepad or any other text editor to make one.

The contents of a robots.txt file consist of "records," which are sets of instructions for the individual web crawlers that visit your website. A record is made up of two kinds of fields: the first is the User-agent line, which names the specific search engine robot the instructions apply to, and the second is the Disallow line, which specifies the files or directories that robot is not allowed to access. A single record can contain multiple Disallow lines.

For example, if you want to prevent Google from indexing all files in the images directory of your website, then the record would look like this:

User-agent: Googlebot
Disallow: /images

Googlebot is the name of Google's web crawler. In the second line, you're directly telling it not to crawl anything in the images directory. If you leave the Disallow field blank, all files on your site can be crawled by that robot. If you want to give the same instructions to all web crawlers, you can use an * (asterisk) in the User-agent field, like so:

User-agent: *
Disallow: /images
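Putting these pieces together, here is a sketch of what a slightly larger robots.txt file might look like. The directory names are hypothetical; note that records for different crawlers are separated by blank lines, and a record can stack several Disallow lines:

```
# Rules for Google's crawler only
User-agent: Googlebot
Disallow: /images

# Rules for every other crawler
User-agent: *
Disallow: /images
Disallow: /private
Disallow: /checkout
```

The file must live at the root of your site (for example, http://www.example.com/robots.txt) for crawlers to find it.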

Why is the robots.txt File Important for Website Optimization?
The robots.txt file is one of the first things that webmasters tinker with when they want to crank up their search engine rankings, but why is this file so important for SEO?


Well, there are certain things that you don't want search engines to see on your site. Duplicate pages, for example, can result in decreased rankings, so you might want to tell crawlers to ignore any duplicated pages in your site directory. You also may not want certain pages showing up on SERPs (search engine results pages) at all, such as a thank-you or checkout page that visitors should only reach by completing an action. And you can tell robots not to crawl any private subdirectories or files that you might have on your site. (One caveat: a disallowed URL can still appear in search results if other sites link to it, so for pages that must stay out of the index entirely, a noindex meta tag is the more reliable tool.)
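Before you publish a robots.txt file, it's worth checking that it actually blocks what you intend. As a quick sanity check, Python's standard library ships a parser for this protocol; here is a minimal sketch (the rules and example.com URLs are made up for illustration):

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt, supplied as lines instead of fetched from a site
rules = """\
User-agent: *
Disallow: /images
Disallow: /private
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Check whether a crawler named "MyBot" may fetch specific URLs
print(rp.can_fetch("MyBot", "http://www.example.com/images/logo.png"))  # blocked
print(rp.can_fetch("MyBot", "http://www.example.com/about.html"))       # allowed
```

In a real check you would point the parser at your live file with `set_url()` and `read()` instead of feeding it lines by hand.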

Still have questions about robots.txt files? Contact Website Optimization for help.

Website Optimization
3134 SUNNYWOOD DR
ANN ARBOR, MI 48103
Phone: (877) 748-3678
Fax: (734) 661-1331
http://www.websiteoptimization.com/
https://plus.google.com/108123505155721216764/
