It seems almost eerie that within the week I scheduled to write a post on WordPress SEO about the robots.txt file, I get two… not one, but TWO calls from Realtors who just cannot figure out why their sites are not ranking as they should. After careful observation, BOTH of them turn out to be robots command issues… Now granted, one was an on-page robots (noindex) issue, but the other was a robots.txt file issue… Each carries the same message… HEY GOOGLE, DON'T INDEX THIS SITE (PAGE), K?
Probably not the message YOU want to send to Google, now is it…
So today, we will speak about the robots.txt file… In a week or so (probably not next week, as two robots posts in two weeks would even bury my eyelids…), we will talk about robots commands, another surefire way to increase slightly, or DECREASE GREATLY, your SEO.
So, What Is Robots.txt?
Robots.txt is a text (not HTML) file you add to your root directory that tells the Search Engine Spiders which of your pages NOT to visit. Search Engines do not guarantee they will abide by this information, but typically they do. Most spiders will not go where they are told not to go. A robots.txt is a strong suggestion to a search engine not to index your page. However, it is not a rule… It is a bit like asking someone not to go in a room while you are not home… the room is unlocked, but you trust they will not.
Just as you would not leave VALUABLES in that room, do not PROTECT sensitive information by adding it to your robots.txt file. There are way better ways to secure pages.
The location of robots.txt is very important. Search engines will not search your site for it. They simply look for it in the main (root) directory, and if they don’t find it there, they assume that your site does not have a robots.txt file and they proceed to index everything they find.
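To make the "one place only" point concrete, here is a small Python sketch of my own (not anything the search engines publish) that turns any page address into the single address where a robots.txt file can live. The function name robots_url and the example.com address are made up for the illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the root-level robots.txt address for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host. The page's own path, query string and
    # fragment are dropped, because robots.txt is only looked for at the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical listing page, used only as an example.
print(robots_url("https://www.example.com/listings/123-main-st?photo=2"))
```

Run as written, this prints https://www.example.com/robots.txt. A copy of the file sitting anywhere deeper, say in /listings/, would simply never be asked for.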
So HOW do you write a Robots.txt file?
Well, the simplest way is to open NOTEPAD and create the file there. Robots.txt is exactly what it says: a text file named robots, not an HTML file. So, Notepad is the logical choice.
The structure of a robots.txt is pretty simple, but frankly NOT very flexible: it is an endless list of user agents and disallowed files and directories. Basically, the syntax is as follows:
User-agent:
Disallow:
“User-agent” names the search engines’ crawlers, such as Googlebot or Yahoo’s crawler, and “Disallow:” is a list of the pages or directories you wish NOT to be indexed. You may also see lines of text preceded by #. These are simply comments, intended for human eyes as opposed to the spiders.
For example, if I did not want the spiders to index my wp-admin page (something I do on all my sites to keep from leaking link-juice to meaningless pages) I would write it this way…
# All user agents are disallowed to see the /wp-admin page
User-agent: *
Disallow: /wp-admin
The first line is for me, the second states “OK, this is for all spiders and bots” and the third line, “Pretty please, don’t index my logon page.”
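If you want to double-check a rule like this before you upload it, Python ships with its own robots.txt reader, urllib.robotparser. The sketch below is only an illustration: it assumes the three lines described above, with "User-agent: *" as the second line and "Disallow: /wp-admin" as the third, and the example.com addresses and the helper name allowed are placeholders I made up.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents of the file: a comment, a catch-all user agent,
# and one disallowed directory.
RULES = """\
# All user agents are disallowed to see the /wp-admin page
User-agent: *
Disallow: /wp-admin
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())  # parse the text directly, no download needed


def allowed(path, agent="Googlebot"):
    """Return True if the given user agent may fetch this path on the site."""
    return parser.can_fetch(agent, "https://www.example.com" + path)


print(allowed("/wp-admin/"))        # the logon area
print(allowed("/homes-for-sale/"))  # a normal content page
```

The first check comes back False and the second True, which is what we asked for. Keep in mind this only shows how Python's well-behaved reader interprets the file; a real spider may handle edge cases a little differently.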
The concept and structure of robots.txt are more than a decade old, and there are plenty of resources on it, such as http://www.robotstxt.org/, or you can go straight to the Standard for Robot Exclusion.
Robots.txt files can get lengthy, and the exact syntax, punctuation and SPELLING are of great importance (most of the errors I see are misspellings of user agents or directories, or missing colons, slashes, etc.). So be careful…
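Since most of these mistakes are mechanical, a few lines of Python can catch them before a spider does. This is a rough sketch of my own, not an official validator: the function name lint_robots, the list of field names it accepts, and the sample file are all assumptions made for the illustration, and it only checks for missing colons, misspelled field names and paths without a leading slash.

```python
# Field names this sketch treats as correctly spelled (an assumed, short list).
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def lint_robots(text):
    """Return a list of (line_number, problem) pairs for a robots.txt text."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # blank or comment-only line
        if ":" not in line:
            problems.append((number, "missing colon"))
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field not in KNOWN_FIELDS:
            problems.append((number, "unknown field '" + field + "'"))
        elif field in ("allow", "disallow") and value and not value.startswith(("/", "*")):
            problems.append((number, "path should start with a slash"))
    return problems


# Hypothetical file with three typical slips on lines 3, 4 and 5.
SAMPLE = """\
# my robots file
User-agent: *
Disallow /wp-admin
Dissalow: /tmp
Disallow: wp-login.php
"""

for number, problem in lint_robots(SAMPLE):
    print(number, problem)
```

On the sample it flags line 3 (missing colon), line 4 (misspelled Disallow) and line 5 (no leading slash), and it stays quiet on a clean file.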
Here is an example of the Virtual Results robots.txt file.
Don't be totally discouraged if you don't get all of this, as you will have another chance. In coming weeks, look for my WordPress SEO for Real Estate Websites post on WordPress DUPLICATION, a pretty large SEO issue with WordPress out of the box. Within that post, we will give tips and tools for using WordPress plugins to create a robots.txt that reduces content duplication…
Next time? Creating Great WordPress titles for SEO