Wednesday, August 15, 2007 at 4:01 PM
Imagine that you're responsible for the domain www.example.com and you want search engines to index everything on your site, except for your /images folder. You also want to make sure your Sitemap gets noticed, so you save the following as your robots.txt file:
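A robots.txt matching that description might read as follows (the sitemap location is an assumption for illustration):

```
User-Agent: *
Disallow: /images

Sitemap: http://www.example.com/sitemap.xml
```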
You visit Webmaster Central to test your site against the robots.txt analysis tool using these two test URLs:
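You can also sanity-check the same rules locally. Here is a minimal sketch using Python's standard `urllib.robotparser`, with one illustrative URL inside the blocked folder and one outside it:

```python
from urllib.robotparser import RobotFileParser

# The rules from the robots.txt described above (the Sitemap line is
# omitted; robotparser only evaluates User-Agent/Disallow rules).
robots_txt = """\
User-Agent: *
Disallow: /images
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# One URL inside the blocked folder, one outside it (illustrative paths).
print(rp.can_fetch("Googlebot", "http://www.example.com/images/banner.jpg"))  # → False
print(rp.can_fetch("Googlebot", "http://www.example.com/index.html"))         # → True
```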
Earlier versions of the tool would have reported this:
The improved version tells you more about that robots.txt file:
We also want to make sure you've heard about the new unavailable_after meta tag announced by Dan Crow on the Official Google Blog a few weeks ago. This allows for a more dynamic relationship between your site and Googlebot. Just think: on www.example.com, any time you have a temporarily available news story or a limited-time sale or promotion page, you can specify the exact date and time you want specific pages to stop being crawled and indexed.
Let's assume you're running a promotion that expires at the end of 2007. In the headers of page www.example.com/2007promotion.html, you would use the following:
<META NAME="GOOGLEBOT" CONTENT="unavailable_after: 31-Dec-2007 23:59:59 EST">
The second piece of exciting news: the new X-Robots-Tag directive, which adds Robots Exclusion Protocol (REP) META tag support for non-HTML pages! Finally, you can have the same control over your videos, spreadsheets, and other indexed file types. Using the example above, let's say your promotion page is in PDF format. For www.example.com/2007promotion.pdf, you would use the following:
X-Robots-Tag: unavailable_after: 31 Dec 2007 23:59:59 EST
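On the server side, one way to attach that header is with a web server rule; here's a sketch assuming Apache with mod_headers enabled (the filename matches the example above):

```
# .htaccess — add the REP header only to the promotion PDF
<Files "2007promotion.pdf">
  Header set X-Robots-Tag "unavailable_after: 31 Dec 2007 23:59:59 EST"
</Files>
```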
Remember, REP meta tags can be useful for implementing noarchive, nosnippet, and now unavailable_after directives for page-level instruction, as opposed to robots.txt, which is controlled at the domain root. We get requests from bloggers and webmasters for these features, so enjoy. If you have other suggestions, keep them coming. Any questions? Please ask them in the Webmaster Help Group.
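For instance, a page-level tag combining two of those directives (a sketch using the generic ROBOTS name, which applies to all crawlers) might look like:

```
<META NAME="ROBOTS" CONTENT="noarchive, nosnippet">
```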