If you have an active Drupal site with lots of new content, the pager links at the bottom of each view (…?page=2) will eventually generate a large number of extra URLs that end up in your Search Engine Results Pages (SERPs).
The standard Google search site has no problem handling these extra URLs. However, if you want your site content to be indexed in a way that is usable by the SF State Google search, you will need to update your site's robots.txt. The SF State Google search has a limited indexing license, so leaving robots.txt uncustomized may result in critical sections of your site going unindexed and in SERPs that are not user-friendly.
Besides the dynamically created pager links, a number of non-critical directories and special-purpose URLs were also being indexed. Using Google Webmaster Tools together with traffic data from Google Analytics, I created a robots.txt that restricts search engine spiders to the main sections of the site:
# SERP
User-agent: *
Disallow: /?
Disallow: /*?
Disallow: /*.xml
Disallow: /*/tags/
Disallow: /archives/
Disallow: /category/
Disallow: /content/
Disallow: /filter/
Disallow: /files/
Disallow: /frontpage/
Disallow: /feed/fb/
Disallow: /feeds/
Disallow: /tag/
Disallow: /taxonomy/
Disallow: /user/
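Note that these rules rely on wildcard matching: Google's crawler treats * in a Disallow value as matching any sequence of characters, while the original robots.txt convention (and Python's built-in urllib.robotparser) only does plain prefix matching. As a quick sanity check before deploying, here is a minimal Python sketch that approximates Google's wildcard matching against the Disallow list above; the sample URLs (/news?page=2 and so on) are hypothetical placeholders, not paths from an actual site.

import re

# Disallow patterns from the robots.txt above.
DISALLOW = [
    "/?", "/*?", "/*.xml", "/*/tags/", "/archives/", "/category/",
    "/content/", "/filter/", "/files/", "/frontpage/", "/feed/fb/",
    "/feeds/", "/tag/", "/taxonomy/", "/user/",
]

def to_regex(pattern):
    # Translate a Googlebot-style Disallow value into a regex:
    # '*' matches any run of characters, '$' anchors the end,
    # and everything else is matched literally as a prefix.
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "$":
            parts.append("$")
        else:
            parts.append(re.escape(ch))
    return re.compile("^" + "".join(parts))

RULES = [to_regex(p) for p in DISALLOW]

def is_blocked(path_and_query):
    # A URL is blocked if any Disallow rule matches its path (and query string).
    return any(rule.match(path_and_query) for rule in RULES)

# Hypothetical sample URLs -- adjust these to match your own site's structure.
for url in ["/news?page=2", "/taxonomy/term/12", "/about", "/calendar"]:
    print(url, "->", "blocked" if is_blocked(url) else "allowed")

Google Webmaster Tools also provides a robots.txt testing feature, which is worth running against the live file as a final check once the new rules are in place.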