Patch #3754

open

add some additional URL paths to robots.txt

Added by mark burdett over 14 years ago. Updated about 9 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 2009-08-18
Due date:
% Done: 0%
Estimated time:

Description

My Apache logs show that some Redmine URLs are being heavily crawled by robots, and it seems best to block them in robots.txt:
/issues
/projects/*/time_entries
/projects/N/wiki/* (where N is the numeric project id)
/repositories/annotate/*
/repositories/browse/*
/repositories/changes/*
/repositories/diff/*
/repositories/entry/*
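
To illustrate how patterns like these behave for crawlers that honor `*` wildcards in robots.txt (an extension described in RFC 9309 and supported by the major crawlers, not part of the original convention), here is a minimal Python sketch. The `rule_matches` and `is_blocked` helpers and the sample URLs are hypothetical, written only for illustration; they are not taken from the attached patches or from Redmine itself.

```python
import re

# Patterns from the list above. The numeric-id wiki entry is left out:
# a robots.txt "*" cannot express "digits only", so /projects/*/wiki/*
# would also match the canonical, name-based wiki URLs.
PATTERNS = [
    "/issues",
    "/projects/*/time_entries",
    "/repositories/annotate/*",
    "/repositories/browse/*",
    "/repositories/changes/*",
    "/repositories/diff/*",
    "/repositories/entry/*",
]


def rule_matches(pattern: str, path: str) -> bool:
    """Prefix match in which '*' stands for any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.match(regex, path) is not None


def is_blocked(path: str) -> bool:
    """True if any of the sample patterns matches the given URL path."""
    return any(rule_matches(p, path) for p in PATTERNS)


# Hypothetical sample URLs, checked against the patterns.
for url in ("/projects/demo/time_entries",
            "/repositories/diff/demo?rev=2",
            "/projects/demo/wiki/Start"):
    print(url, "blocked" if is_blocked(url) else "allowed")
```

Under these assumptions the time-entry and repository URLs are blocked while the name-based wiki page stays crawlable. Blocking only the numeric-id wiki paths would need one rule per project id, or handling outside robots.txt.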


Files

robots.txt.patch (587 Bytes), mark burdett, 2009-09-23 11:18
robots.txt.patch (597 Bytes), mark burdett, 2009-10-01 01:38
robots.txt-2.patch (554 Bytes), Antoine Beaupré, 2013-03-11 22:30

Related issues

Related to Redmine - Defect #6734: robots.txt: disallow crawling issues list with a query string | Closed | Go MAEDA | 2010-10-24

Actions #1

Updated by Mischa The Evil over 14 years ago

See the Bots Filter plugin which has some overlap (e.g. the repositories). Maybe you can modify it to adapt it to your precise requirements?

Regards,

Mischa.

Actions #2

Updated by mark burdett over 14 years ago

Or I can easily block these via my Apache config. But I do think they should be added to robots.txt by default. I also wonder how Googlebot and others are even finding some of these non-canonical paths. It could point to a bug elsewhere that is generating links to these paths.

Actions #3

Updated by mark burdett over 14 years ago

Here's a patch adding the additional problematic paths to the default robots.txt

Actions #4

Updated by Eric Davis over 14 years ago

I like having the robots crawl some of these pages; they even turn up when I'm searching for a bug that I've already fixed.

  • wiki pages
  • global issues list
  • repositories

Actions #5

Updated by mark burdett over 14 years ago

The wiki pages that this patch blocks are not the canonical path, they use the numeric project id rather than project name.

I now realize that the initial version of this patch blocked the individual issue pages; I intended to only block /issues? -- i.e. the global issue search page.
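
The distinction matters because a robots.txt rule without wildcards is a prefix match on the URL path and query string. A small Python sketch, using a hypothetical `blocked_by` helper and made-up sample URLs, compares the two rules:

```python
def blocked_by(rule: str, url: str) -> bool:
    """A Disallow rule without wildcards blocks every URL that starts with it."""
    return url.startswith(rule)


# Hypothetical sample URLs: the list page, one issue, and a filtered list.
samples = ["/issues", "/issues/3754", "/issues?query_id=1"]

for rule in ("/issues", "/issues?"):
    hits = [url for url in samples if blocked_by(rule, url)]
    print(f"Disallow: {rule} blocks {hits}")
```

With `/issues` all three sample URLs are blocked, including the individual issue page; with `/issues?` only the query-string URL is.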

Actions #6

Updated by Jean-Philippe Lang over 14 years ago

  • Tracker changed from Defect to Patch

Actions #7

Updated by Brad Schick over 13 years ago

My site is also getting hammered on /repositories and /issues. It seems somewhat pointless to disallow access to these resources through /projects/... but not through other URLs.

Actions #8

Updated by Antoine Beaupré almost 11 years ago

This patch has been ready for more than 3 years, why hasn't this been committed yet?

Actions #9

Updated by Antoine Beaupré almost 11 years ago

Here's an updated patch for 1.4.

Actions #10

Updated by Antoine Beaupré about 9 years ago

Antoine Beaupré wrote:

Here's an updated patch for 1.4.

And that was two years ago now, with the patch sitting here for 5 years. Can we at least get feedback on what's wrong with the patch, if anything?

Thanks.
