Patch #3754

add some additional URL paths to robots.txt

Added by mark burdett over 8 years ago. Updated almost 3 years ago.

Status: New
Start date: 2009-08-18
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -

Description

My Apache logs show that some Redmine URLs are being heavily indexed by robots, and it seems best to block them in robots.txt:
/issues
/projects/*/time_entries
/projects/N/wiki/* (where N is the numeric project id)
/repositories/annotate/*
/repositories/browse/*
/repositories/changes/*
/repositories/diff/*
/repositories/entry/*
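To check which URLs a pattern list like the one above would cover, here is a minimal sketch. It assumes the wildcard semantics that the major crawlers apply (each rule is a prefix match, and `*` matches any run of characters); the original robots.txt convention had no wildcards, so not every robot will honour them. The helper names and sample paths are hypothetical and are not taken from the attached patches.

```python
import re

# Path patterns copied from the list above. "/projects/N/wiki/*" is omitted:
# robots.txt wildcards cannot express "numeric id only", and this sketch makes
# no assumption about how the attached patch handles that case.
DISALLOWED = [
    "/issues",
    "/projects/*/time_entries",
    "/repositories/annotate/*",
    "/repositories/browse/*",
    "/repositories/changes/*",
    "/repositories/diff/*",
    "/repositories/entry/*",
]


def to_regex(pattern):
    """Compile a robots.txt-style pattern: prefix match, '*' matches any characters."""
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))


RULES = [to_regex(p) for p in DISALLOWED]


def is_blocked(path):
    """Return True if any disallow pattern matches the start of the path."""
    return any(rule.match(path) for rule in RULES)


if __name__ == "__main__":
    # Hypothetical sample paths, for illustration only.
    for sample in ["/issues/123", "/projects/demo/time_entries",
                   "/projects/demo/wiki/Start", "/repositories/diff/demo?rev=2"]:
        print(sample, is_blocked(sample))
```

Because rules are prefix matches, a bare `/issues` entry also covers every individual issue page, not only the global list.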

robots.txt.patch (587 Bytes) mark burdett, 2009-09-23 11:18

robots.txt.patch (597 Bytes) mark burdett, 2009-10-01 01:38

robots.txt-2.patch (554 Bytes) Antoine Beaupré, 2013-03-11 22:30


Related issues

Related to Redmine - Defect #6734: Robots index /issues (which isn't disallowed in robots.txt) New 2010-10-24

History

#1 Updated by Mischa The Evil over 8 years ago

See the Bots Filter plugin which has some overlap (e.g. the repositories). Maybe you can modify it to adapt it to your precise requirements?

Regards,

Mischa.

#2 Updated by mark burdett over 8 years ago

Or I can easily block these via my Apache config. But I do think they should be added to robots.txt by default. I also wonder how Googlebot and others are even finding some of these non-canonical paths. Could it point to a bug elsewhere that is generating links to these paths?

#3 Updated by mark burdett about 8 years ago

Here's a patch adding the additional problematic paths to the default robots.txt

#4 Updated by Eric Davis about 8 years ago

I like having the robots crawl some of these pages; they even turn up when I'm searching for a bug that I've already fixed.

  • wiki pages
  • global issues list
  • repositories

#5 Updated by mark burdett about 8 years ago

The wiki pages that this patch blocks are not at the canonical path; they use the numeric project id rather than the project name.

I now realize that the initial version of this patch blocked the individual issue pages; I intended to only block /issues? -- i.e. the global issue search page.
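The difference between the two rules can be sketched as a plain prefix check, assuming robots match a Disallow value against the start of the path plus query string. The function name and the sample URLs are hypothetical.

```python
def blocked_by(rule, url_path):
    """Return True if a Disallow rule (treated as a plain prefix) covers the URL path."""
    return url_path.startswith(rule)


if __name__ == "__main__":
    # Hypothetical sample URLs, for illustration only.
    for rule in ["/issues", "/issues?"]:
        for url in ["/issues/123", "/issues?set_filter=1"]:
            print(rule, url, blocked_by(rule, url))
```

Under this assumption, `/issues` covers both the individual issue pages and the search page, while `/issues?` covers only URLs that carry a query string.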

#6 Updated by Jean-Philippe Lang about 8 years ago

  • Tracker changed from Defect to Patch

#7 Updated by Brad Schick over 7 years ago

My site is also getting hammered on /repositories and /issues. It seems somewhat pointless to disallow access to these resources through /projects/... but not through other URLs.

#8 Updated by Antoine Beaupré almost 5 years ago

This patch has been ready for more than 3 years. Why hasn't it been committed yet?

#9 Updated by Antoine Beaupré almost 5 years ago

Here's an updated patch for 1.4.

#10 Updated by Antoine Beaupré almost 3 years ago

Antoine Beaupré wrote:

Here's an updated patch for 1.4.

And that was two years ago, with the patch sitting here for 5 years. Can we at least get feedback on what's wrong with the patch, if anything?

Thanks.
