Patch #3754
add some additional URL paths to robots.txt
| Status: | New | Start date: | 2009-08-18 |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assignee: | - | % Done: | 0% |
| Category: | - | | |
| Target version: | - | | |
Description
My Apache logs show that some Redmine URLs are being heavily indexed by robots, and it seems best to block them in robots.txt:
/issues
/projects/*/time_entries
/projects/N/wiki/* (where N is the numeric project id)
/repositories/annotate/*
/repositories/browse/*
/repositories/changes/*
/repositories/diff/*
/repositories/entry/*
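As a rough sketch of what the resulting rules could look like, the snippet below renders the paths listed above as robots.txt lines. It is illustrative only and is not the attached patch: `render_robots`, `EXTRA_PATHS`, and the project ids are hypothetical names and values. Note that a plain `Disallow: /issues` is a prefix rule, so it would also cover individual issue pages such as `/issues/123`.

```python
# Paths from the list above. A Disallow rule is a prefix match, so the
# trailing "/*" is dropped. A "*" inside a path is a later extension
# (honored by Googlebot, among others), not part of the original 1994
# robots.txt convention.
EXTRA_PATHS = [
    "/issues",
    "/projects/*/time_entries",
    "/repositories/annotate/",
    "/repositories/browse/",
    "/repositories/changes/",
    "/repositories/diff/",
    "/repositories/entry/",
]


def render_robots(project_ids, extra_paths=EXTRA_PATHS):
    """Build robots.txt text; project_ids are hypothetical numeric ids."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in extra_paths]
    # robots.txt has no "any number" pattern, so the numeric wiki paths
    # are spelled out once per project.
    lines += [f"Disallow: /projects/{pid}/wiki/" for pid in project_ids]
    return "\n".join(lines) + "\n"


print(render_robots([1, 2]))
```

Spelling out one wiki rule per project keeps the name-based wiki URLs crawlable while excluding the numeric duplicates.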
History
#1 Updated by Mischa The Evil almost 4 years ago
See the Bots Filter plugin which has some overlap (e.g. the repositories). Maybe you can modify it to adapt it to your precise requirements?
Regards,
Mischa.
#2 Updated by mark burdett almost 4 years ago
Or I can easily block these via my Apache config. But I do think they should be added to robots.txt by default. I also wonder, how are Googlebot and others even finding some of these non-canonical paths? It could point to a bug elsewhere that is generating links to these paths.
#3 Updated by mark burdett over 3 years ago
- File robots.txt.patch added
Here's a patch adding the additional problematic paths to the default robots.txt.
#4 Updated by Eric Davis over 3 years ago
I like having the robots crawl some of these pages. They even turn up when I'm searching for a bug that I've already fixed.
- wiki pages
- global issues list
- repositories
#5 Updated by mark burdett over 3 years ago
- File robots.txt.patch added
The wiki pages that this patch blocks are not the canonical paths; they use the numeric project id rather than the project name.
I now realize that the initial version of this patch blocked the individual issue pages; I intended to only block /issues? -- i.e. the global issue search page.
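The difference between `/issues` and `/issues?` comes from how crawlers match Disallow rules: a rule matches any URL that starts with the pattern. The following is a minimal sketch of that matching, assuming the wildcard semantics used by Googlebot and described in RFC 9309 (`*` matches any run of characters, a trailing `$` anchors the end). The function name `rule_matches` and the sample URLs are hypothetical, and older crawlers may not honor wildcards at all.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Matching is by prefix; '*' matches any run of characters and a
    trailing '$' anchors the pattern to the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" for each "*".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start only, which gives prefix matching.
    return re.match(regex, path) is not None


# "/issues" also blocks individual issue pages ...
print(rule_matches("/issues", "/issues/123"))
# ... while "/issues?" only blocks URLs that carry a query string.
print(rule_matches("/issues?", "/issues/123"))
print(rule_matches("/issues?", "/issues?set_filter=1"))
```

With the narrower `/issues?` rule, individual issue pages stay crawlable and only the filtered list views are excluded.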
#6 Updated by Jean-Philippe Lang over 3 years ago
- Tracker changed from Defect to Patch
#7 Updated by Brad Schick almost 3 years ago
My site is also getting hammered on /repositories and /issues. It seems somewhat pointless to disallow access to these resources through /projects/... but not through other URLs.
#8 Updated by Antoine Beaupré 3 months ago
This patch has been ready for more than 3 years. Why hasn't it been committed yet?
#9 Updated by Antoine Beaupré 3 months ago
- File robots.txt-2.patch added
Here's an updated patch for 1.4.